Corporate & Society Sponsors
Loquendo diamond package
Nuance gold package
AT&T bronze package
Google silver package
Appen bronze package
Interactive Media bronze package
Microsoft bronze package
SpeechOcean bronze package
Avios logo package
NDI logo package

CNR-ISTC
Université d'Avignon
Speech Cycle
AT&T
Università di Firenze
FUB
FBK
Univ. Trento
Univ. Napoli
Univ. Tuscia
Univ. Calabria
Univ. Venezia

AISV
Comune di Firenze
Firenze Fiera
Florence Convention Bureau

ISCA

12th Annual Conference of the
International Speech Communication Association


Interspeech 2011 Florence

Special Sessions

SS-4

Speech and audio processing for human-robot interaction

Tue-Ses3-S1-O - oral
Tue-Ses3-S1-P - poster

Introduction
The field of human-robot interaction is attracting increasing interest from researchers, making this an ideal time to highlight the work being done in human-robot spoken language interaction.

Social interaction is characterized by a continuous and dynamic exchange of information-carrying signals. Producing and understanding these signals allows humans to communicate simultaneously on multiple levels. Such signals include speech and non-speech sounds, gesture, facial expression and pose. Among these channels, vocal expression is best suited for communicating a rich variety of information; it is also the most natural modality for communicating meaning, emotion and personality. Vocal expression is characterized by a verbal component (language) and by a non-verbal component (prosody, intonation, hesitation).

Our current ability to model vocal communication is quite limited. Spoken language systems, robots in our case, can communicate concrete meaning through language, but their ability to detect (or, for that matter, generate) non-linguistic information streams is quite primitive. The ability to understand this information, and to adapt generation to the goal of the communication and the characteristics of particular interlocutors, constitutes a significant aspect of natural interaction.

The purpose of this special session is to bring together researchers who are exploring vocal expression from different perspectives, including detection, modeling and generation. The focus of the session is on the verbal and non-verbal audio cues required for the design of natural interaction between a human and a robot.

Special session topics may include, but are not limited to:

  • Speech recognition systems for HRI
  • Dialog systems for HRI
  • Automatic emotion detection from verbal and non-verbal cues
  • Automatic recognition of user personality in dialog
  • Multimodal speech/audio expression generation in robots
  • Perception-action loops in robots
  • Back-channel generation and understanding
  • Interpretation of prosodic information
  • Timing in discourse
  • Integrated models of vocal communication
  • Natural Human-Robot Interaction (HRI)

Organizers:
Laurence Devillers, LIMSI-CNRS (devil@limsi.fr)
Agnes Delaborde, LIMSI-CNRS (agnes.delaborde@limsi.fr)
Alexander Rudnicky, Carnegie Mellon Univ. (Alex.Rudnicky@cs.cmu.edu)