The field of human-robot interaction is attracting increasing interest from researchers, making this an ideal time to highlight the work being done in human-robot spoken language interaction.
Social interaction is characterized by a continuous and dynamic exchange of information-carrying signals. Producing and understanding these signals allows humans to communicate simultaneously on multiple levels. Such signals include speech and non-speech sounds, gesture, facial expression, and pose. Among these channels, vocal expression is best suited to communicating a rich variety of information; it is also the most natural modality for communicating meaning, emotion, and personality. Vocal expression is characterized by a verbal component (language) and by a non-verbal component (prosody, intonation, hesitation).
Our current ability to model vocal communication is quite limited; spoken language systems, robots in our case, are able to communicate concrete meaning through language, but their ability to detect (or, for that matter, generate) non-linguistic information streams remains primitive. The ability to understand this information, and to adapt generation to the goal of the communication and the characteristics of particular interlocutors, constitutes a significant aspect of natural interaction.
The purpose of this special session is to bring together researchers who are exploring vocal expression from different perspectives, including detection, modeling, and generation. The focus of the session is on the audio-based verbal and non-verbal cues required for the design of natural interaction between humans and robots.
Special session topics may include, but are not limited to: