Dante - Di Michelino 150° sponsors







Corporate & Society Sponsors
Loquendo diamond package
Nuance gold package
AT&T bronze package
Google silver package
Appen bronze package
Interactive Media bronze package
Microsoft bronze package
SpeechOcean bronze package
Avios logo package
NDI logo package

CNR-ISTC
Université d'Avignon
Speech Cycle
AT&T
Università di Firenze
FUB
FBK
Univ. Trento
Univ. Napoli
Univ. Tuscia
Univ. Calabria
Univ. Venezia

AISV
Comune di Firenze
Firenze Fiera
Florence Convention Bureau

ISCA

12th Annual Conference of the
International Speech Communication Association


Interspeech 2011 Florence

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Mon-Ses2-S1:
Show & Tell Demonstration - Speech Systems and Applications

Time: Monday 13:30
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Dimitrios Dimitriadis

#1 An Affective Spoken Storyteller

Felix Burkhardt (Deutsche Telekom Laboratories)

We present software that reads text with emotional expression. It is developed as part of Emofilt, an open-source emotional speech synthesis package. The affective storyteller consists of a text editor that offers a set of emotional speaking styles for marking up the text. The system was validated in a perception experiment which, although the number of participants was small, demonstrated the general usability of the approach.

#2 Text Driven 3D Photo-Realistic Talking Head

Lijuan Wang (Microsoft Research Asia)
Frank Soong (Microsoft Research Asia)
Wei Han (Department of Computer Science, Shanghai Jiao Tong University, China)
Qiang Huo (Microsoft Research Asia)

We propose a new 3D photo-realistic talking head with a personalized, photo-realistic appearance. It extends our prior, high-quality, 2D photo-realistic talking head to 3D. We use a 2D-to-3D reconstruction algorithm to automatically adapt a general 3D head mesh model to the individual. In training, super feature vectors consisting of 3D geometry, texture and speech are formed to train a statistical, multi-streamed Hidden Markov Model (HMM). The HMM is then used to synthesize both the trajectories of geometry animation and dynamic texture. The 3D talking head animation can be controlled by the rendered geometric trajectory, while the facial expressions and articulator movements are rendered with the dynamic 2D image sequences. Head motions and facial expressions can also be controlled separately by manipulating the corresponding parameters. The new 3D talking head has many useful applications, such as voice agents, telepresence, gaming and social networking.

#3 Physical Models Producing Vowels with Pitch Variation

Takayuki Arai (Sophia University)

Physical models of the human vocal tract are useful for education in acoustics and speech science. To excite such vocal-tract models, different types of sound sources may be used. We have developed two new types of physical models which produce a glottal source with a variable fundamental frequency. Both types are based on a reed vibration, and the length of the vibratory portion can be varied manually. In the first type, the reed itself is curved, while the reed of the second type is straight but its support is curved. In each case, we can demonstrate vowel production with pitch variation by combining vocal-tract models with our proposed source models.

#4 An Engine-Independent Text-to-Speech Workplace

Margot Mieskes (European Media Laboratory GmbH)

We present a web-based graphical user interface for access to Text-to-Speech engines. The workplace is intended to be engine-independent, allowing the user to not worry about the interaction with the specific engine, but to focus on his/her task and create a good synthesis result. Additionally, the workplace offers support for non-expert users in specific tuning and interaction tasks, such as phonetic transcriptions or creating a lexicon for usage during synthesis. We also present two application scenarios which were the basis for creating this workplace and the current status of the workplace.

#5 An application to test the emotion conveyed by vocal and musical signals

Simone Carcone (ISIM_garage Phys. Dept., University of Rome Tor Vergata, Italy)
Carlo Giovannella (ISIM_garage Dip. Fisica e Scuola IaD - University of Rome Tor Vergata)

We present an application that makes it straightforward to build tests that measure the emotion conveyed by multimodal and single-modality signals, including voice, music and sounds. The application is available both as a stand-alone application and, in part, as a web service.

#6 Automatic Speech Recognition System Dedicated for Polish

Mariusz Ziółko (Department of Electronics, AGH University of Science and Technology)
Jakub Gałka (Department of Electronics, AGH University of Science and Technology)
Bartosz Ziółko (Department of Electronics, AGH University of Science and Technology)
Tomasz Jadczyk (Department of Electronics, AGH University of Science and Technology)
Dawid Skurzok (Department of Electronics, AGH University of Science and Technology)
Mariusz Mąsior (Department of Electronics, AGH University of Science and Technology)

An automatic speech recognition system for Polish is demonstrated. A few layers of our system differ from popular approaches because of differences between the Polish and English languages.

#7 Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home

Kong Aik Lee (Institute for Infocomm Research, Singapore)
Anthony Larcher (Institute for Infocomm Research, Singapore)
Helen Thai (Institute for Infocomm Research, Singapore)
Bin Ma (Institute for Infocomm Research, Singapore)
Haizhou Li (Institute for Infocomm Research, Singapore)

This paper describes the deployment of speech technologies in STARHome, a fully functional smart home prototype. We make use of speech and speaker recognition technologies to provide three voice services, namely, voice command for controlling home appliances, voice biometric for entrance-door access control, and service customization (speaker-loaded command control). Voice applications for STARHome have been designed to deal with short utterances and low SNR.

#8 Adding a Speech Cursor to a Multimodal Dialogue System

Staffan Larsson (University of Gothenburg)
Alexander Berman (Talkamatic AB)
Jessica Villing (University of Gothenburg)

This paper describes an in-vehicle dialogue system demonstrating a novel combination of flexible multimodal menu-based dialogue and a "speech cursor", which enables menu navigation as well as browsing of long lists using haptic input and spoken output.

#9 Prosody Toolkit: Integrating HTK, Praat and WEKA

Scott Thomas Christie (Cognitive Science, University of Minnesota)
Serguei Pakhomov (Center for Clinical and Cognitive Neuropharmacology, College of Pharmacy, University of Minnesota)

A major hurdle in computational speech analysis is the effective integration of available tools originally developed for purposes unrelated to each other. We present a Python-based tool to enable an efficient and organized processing workflow incorporating automatic speech recognition using HTK, phoneme-level prosodic feature extraction in Praat and machine learning in WEKA. Our system is extensible, customizable and organizes prosodic data by phoneme and time stamp in a tabular fashion in preparation for analysis using other utilities. Plotting of prosodic information is supported to enable visualization of prosodic features.

#10 Collecting life logs for experience-based corpora

Fabiano Francesconi (DISI - University of Trento, 38050 Povo (Trento), Italy)
Arindam Ghosh (DISI - University of Trento, 38050 Povo (Trento), Italy)
Giuseppe Riccardi (DISI - University of Trento, 38050 Povo (Trento), Italy)
Marco Ronchetti (DISI - University of Trento, 38050 Povo (Trento), Italy)
Alex Vagin (DISI - University of Trento, 38050 Povo (Trento), Italy)

In this paper we propose an approach to lightweight acquisition, sharing and annotation of experience-based corpora via mobile devices. Corpus acquisition is a crucial and often costly process in speech and language science and engineering. To address this problem, we have built a system for creating location-based corpora annotated with multimedia tags (e.g. text, speech, image) generated by end users. We describe a relevant case study on the collection of mobile user life logs. We plan to make these tools and platforms publicly available to the research community for collaborative development and distributed collection of experiential corpora.