Corporate & Society Sponsors
Loquendo diamond package
Nuance gold package
AT&T bronze package
Google silver package
Appen bronze package
Interactive Media bronze package
Microsoft bronze package
SpeechOcean bronze package
Avios logo package
NDI logo package

CNR-ISTC

Université d'Avignon
Speech Cycle
AT&T
Università di Firenze
FUB
FBK
Univ. Trento
Univ. Napoli
Univ. Tuscia
Univ. Calabria
Univ. Venezia

AISV
Comune di Firenze
Firenze Fiera
Florence Convention Bureau

ISCA

12th Annual Conference of the
International Speech Communication Association


Interspeech 2011 Florence

Interspeech 2011 Technical Programme

Sun-Ses2-O1:
Speaker Recognition - Modeling

Time: Sunday 13:30
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Andrea Paoloni

13:30 Skew Gaussian mixture models for speaker recognition
Avi Matza, Yuval Bistritz
13:50 Towards Goat Detection in Text-Dependent Speaker Verification
Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
14:10 Speaker modeling using local binary decisions
Jean-Francois Bonastre, Xavier Anguera, Gabriel H. Sierra, Pierre-Michel Bousquet
14:30 New Developments in Voice Biometrics for User Authentication
Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
14:50 Evaluation of i-vector Speaker Recognition Systems for Forensic Application
Miranti Indar Mandasari, Mitchell McLaren, David van Leeuwen
15:10 Mixture of PLDA Models in I-Vector Space for Gender-Independent Speaker Recognition
Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward De Villiers, Pierre Dumouchel

Sun-Ses2-O3:
Speech Representation and Modelling

Time: Sunday 13:30
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Yannis Stylianou

13:30 A Long-Term Harmonic plus Noise Model for Speech Signals
Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi
13:50 A Frequency Domain Approach to ARX-LF Voiced Speech Parameterization and Synthesis
Alan O Cinneide, David Dorran, Mikel Gainza, Eugene Coyle
14:10 Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data using Convolutive NMF with Sparseness Constraints
Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth Narayanan
14:30 Online Pattern Learning for Non-Negative Convolutive Sparse Coding
Dong Wang, Ravichander Vipperla, Nicholas Evans
14:50 Sinewave Representations of Nonmodality
Nicolas Malyska, Thomas F. Quatieri, Robert Dunn
15:10 Time-Varying Signal Adaptive transform and IHT recovery of compressive sensed speech
Srikanth Raj Ch, Sreenivas T. V.

Sun-Ses2-O2:
Speech Perception - Speech Intelligibility

Time: Sunday 13:30
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Anne Cutler

13:30 Segregation of whispered speech interleaved with noise or speech maskers
Nandini Iyer, Douglas S. Brungart, Brian D. Simpson
13:50 Monaural Azimuth Localization Using Spectral Dynamics of Speech
Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller
14:10 Prediction of binaural intelligibility level differences in reverberation
Jan Rennies, Thomas Brand, Birger Kollmeier
14:30 Let’s all speak together! Exploring the impact of various languages on the comprehension of speech in multi-linguistic babble.
Aurore Gautreau, Michel Hoen, Fanny Meunier
14:50 Cross-Rate Variation in the Intelligibility of Dual-Rate Gated Speech in Older Listeners
Valeriy Shafiro, Stanley Sheft, Robert Risley
15:10 An Efferent-Inspired Auditory Model Front-End for Speech Recognition
Chia-ying Lee, James Glass, Oded Ghitza

Sun-Ses2-O4:
Emotion, Speaking Style, and Social Behavior

Time: Sunday 13:30
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Anton Batliner

13:30 Acoustic-Linguistic Recognition of Interest in Speech with Bottleneck-BLSTM Nets
Martin Woellmer, Felix Weninger, Florian Eyben, Bjoern Schuller
13:50 Automatic Detection of Anger in Human-Human Call Center Dialogs
Mustafa Erden, Levent M. Arslan
14:10 Improved Classification of Speaking Styles for Mental Health Monitoring using Phoneme Dynamics
Keng-hao Chang, Howard Lei, John Canny
14:30 "You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information
Matthew P. Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth S. Narayanan
14:50 Context and priming effects in the recognition of emotion in old and young listeners
Martijn Goudbeek, Marie Nilsenová
15:10 Acoustic and Prosodic Correlates of Social Behavior
Agustin Gravano, Rivka Levitan, Laura Willson, Stefan Benus, Julia Hirschberg, Ani Nenkova

Sun-Ses2-O5:
HMM-based Speech Synthesis I

Time: Sunday 13:30
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Keiichi Tokuda

13:30 Decision Tree-based Clustering with Outlier Detection for HMM-based Speech Synthesis
Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim
13:50 Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis
Hanna Silén, Elina Helander, Moncef Gabbouj
14:10 A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM
Takashi Nose, Takao Kobayashi
14:30 Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
14:50 Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-based Speech Synthesis
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi
15:10 The Effect of Using Normalized Models in Statistical Speech Synthesis
Matt Shannon, Heiga Zen, William Byrne

Sun-Ses2-S1-O:
Speech and Language Processing-Based Assistive Technologies and Health Applications

Time: Sunday 13:30
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Oral
Chairs: Tobias Bocklet, Gokhan Tur

13:30 Automatic Detection of Depression in Speech using Gaussian Mixture Modeling with Factor Analysis
Douglas Sturim, Pedro Torres-Carrasquillo, Thomas Quatieri, Nicolas Malyska, Alan McCree
13:50 Utterance Verification for automating the Hearing In Noise Test (HINT)
H. Timothy Bunnell, Jason Lilley, Sigfrid Soli, Ivan Pal
14:10 Analyzing the Nature of ECA Interactions in Children with Autism
Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian Williams, Shrikanth Narayanan

Sun-Ses2-P1:
Second Language Acquisition, Development and Learning I

Time: Sunday 13:30
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster

#1 Acquisition of Timing Patterns in Second Language
Mikhail Ordin, Leona Polyanskaya
#2 Context-dependent Duration Modeling with Backoff Strategy and Look-up Tables for Pronunciation Assessment and Mispronunciation Detection
Hongyan Li, Shen Huang, Shijin Wang, Bo Xu
#3 Perceptual training of vowel length contrast of Japanese by L2 listeners: Effects of an isolated word versus a word embedded in sentences
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka
#4 Similar Vowels in L1/L2 Production: Confused or Discerned in Early L2 English Learners with Different amount of Exposure
E-Chin Wu
#5 Production and perception of Estonian vowels by native and non-native speakers
Lya Meister, Einar Meister
#6 New feature parameters for pronunciation evaluation in English presentations at international conferences
Hiroshi Kibishi, Seiichi Nakagawa
#7 Synchronous reading: learning French orthography by audiovisual training
Gérard Bailly, William-Seamus Barbour
#8 Phoneme Level Non-Native Pronunciation Analysis by an Auditory Model-based Native Assessment Scheme
Christos Koniaris, Olov Engwall
#9 The open front vowel /æ/ in the production and perception of Czech students of English
Pavel Šturm, Radek Skarnitzl
#10 Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'
Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik
#11 An Experimental Analysis of Pitch Patterns in Japanese Speakers of English with Verification by Speech Re-synthesis
Tomoko Nariai, Kazuyo Tanaka
#12 An Analysis of Word Duration in Native Speakers and Japanese Speakers of English
Tomoko Nariai, Kazuyo Tanaka

Sun-Ses2-P2:
Speech Enhancement

Time: Sunday 13:30
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Dietrich Klakow

#1 Evaluating artificial bandwidth extension by conversational tests in car using mobile devices with integrated hands-free functionality
Laura Laaksonen, Ville Myllylä, Riitta Niemistö
#2 Low-Frequency Bandwidth Extension of Telephone Speech Using Sinusoidal Synthesis and Gaussian Mixture Model
Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle Palomäki, Mikko Kurimo, Paavo Alku
#3 Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech
Amr Nour-Eldin, Peter Kabal
#4 Speech enhancement by reconstruction from cleaned acoustic features
Philip Harding, Ben Milner
#5 A Soft Decision-based Speech Enhancement using Acoustic Noise Classification
Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang
#6 A Noise Estimation Method Based on Speech Presence Probability and Spectral Sparseness
Chao Li, Wenju Liu
#7 Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation
Chao Li, Wenju Liu
#8 A Rapid Adaptation Algorithm for Tracking Highly Non-Stationary Noises Based on Bayesian Inference for On-Line Spectral Change Point Detection
Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas O'Shaughnessy
#9 Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum
Kuldip Paliwal, Belinda Schwerin, Kamil Wojcicki
#10 Speech Enhancement Using Masking Properties in Adverse Environments
Atanu Saha, Tetsuya Shimamura
#11 Phoneme-dependent NMF for speech enhancement in monaural mixtures
Bhiksha Raj, Rita Singh, Tuomas Virtanen
#12 Kernel PCA for Speech Enhancement
Christina Leitner, Franz Pernkopf, Gernot Kubin
#13 Objective Intelligibility Prediction of Speech by Combining Correlation and Distortion based Techniques
Angel Gomez, Belinda Schwerin, Kuldip Paliwal

Sun-Ses2-P3:
ASR - Feature Extraction I

Time: Sunday 13:30
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Fabio Brugnara

#1 Integrating recent MLP feature extraction techniques into TRAP architecture
Frantisek Grezl, Martin Karafiat
#2 Feature Frame Stacking in RNN-based Tandem ASR Systems - Learned vs. Predefined Context
Martin Woellmer, Bjoern Schuller, Gerhard Rigoll
#3 Improved Acoustic Feature Combination for LVCSR by Neural Networks
Christian Plahl, Ralf Schlüter, Hermann Ney
#4 Hierarchical Tandem Features for ASR in Mandarin
Joel Pinto, Mathew Magimai.-Doss, Herve Bourlard
#5 Analysis and Comparison of Recent MLP Features for LVCSR Systems
Fabio Valente, Mathew Magimai Doss, Wen Wang
#6 Deep Learning of Speech Features for Improved Phonetic Recognition
Jaehyung Lee, Soo-Young Lee
#7 Globality-Locality Consistent Discriminant Analysis for Phone Classification
Heyun Huang, Yang Liu, Jort Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves
#8 Front-End Compensation Methods for LVCSR Under Lombard Effect
Hynek Boril, Frantisek Grezl, John H.L. Hansen
#9 Classification of Fricatives Using Feature Extrapolation of Acoustic-Phonetic Features in Telephone Speech
Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang
#10 Noise Robust Feature Extraction Based on Extended Weighted Linear Prediction in LVCSR
Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo
#11 Comparing Different Flavors of Spectro-Temporal Features for ASR
Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan
#12 VTLN in the MFCC domain: band-limited versus local interpolation
Ehsan Variani, Thomas Schaaf
#13 Multistream Bandpass Modulation Features for Robust Speech Recognition
Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
#14 An Analysis of Automatic Speech Recognition with Multiple Microphones
Davide Marino, Thomas Hain

Sun-Ses2-P4:
Spoken Dialogue & Spoken Language Understanding Systems

Time: Sunday 13:30
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Steve Renals

#1 Multi-view approach for speaker turn role labeling in TV Broadcast News shows
Geraldine Damnati, Delphine Charlet
#2 Evaluation of an Integrated Authoring Tool for Building Advanced Question-Answering Characters
Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David Traum
#3 Towards Unsupervised Spoken Language Understanding: Exploiting Query Click Logs for Slot Filling
Gokhan Tur, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz
#4 Web-enhanced Contents Retrieval for Information Access Dialogue System
Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee
#5 Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
Lucie Daubigney, Milica Gasic, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve Young
#6 Detection of task-incomplete dialogs based on utterance-and-behavior tag N-gram for spoken dialog systems
Sunao Hara, Norihide Kitaoka, Kazuya Takeda
#7 Shrinkage Based Features for Natural Language Call-Routing
Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran
#8 Clustering with modified cosine distance learned from constraints
Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran
#9 Using Speaker ID to Discover Repeat Callers to a Spoken Dialog System
Andrew Fandrianto, Brian Langner, Alan W Black
#10 Semantic graph clustering for POMDP-based spoken dialog systems
Florian Pinault, Fabrice Lefèvre
#11 Learning Place-Names from Spoken Utterances and Localization Results by Mobile Robot
Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano
#12 Active Learning for Dialogue Act Classification
Björn Gambäck, Fredrik Olsson, Oscar Täckström
#13 Speaker Role Recognition using question detection and characterization
Thierry Bazillon, Benjamin Maza, Mickael Rouvier, Frederic Bechet, Alexis Nasr
#14 Learning Score Structure from Spoken Language for A Tennis Game
Qiang Huang, Stephen Cox
#15 Semi-automated classifier adaptation for natural language call routing
Silke M. Witt
#16 Interactional Style Detection for Versatile Dialogue Response Using Prosodic and Semantic Features
Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang
#17 Quality aspects of multimodal dialog systems: identity, stimulation and success
Christine Kuehnel, Benjamin Weiss, Matthias Schulz, Sebastian Moeller

Sun-Ses2-S1-P:
Speech and Language Processing-Based Assistive Technologies and Health Applications

Time: Sunday 14:30
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Poster
Chairs: Shri Narayanan, Elmar Noeth

#1 Incorporating Speech Recognition Engine Into an Intelligent Assistive Reading System for Dyslexic Students
Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonis Symvonis
#2 An Investigation of Depressed Speech Detection: Features and Normalization
Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke
#3 Using Prosodic and Spectral Features in Detecting Depression in Elderly Males
Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold
#4 Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth
#5 Speech Synthesis Parameter Generation for the Assistive Silent Speech Interface MVOCA
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko
#6 Computer-Assisted Disfluency Counts for Stuttered Speech
Peter A. Heeman, Andy McMillin, J. Scott Yaruss
#7 Spectral Features for Automatic Blind Intelligibility Estimation of Spastic Dysarthric Speech
Richard Hummel, Wai-Yip Chan, Tiago Falk
#8 Extraction of narrative recall patterns for neuropsychological assessment
Emily Prud'hommeaux, Brian Roark
#9 Gesture Design of Hand-to-Speech Converter derived from Speech-to-Hand Converter based on Probabilistic Integration Model
Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
#10 Powered Wheelchair Control Using Acoustic-Based Recognition of Head Gesture Accompanying Speech
Akira Sasou
#11 Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
Jose Luis Blanco, Ruben Fernandez, Doroteo Torre, Francisco Javier Caminero, Eduardo Lopez

Sun-Ses3-O1:
Speaker Recognition - Modeling, Automatic Procedures, Analysis I

Time: Sunday 16:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Luciano Romito

16:00 Restoring the Residual Speaker Information in Total Variability Modeling for Speaker Verification
Ce Zhang, Rong Zheng, Bo Xu
16:20 New Developments in Joint Factor Analysis for Speaker Verification
Hagai Aronowitz, Oren Barkan
16:40 Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories
Joaquin Gonzalez-Rodriguez
17:00 Discriminatively Trained i-vector Extractor for Speaker Verification
Ondrej Glembek, Lukas Burget, Niko Brümmer, Oldrich Plchot, Pavel Matejka
17:20 Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training
Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke
17:40 A New Perspective on GMM Subspace Compensation Based on PPCA and Wiener Filtering
Alan McCree, Doug Sturim, Doug Reynolds

Sun-Ses3-O3:
Speech Analysis

Time: Sunday 16:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Thomas F. Quatieri

16:00 Adaptive Estimation of Zeros of Time-Varying Z-Transforms
Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard
16:20 Identifying regions of non-modal phonation using features of the wavelet transform
John Kane, Christer Gobl
16:40 Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency
Xing Fan, Keith Godin, John Hansen
17:00 Multi-party Speech Recovery Exploiting Structured Sparsity Models
Afsaneh Asaei, Mohammad Javad Taghizadeh, Hervé Bourlard, Volkan Cevher
17:20 Modulation spectrum analysis for recognition of reverberant speech
Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky
17:40 Discrete Choice Models for Non-Intrusive Quality Assessment
Petko N. Petkov, W. Bastiaan Kleijn, Bert de Vries

Sun-Ses3-O2:
Speech Perception - Perceptual Learning and Cross-Language Perception

Time: Sunday 16:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Catia Cucchiarini

16:00 Perceptual learning of liquids
Odette Scharenborg, Holger Mitterer, James M. McQueen
16:20 The Efficiency of Cross-dialectal Word Recognition
Annelie Tuinman, Holger Mitterer, Anne Cutler
16:40 Estimation of Perceptual Spaces for Speaker Identities Based on the Cross-Lingual Discrimination Task
Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni
17:00 The relation between perception and production in L2 phonological processing
Sharon Peperkamp, Camillia Bouchon
17:20 The Role of Word-Initial Glottal Stops in Recognizing English Words
Maria Paola Bissiri, María Luisa Lecumberri, Martin Cooke, Jan Volín
17:40 Effect of language experience on the categorical perception of Cantonese vowel duration
Caicai Zhang, Gang Peng, William S-Y. Wang

Sun-Ses3-O4:
Speech Enhancement and Dereverberation

Time: Sunday 16:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Peter Kabal

16:00 Single channel dereverberation using example-based speech enhancement with uncertainty decoding technique
Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani
16:20 A statistical room impulse response model with frequency dependent reverberation time for single-microphone late reverberation suppression
Jan Erkelens, Richard Heusdens
16:40 An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation
Chenxi Zheng, Tiago Falk, Wai-Yip Chan
17:00 Perceptual Improvement of a Two-Stage Algorithm for Speech Dereverberation
Thiago Prego, Amaro de Lima, Sergio Netto
17:20 A Model-Based Spectral Envelope Wiener Filter for Perceptually Motivated Speech Enhancement
Najib Hadir, Friedrich Faubel, Dietrich Klakow
17:40 Binaural Noise-Reduction Method based on Blind Source Separation and Perceptual post processing
Jorge Marin-Hurtado, Devangi Parikh, David Anderson

Sun-Ses3-O5:
ASR - Feature Extraction II

Time: Sunday 16:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Dong Yu

16:00 Region Dependent Transform on MLP Features for Speech Recognition
Tim Ng, Bing Zhang, Spyros Matsoukas, Long Nguyen
16:20 Discriminant Sub-Space Projection of Spectro-Temporal Speech Features based on Maximizing Mutual Information
Martin Heckmann, Claudius Gläser
16:40 Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
17:00 Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification
Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis
17:20 Improved Bottleneck Features Using Pretrained Deep Neural Networks
Dong Yu, Michael L. Seltzer
17:40 Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Event Classification
Yuan-Fu Liao

Sun-Ses3-S1-O:
Crowdsourcing for Speech Processing I

Time: Sunday 16:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Oral
Chairs: Maxine Eskenazi, David Suendermann, Gina-Anne Levow

16:00 Speaking to the Crowd: looking at past achievements in using crowdsourcing for speech and predicting future challenges
Gabriel Parent, Maxine Eskenazi

Sun-Ses3-P1:
Prosodic Structure

Time: Sunday 16:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Elizabeth Shriberg

#1 Where should pitch accents and phrase breaks go? A syntax tree transducer solution
Joseph Tepperman, Emily Nava
#2 Phrasal prominences do not need pitch movements: postfocal phrasal heads in Italian
Giuliano Bocci, Cinzia Avesani
#3 Intonation of left dislocated topics in Modern Greek
David Le Gac, Hiyon Yoo
#4 Phrases, pitch and perceived prominence in Māori
Laura Thompson, Catherine I. Watson, Ray Harlow, Jeanette King, Margaret Maclagan, Helen Charters, Peter Keegan
#5 Perceptual sensitivity to prenuclear and nuclear intonational patterns
Tomáš Duběda
#6 Tonal Alignment Defined: the case of Southern Irish English
Raya Kalaldeh
#7 Using Mutual Information to Identify Regions of Analysis for Prosodic Analysis
Andrew Rosenberg
#8 Prosodic highlights in Mandarin continuous speech—Cross-genre attributes and implications
Chiu-yu Tseng, Zhao-yu Su, Chi-Feng Huang
#9 When two newly-acquired words are one: New words differing in stress alone are not automatically represented differently
Simone Sulpizio, James McQueen
#10 Automatic Determination of the Standard Chinese Prosodic Phrase Boundaries by F0 Generation Model
Shehui Bu, Zhenjie Zhuo, Lingling Yang, Shuichi Itahashi
#11 Measuring speakers’ similarity in speech by means of prosodic cues: methods and potential
Celine De Looze, Stephane Rauzy
#12 Tonal Variations in Mandarin: New Evidence from Spontaneous and Read Speech
Li-chiung Yang

Sun-Ses3-P2:
Language Processing

Time: Sunday 16:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Frederic Bechet

#1 Accounting for prosodic information to improve ASR-based topic tracking for TV Broadcast News
Camille Guinaudeau, Julia Hirschberg
#2 Morpheme Conversion for Connecting Speech Recognizer and Language Analyzers in Unsegmented Languages
Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki
#3 Emotion Detection Based on Concept Inference and Spoken Sentence Analysis for Customer Service
Ren-Ying Fang, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu
#4 Commas recovery with syntactic features in French and in Czech
Christophe Cerisara, Pavel Král, Claire Gardent
#5 Redundancy Reduction in ASR of Spontaneous Speech through Statistical Machine Translation
Daniele Falavigna
#6 From Interview to News Text: A Study of Taiwan TV Political Interviews in Newspaper Reports
Chin-Chih Chiang

Sun-Ses3-P3:
ASR - Language Models I

Time: Sunday 16:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Michael Riley

#1 Unary Data Structures for Language Models
Jeffrey Sorensen, Cyril Allauzen
#2 Bayesian Language Model Interpolation for Mobile Speech Input
Cyril Allauzen, Michael Riley
#3 On the Estimation of Discount Parameters for Language Model Smoothing
Martin Sundermeyer, Ralf Schlüter, Hermann Ney
#4 N-grams for Conditional Random Fields or a Failure-transition Posterior for Acyclic FSTs
Patrick Lehnen, Stefan Hahn, Hermann Ney
#5 Hybrid Language Models Using Mixed Types of Sub-lexical Units for Open Vocabulary German LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlueter, Hermann Ney
#6 Morpheme Based Factored Language Models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlueter, Hermann Ney
#7 Compound Word Recombination for German LVCSR
Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
#8 Lattice-Based Risk Minimization Training for Unsupervised Language Model Adaptation
Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa
#9 Similarity language model
Christian Gillot, Christophe Cerisara
#10 Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models
Erinc Dikici, Murat Semerci, Murat Saraclar, Ethem Alpaydin
#11 Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition
Ryo Masumura, Seongjun Hahm, Akinori Ito
#12 Large Vocabulary SOUL Neural Network Language Models
Hai-Son Le, Ilya Oparin, Abdel Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, Francois Yvon
#13 Improved Spoken Query Transcription using Co-occurrence Information
Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila
#14 Unsupervised Latent Speaker Language Modeling
Yik-Cheung Tam, Paul Vozila

Sun-Ses3-P4:
Spoken Language Resources, Evaluation and Standardization I

Time: Sunday 16:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Sebastian Moeller

#1 Measurement of Objective Intelligibility of Japanese Accented English Using ERJ (English Read by Japanese) Database
Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose
#2 From Single-Call to Multi-Call Quality: A Study on Long-term Quality Integration in Audio-Visual Speech Communication
Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss
#3 Optimal Selection of Limited Vocabulary Speech Corpora
Hui Lin, Jeff Bilmes
#4 Open Source Multi-Language Audio Database for Spoken Language Processing Applications
Stephen Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev
#5 The USC CARE Corpus: Child-Psychologist Interactions of Children with Autism Spectrum Disorders
Matthew P. Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth S. Narayanan
#6 Towards A Versatile Multi-Layered Description of Speech Corpora Using Algebraic Relations
Nelly Barbot, Vincent Barreaud, Olivier Boeffard, Laure Charonnat, Arnaud Delhay, Sebastien Le Maguer, Damien Lolive
#7 Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus
Korin Richmond, Phil Hoole, Simon King
#8 A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario
Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf
#9 On building and evaluating a broadcast-news audio segmentation system
Taras Butko, Climent Nadeu
#10 Time- and Acoustic-Mediated Alignment Algorithms for Speech Recognition Evaluation
Simon Dobrišek, France Mihelič
#11 Effects of Shortening Speech Prompts of In-Car Voice User Interfaces on Users' Mental Models
Julia Niemann, Kati Schulz, Ina Wechsung
#12 Speech Transcript Evaluation for Information Retrieval
Laurens van der Werff, Wessel Kraaij, Franciska de Jong
#13 The Albayzin 2010 Language Recognition Evaluation
Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, German Bordel
#14 Progress and Prospects for Speech Technology: Results from Three Sexennial Surveys
Roger Moore
#15 Painless WFST cascade construction for LVCSR - Transducersaurus
Josef Robert Novak, Nobuaki Minematsu, Keikichi Hirose

Sun-Ses3-S1-P:
Crowdsourcing for Speech Processing II

Time: Sunday 17:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Poster
Chairs: Maxine Eskenazi, David Suendermann, Gina-Anne Levow

#1 A Transcription Task for Crowdsourcing with Automatic Quality Control
Chia-ying Lee, James Glass
#2 Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions
Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth S. Narayanan
#3 Crowdsourcing for word recognition in noise
Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri, Krzysztof Wasilewski
#4 Crowdsourcing preference tests, and how to detect cheating
Sabine Buchholz, Javier Latorre
#5 Growing a Spoken Language Interface on Amazon Mechanical Turk
Ian McGraw, James Glass, Stephanie Seneff
#6 Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk
Filip Jurčíček, Simon Keizer, Milica Gasic, Francois Mairesse, Blaise Thomson, Kai Yu, Steve Young
#7 Quality assessment of crowdsourcing transcriptions for African languages
Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino
#8 Using crowdsourcing to provide prosodic annotations for non-native speech
Keelan Evanini, Klaus Zechner
#9 PodCastle: Recent Advances of a Spoken Document Retrieval Service Improved by Anonymous User Contributions
Masataka Goto, Jun Ogata

Mon-Ses1-O1:
Speaker Recognition - Modeling, Automatic Procedures, Analysis II

Time: Monday 10:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Kornel Laskowski

10:00 Data-driven Gaussian Component Selection for Fast GMM-Based Speaker Verification
Ce Zhang, Rong Zheng, Bo Xu
10:20 Analysis of i-vector Length Normalization in Speaker Recognition Systems
Daniel Garcia-Romero, Carol Y. Espy-Wilson
10:40 An Analysis Framework based on Random Subspace Sampling for Speaker Verification
Weiwu Jiang, Zhifeng Li, Helen Meng
11:00 Factor analysis back ends for MLLR transforms in speaker recognition
Nicolas Scheffer, Yun Lei, Luciana Ferrer
11:20 Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation
Craig S. Greenberg, Alvin F. Martin, Bradford N. Barr, George R. Doddington
11:40 iVector Fusion of Prosodic and Cepstral Features for Speaker Verification
Marcel Kockmann, Luciana Ferrer, Lukas Burget, Jan Cernocky

Mon-Ses1-O3:
Acoustic Event Detection

Time: Monday 10:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Dirk van Compernolle

10:00 Learning new acoustic events in an HMM-based system using MAP adaptation
Jürgen Thomas Geiger, Mohamed Anouar Lakhal, Björn Schuller, Gerhard Rigoll
10:20 Alternative Frequency Scale Cepstral Coefficient for Robust Sound Event Recognition
Yiren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li
10:40 Evaluation of Abnormal Sound Detection using Multi-stage GMM in Various Environments
Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino
11:00 Unsupervised learning of acoustic events using dynamic time warping and hierarchical K-means++ clustering
Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach
11:20 Feature Extraction Assessment for an Acoustic-Event Classification Task using the Entropy Triangle
David Mejía-Navarrete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno, Francisco J. Valverde-Albacete
11:40 Unsupervised Audio Analysis for Categorizing Heterogeneous Consumer Domain Videos
Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Prem Natarajan

Mon-Ses1-O2:
Speech Production - Articulatory Measurements

Time: Monday 10:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Paavo Alku

10:00 Visualization of vocal tract shape using interleaved real-time MRI of multiple scan planes
Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna S. Nayak
10:20 Biomechanical Tongue Models: An Approach to Studying Inter-speaker Variability
Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede
10:40 Quantifying Articulatory Distinctiveness of Vowels
Jun Wang, Jordan R. Green, Ashok Samal, David B. Marx
11:00 Direct Estimation of Articulatory Kinematics from Real-time Magnetic Resonance Image Sequences
Michael Proctor, Adam Lammert, Athanasios Katsamanis, Louis Goldstein, Christina Hagedorn, Shrikanth Narayanan
11:20 Combined optical distance sensing and electropalatography to measure articulation
Peter Birkholz, Christiane Neuschaefer-Rube
11:40 Simulating Post-L F0 Bouncing by Modeling Articulatory Dynamics
Santitham Prom-on, Yi Xu, Fang Liu

Mon-Ses1-O4:
Speech Synthesis - Unit Selection and Hybrid approaches

Time: Monday 10:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Junichi Yamagishi

10:00 Enriching text-to-speech synthesis using automatic dialog act tags
Vivek Kumar Rangarajan Sridhar, Alistair Conkie, Ann Syrdal, Srinivas Bangalore
10:20 Joint Target and Join Cost Weight Training for Unit Selection Synthesis
Lukas Latacz, Wesley Mattheyses, Werner Verhelst
10:40 Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis
Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner
11:00 Evaluating the meaning of synthesized listener vocalizations
Sathish Pammi, Marc Schröder
11:20 A Hybrid TTS Approach for Prosody and Acoustic Modules
Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez
11:40 Uniform Speech Parameterization for Multi-form Segment Synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet

Mon-Ses1-O5:
Speech Enhancement analysis and Evaluation

Time: Monday 10:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Doug O'Shaughnessy

10:00 Theoretical analysis of musical noise and speech distortion in structure-generalized parametric blind spatial subtraction array
Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano
10:20 Subjective and objective evaluation of speech intelligibility enhancement under constant energy and duration constraints
Yan Tang, Martin Cooke
10:40 A Risk-Estimation-Based Comparison of Mean Square Error and Itakura-Saito Distortion Measures for Speech Enhancement
Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula
11:00 On Noise Tracking for Noise Floor Estimation
Mahdi Triki
11:20 Maximum a posteriori estimation of noise from non-acoustic reference signals in very low signal-to-noise ratio environments
Ben Milner
11:40 Blind speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator
Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani

Mon-Ses1-P1:
Paralinguistic Information - Classification and Detection

Time: Monday 10:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Julia Hirschberg

#1 On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation
Catharine Oertel, Stefan Scherer, Nick Campbell
#2 Anger Recognition in Spoken Dialog Using Linguistic and Para-Linguistic Information
Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
#3 Recognition of Personality Traits from Human Spoken Conversations
Alexei V. Ivanov, Giuseppe Riccardi, Adam J. Sporka, Jakub Franc
#4 Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote?
Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll
#5 “Would You Buy A Car From Me?” – On the Likability of Telephone Voices
Felix Burkhardt, Björn Schuller, Benjamin Weiss, Felix Weninger
#6 Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions using Diverse Density Support Vector Machines
James Gibson, Athanasios Katsamanis, Matthew Black, Shrikanth Narayanan
#7 Predicting Speaker Changes and Listener Responses With And Without Eye-contact
Daniel Neiberg, Joakim Gustafson
#8 Emotion Classification Using Inter- and Intra-Subband Energy Variation
Senaka Amarakeerthi, Tin Lay Nwe, C De Silva Liyanage, Michael Cohen
#9 Emotion Classification of Infants’ Cries using Duration Ratios of Acoustic Segments
Kazuki Kitahara, Shinzi Michiwaki, Miku Sato, Shoichi Matsunaga, Masaru Yamashita, Kazuyuki Shinohara
#10 Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions
Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth
#11 Intra-, Inter-, and Cross-cultural Classification of Vocal Affect
Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein

Mon-Ses1-P2:
Applications for Learning, Education, Aged and Handicapped Persons

Time: Monday 10:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Roberto Gretter

#1 Verifying Human Users in Speech-Based Interactions
Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan
#2 Automatic Assessment of Prosody in High-Stakes English Tests
Jian Cheng
#3 Improvement of Segmental Mispronunciation Detection with Prior Knowledge Extracted from Large L2 Speech Corpus
Dean Luo, Xuesong Yang, Lan Wang
#4 Off-Topic Detection in Automated Speech Assessment Applications
Jian Cheng, Jianqiang Shen
#5 Towards Context-dependent Phonetic Spelling Error Correction in Children’s Freely Composed Text for Diagnostic and Pedagogical Purposes
Sebastian Stüker, Johanna Fay, Kay Berkling
#6 Factored Translation Models for improving a Speech into Sign Language Translation System
Verónica López-Ludeña, Rubén San-Segundo, Ricardo Cordoba, Javier Ferreiros, Juan Manuel Montero, José Manuel Pardo
#7 Formant maps in Hungarian vowels – online data inventory for research, and education
Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy
#8 Automatic Subtitling of the Basque Parliament Plenary Sessions Videos
Germán Bordel, Silvia Nieto, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona
#9 Generating Animated Pronunciation from Speech through Articulatory Feature Extraction
Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta
#10 A Tale of Two Tasks: Detecting Children’s Off-Task Speech in a Reading Tutor
Wei Chen, Jack Mostow
#11 The problems encountered by Japanese EL2 with English short vowels as illustrated on the 3D Vowel Chart
Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose
#12 Automatic generation of listening comprehension learning material in European Portuguese
Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede
#13 Candidate Generation for ASR Output Error Correction Using a Context-Dependent Syllable Cluster-Based Confusion Matrix
Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang
#14 Semi-Supervised Tree Support Vector Machine for Online Cough Recognition
Thai Hoa Huynh, Vu An Tran, Huy Dat Tran

Mon-Ses1-P3:
Robust Speech Recognition I

Time: Monday 10:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Pietro Laface

#1 A versatile Gaussian splitting approach to non-linear state estimation and its application to noise-robust ASR
Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach
#2 Generalized-Log Spectral Mean Normalization for Speech Recognition
Hilman Ferdinandus Pardede, Koichi Shinoda
#3 Zero-Crossing-Based Channel Attentive Weighting of Cepstral Features for Robust Speech Recognition: The ETRI 2011 CHiME Challenge System
Young-Ik Kim, Hoon-Young Cho, Sang-Hoon Kim
#4 Feature Compensation for Speech Recognition in Severely Adverse Environments due to Background Noise and Channel Distortion
Wooil Kim, John H. L. Hansen
#5 Binaural cues for fragment-based speech recognition in reverberant multisource environments
Ning Ma, Jon Barker, Heidi Christensen, Phil Green
#6 Sub-band level Histogram Equalization for Robust Speech Recognition
Vikas Joshi, Raghvendra Biligi, Umesh S, Luz Garcia, Carmen Benitez
#7 GMM-based missing-feature reconstruction on multi-frame windows
Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda
#8 Improvements of a dual-input DBN for noise robust ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves
#9 Denoising Using Optimized Wavelet Filtering for Automatic Speech Recognition
Randy Gomez, Tatsuya Kawahara
#10 Noise Robust Speaker-Independent Speech Recognition with Invariant-Integration Features Using Power-Bias Subtraction
Florian Müller, Alfred Mertins

Mon-Ses1-P4:
ASR - Acoustic Models I

Time: Monday 10:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Lori Lamel

#1 Semi-automatic acoustic model generation from large unsynchronized audio and text chunks
Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti
#2 Unsupervised Testing Strategies for ASR
Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei
#3 Acoustic Model Training with Detecting Transcription Errors in the Training Data
Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura
#4 Towards Unsupervised Training of Speaker Independent Acoustic Models
Aren Jansen, Kenneth Church
#5 Acoustic Modeling with Bootstrap and Restructuring Based on Full Covariance
Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou
#6 An i-Vector based Approach to Acoustic Sniffing for Irrelevant Variability Normalization based Acoustic Model Training and Speech Recognition
Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo
#7 Log-linear Optimization of Second-order Polynomial Features with Subsequent Dimension Reduction for Speech Recognition
Muhammad Ali Tahir, Ralf Schlueter, Hermann Ney
#8 Genre Categorization and Modeling for Broadcast Speech Transcription
Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain
#9 Individual Error Minimization Learning Framework and its Applications to Speech Recognition and Utterance Verification
Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang
#10 Effective Triphone Mapping for Acoustic Modeling in Speech Recognition
Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo
#11 Analysis of Dialectal Influence in Pan-Arabic ASR
Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz
#12 Connected Digit Recognition by Means of Reservoir Computing
Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens
#13 Large Margin - Minimum Classification Error Using Sum of Shifted Sigmoids as the Loss Function
Madhavi Ratnagiri, Biing-Hwang Juang, Lawrence Rabiner
#14 Representing Phonological features through a two-level finite state model
Javier Mikel Olaso, María Inés Torres, Raquel Justo
#15 Optimization of the Gaussian Mixture Model Evaluation on GPU
Jan Vanek, Jan Trmal, Josef V. Psutka, Josef Psutka

Mon-Ses2-O1:
Speaker Recognition - Analysis and Statistics I

Time: Monday 13:30
Place: Auditorium - Pala Congressi
Type: Oral
Chair: David Van Leeuwen

13:30 Harmonic Structure Transform for Speaker Recognition
Kornel Laskowski, Qin Jin
13:50 Combining Evidence from Spectral and Source-like Features for Person Recognition from Humming
Hemant Patil, Maulik Madhavi, Keshab Parhi
14:10 Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model
Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Lirong Dai, Wu Guo
14:30 Implicit Segmentation in Two-Wire Speaker Recognition
Yosef Solewicz, Hagai Aronowitz
14:50 Boosting Speaker Recognition Performance with Compact Representations
Sibel Yaman, Jason Pelecanos, Mohamed K. Omar
15:10 Partitioning of Two-Speaker Conversation Datasets
Carlos Vaquero, Alfonso Ortega, Eduardo Lleida

Mon-Ses2-O3:
Speech Segmentation

Time: Monday 13:30
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Daniele Falavigna

13:30 A Two-stage Sample-based Phone Boundary Detector using Segmental Similarity Features
Yih-Ru Wang
13:50 Iterative Improvement of Speaker Segmentation in A Noisy Environment Using High-level Knowledge
Qiang Huang, Stephen Cox
14:10 Hierarchical Audio Segmentation with HMM and Factor Analysis in Broadcast News Domain
Diego Castan, Carlos Vaquero, Alfonso Ortega, David Martinez, Jesus Villalba, Eduardo Lleida
14:30 Syllable Segmentation of Continuous Speech Using Auditory Attention Cues
Ozlem Kalinli
14:50 Exploiting phone-class specific landmarks for refinement of segment boundaries in TTS databases
Vijayaditya Peddinti, Kishore Prahallad
15:10 Phoneme-Level Text to Audio Synchronization on Speech Signals with Background Music
Agnes Pedone, Juan Jose Burred, Simon Maller, Pierre Leveau

Mon-Ses2-S1:
Show & Tell Demonstration - Speech Systems and Applications

Time: Monday 13:30
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Dimitrios Dimitriadis

#1 An Affective Spoken Storyteller
Felix Burkhardt
#2 Text Driven 3D Photo-Realistic Talking Head
Lijuan Wang, Frank Soong, Wei Han, Qiang Huo
#3 Physical Models Producing Vowels with Pitch Variation
Takayuki Arai
#4 An Engine-Independent Text-to-Speech Workplace
Margot Mieskes
#5 An application to test the emotion conveyed by vocal and musical signals.
Simone Carcone, Carlo Giovannella
#6 Automatic Speech Recognition System Dedicated for Polish
Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Mariusz Mąsior
#7 Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home
Kong Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, Haizhou Li
#8 Adding a Speech Cursor to a Multimodal Dialogue System
Staffan Larsson, Alexander Berman, Jessica Villing
#9 Prosody Toolkit: Integrating HTK, Praat and WEKA
Scott Thomas Christie, Serguei Pakhomov
#10 Collecting life logs for experience-based corpora
Fabiano Francesconi, Arindam Ghosh, Giuseppe Riccardi, Marco Ronchetti, Alex Vagin

Mon-Ses2-O2:
Speech Production - Coarticulation and Speech Timing

Time: Monday 13:30
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Wim van Dommelen

13:30 Jaw movement in vowels and liquids forming the syllable nucleus
Štefan Beňuš, Marianne Pouplier
13:50 Coarticulation across prosodic domains in Italian: An ultrasound investigation
Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona
14:10 Investigating the stability of intergestural timing relations
Juraj Simko, Fred Cummins, Štefan Beňuš
14:30 Speech timing organization for the phonological length contrast in Italian consonants
Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato
14:50 Timing in Italian VNC sequences at different speech rates
Chiara Celata, Silvia Calamai
15:10 Automatic Analysis of Singleton and Geminate Consonant Articulation Using Real-time Magnetic Resonance Imaging
Christina Hagedorn, Michael Proctor, Louis Goldstein

Mon-Ses2-O4:
ASR - Acoustic Models II

Time: Monday 13:30
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Frank Seide

13:30 Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
Frank Seide, Gang Li, Dong Yu
13:50 Sequential Classification Criteria for NNs in Automatic Speech Recognition
Guangsen Wang, Khe Chai Sim
14:10 Grapheme-Based Automatic Speech Recognition Using KL-HMM
Mathew Magimai.-Doss, Ramya Rasipuram, Guillermo Aradilla, Herve Bourlard
14:30 Direct Error Rate Minimization of Hidden Markov Models
Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David McAllester
14:50 On the Effectiveness of Statistical Modeling based Template Matching Approach for Continuous Speech Recognition
Xie Sun, Xin Chen, Yunxin Zhao
15:10 Comparison of Smoothing Techniques for Robust Context Dependent Acoustic Modelling in Hybrid NN/HMM Systems
Guangsen Wang, Khe Chai Sim

Mon-Ses2-O5:
Robust Speech Recognition II

Time: Monday 13:30
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Maurizio Omologo

13:30 Propagation of Uncertainty through Multilayer Perceptrons for Robust Automatic Speech Recognition
Ramón Fernandez Astudillo, Joao Paulo da Silva Neto
13:50 Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition
Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort Gemmeke
14:10 Uncertainty measures for improving exemplar-based source separation
Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki
14:30 Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition
Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee
14:50 A Performance Monitoring Approach to Fusing Enhanced Spectrogram Channels in Robust Speech Recognition
Shirin Badiezadegan, Richard Rose
15:10 Generalized Variable Parameter HMMs for Noise Robust Speech Recognition
Ning Cheng, Xunying Liu, Lan Wang

Mon-Ses2-P1:
Source Separation and Speech Enhancement

Time: Monday 13:30
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Marco Matassoni

#1 Monaural Voiced Speech Segregation Based on Pitch and Comb Filter
Xueliang Zhang, Wenju Liu
#2 Fast and simple iterative algorithm of Lp-norm minimization for under-determined speech separation
Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
#3 Monaural Speech Separation Based on a 2D Processing and Harmonic Analysis
Azam Rabiee, Saeed Setayeshi, Soo-Young Lee
#4 Underdetermined Blind Source Separation with Fuzzy Clustering for Arbitrarily Arranged Sensors
Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm
#5 On Initial Seed Selection for Frequency Domain Blind Speech Separation
Dang Hai Tran Vu, Reinhold Haeb-Umbach
#6 Spatial filter calibration based on minimization of modified LSD
Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi
#7 Probabilistic Spectrum Envelope: Categorized Audio-features Representation for NMF-based Sound Decomposition
Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
#8 A high resolution multiple source localization by generalized cumulant structure (GCS) matrix
Jinho Choi, Chang D. Yoo
#9 Single channel speech music separation using nonnegative matrix factorization with sliding window and spectral masks
Emad M. Grais, Hakan Erdogan
#10 Perceptually-inspired Processing for Multichannel Wiener Filter
Jorge I. Marin, David V. Anderson
#11 Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization
Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
#12 Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise
Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto
#13 Voice processing by dynamic glottal models with applications to speech enhancement
Carlo Drioli, Andrea Calanca
#14 Supervised Sparse Coding Strategy in Cochlear Implants
Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E Lutman, Stefan Bleeck

Mon-Ses2-P2:
HMM-based Speech Synthesis II

Time: Monday 13:30
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Tomoki Toda

#1 Continuous Control of the Degree of Articulation in HMM-based Speech Synthesis
Benjamin Picart, Thomas Drugman, Thierry Dutoit
#2 Estimation of Window Coefficients for Dynamic Feature Extraction for HMM based Speech Synthesis
Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai
#3 Inverse Filtering Based Harmonic plus Noise Excitation Model for HMM-based Speech Synthesis
Zhengqi Wen, Jianhua Tao
#4 Improved HNM-based Vocoder for Statistical Synthesizers
Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernaez
#5 A Statistical Phrase/Accent Model for Intonation Modeling
Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W Black
#6 Intermediate-State HMMs to Capture Continuously-Changing Signal Features
Gustav Eje Henter, W. Bastiaan Kleijn
#7 Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality
Norbert Braunschweiler, Sabine Buchholz
#8 Phonological Knowledge Guided HMM State Mapping for Cross-Lingual Speaker Adaptation
Hui Liang, John Dines
#9 Reformulating Prosodic Break Model into Segmental HMMs and Information Fusion
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
#9 Multipulse Sequences for Residual Signal Modeling
Ranniery Maia, Heiga Zen, Kate Knill, Mark Gales, Sabine Buchholz
#10 Can Objective Measures Predict the Intelligibility of Modified HMM-based Synthetic Speech in Noise?
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King
#11 Speech Synthesis based on Articulatory-Movement HMMs with Voice-source Codebook
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada
#12 Large-scale Subjective Evaluations of Speech Rate Control Methods for HMM-based Speech Synthesizers
Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda
#13 HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling
Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka

Mon-Ses2-P3:
Phonetics and Phonology, Stress, Accent, Rhythm

Time: Monday 13:30
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Bernd Möbius

#1 Chinese and Italian Speech Rhythm. Normalization and the CCI Algorithm.
Chiara Bertini, Pier Marco Bertinetto, Na Zhi
#2 Rhythm metrics on syllables and feet do not work as expected
Paolo Mairano, Antonio Romano
#3 Applying Rhythm Features to Automatically Assess Non-Native Speech
Lei Chen, Klaus Zechner
#4 Prosodic Synchrony in Co-operative Task-based Dialogues: A Measure of Agreement and Disagreement
Brian Vaughan
#5 Low and High, Short and Long by Crook or by Hook?
Oliver Niebuhr, Astrid Wolf
#6 Estimating Speaking Rate by Means of Rhythmicity Parameters
Christian Heinrich, Florian Schiel
#7 Comparing word and syllable prominence rated by naive listeners
Denis Arnold, Bernd Möbius, Petra Wagner
#8 L1 / L2 perception of lexical stress with F0 peak-delay: effect of an extra syllable added
Shinichi Tokuma, Yi Xu
#9 Letter-to-Phoneme Conversion based on Two-Stage Neural Network focusing on Letter and Phoneme Contexts
Seng Kheang, Yurie Iribe, Tsuneo Nitta
#10 An international English speech corpus for longitudinal study of accent development
Rosemary Orr, Hugo Quene, Roeland van Beek, Thari Diefenbach, David van Leeuwen, Marijn Huijbregts
#11 A Corpus-Based Study of English Pronunciation Variations
Sunhee Kim, Kyuwhan Lee, Minhwa Chung
#12 Long term average speech spectra in Yolngu Matha and Pitjantjatjara speaking females and males
Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain
#13 Context and speaker dependency in the relation of vowel formants and subglottal resonances – Evidence from Hungarian
Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke

Mon-Ses2-P4:
ASR - Search, Keyword Spotting and Confidence Measures I

Time: Monday 13:30
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Mark Gales

#1 Event Selection from Phone Posteriorgrams Using Matched Filters
Keith Kintzley, Aren Jansen, Hynek Hermansky
#2 A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-based Dynamic Time Warping
Yaodong Zhang, James Glass
#3 OOV Detection and Recovery using Hybrid Models with Different Fragments
Long Qin, Ming Sun, Alexander Rudnicky
#4 AUC Optimization Based Confidence Measure for Keyword Spotting
Haiyang Li, Jiqing Han, Tieran Zheng
#5 An Empirical Study of Multilingual Spoken Term Detection
Zejun Ma, Xiaorui Wang, Bo Xu
#6 Fusing Multiple Confidence Measures for Chinese Spoken Term Detection
Zejun Ma, Xiaorui Wang, Bo Xu
#7 Response Probability Based Decoding Algorithm for Large Vocabulary Continuous Speech Recognition
Zhanlei Yang, Hao Chao, Wenju Liu
#8 Combining Lattice-Based Language Dependent and Independent Approaches for Out-of-Language Detection in LVCSR
Yuxiang Shan, Yan Deng, Jia Liu
#9 Evaluation of tree-trellis based decoding in over-million LVCSR
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
#10 Lattice Based Discriminative Model Combination Using Automatically Induced Phonetic Contexts
Hao Huang, Bing Hu Li
#11 Predicting Human Perceived Accuracy of ASR Systems
Taniya Mishra, Andrej Ljolje, Mazin Gilbert
#12 Cross-lingual study of ASR errors: on the role of the context in human perception of near homophones
Ioana Vasilescu, Dahbia Yahia, Natalie Snoeren, Martine Adda-Decker, Lori Lamel
#13 Performance Prediction of Speech Recognition Using Average-Voice-Based Speech Synthesis
Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato
#14 Confidence Measures For Turkish Call Center Conversations
Ali Haznedaroglu, Levent M. Arslan
#15 Spoken Document Confidence Estimation Using Contextual Coherence
Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi

Mon-Ses3-O1:
Speaker Recognition - Analysis and Statistics II

Time: Monday 16:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Mohammed Senoussaoui

16:00 Intersession compensation and scoring methods in the i-vectors space for speaker recognition
Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre
16:20 Kernel alignment maximization for speaker recognition based on high-level features
Szymon Drgas, Adam Dabrowski
16:40 Kernel partial least squares for speaker recognition
Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami
17:00 Conversational-Side-Specific Inter-Session Variability Compensation
Mohamed Omar, Jason Pelecanos
17:20 A speaker line-up for the Likelihood Ratio
David Van Leeuwen, Niko Brümmer
17:40 Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance
Jesús Antonio Villalba López, Niko Brümmer

Mon-Ses3-O3:
ASR - Lexical, Prosodic and Multi-Lingual Models

Time: Monday 16:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Murat Saraclar

16:00 Learning from Mistakes: Expanding Pronunciation Lexicons using Word Recognition Errors
Sravana Reddy, Evandro Gouvea
16:20 Improving non-native ASR through stochastic multilingual phoneme space transformations
David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, Mathew Magimai Doss
16:40 Unsupervised Arabic Dialect Adaptation with Self-Training
Scott Novotney, Rich Schwartz, Sanjeev Khudanpur
17:00 Template-based Automatic Speech Recognition meets Prosody
Dino Seppi, Kris Demuynck, Dirk Van Compernolle
17:20 Pronunciation Learning from Continuous Speech
Ibrahim Badr, Ian McGraw, James Glass
17:40 State-Level Data Borrowing for Low-Resource Speech Recognition based on Subspace GMMs
Yanmin Qian, Daniel Povey, Jia Liu

Mon-Ses3-P5:
Speech Synthesis - Selected Topics

Time: Monday 16:00
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Enrico Zovato

#1 A Grammar Based Approach to Style Specific Phrase Prediction
Alok Parlikar, Alan W Black
#2 Unsupervised features from text for speech synthesis in a speech-to-speech translation system
Oliver Watts, Bowen Zhou
#3 Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger
Oliver Watts, Junichi Yamagishi, Simon King
#4 Albayzín 2010: a Spanish text to speech evaluation
Francisco Campillo, Francisco Méndez, Montserrat Arza, Laura Docío, Antonio Bonafonte, Eva Navas, Iñaki Sainz
#5 Combining Active and Semi-supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis
Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai
#6 Automatically Creating a Diphone Set from a Speech Database
Thomas Ewender, Beat Pfister
#7 Automatic Viseme Clustering for Audiovisual Speech Synthesis
Wesley Mattheyses, Lukas Latacz, Werner Verhelst
#8 Perceptual Quality Dimensions of Text-to-Speech Systems
Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute
#10 A Pointwise Approach to Pronunciation Estimation for a TTS Front-end
Shinsuke Mori, Graham Neubig
#11 Correlating Text with Prosody
Mohamed Abou-Zleikha, Julie Carson-Berndsen
#12 “What is... Dengue Fever?” Modeling and Predicting Pronunciation Errors in a Text-to-Speech System
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran
#13 Aperiodicity Analysis for Quality Estimation of Text-To-Speech Signals
Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller

Mon-Ses3-O2:
Physiology and Pathology of Spoken Language

Time: Monday 16:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Tim Bunnell

16:00 Novel VTEO Based Mel Cepstral Features for Classification of Normal and Pathological Voices
Hemant Patil, Pallavi Baljekar
16:20 Temporal Performance of Dysarthric Patients in Speech and Tapping Tasks
Eiji Shimura, Kazuhiko Kakehi
16:40 A comparative acoustic study on speech of glossectomy patients and normal subjects
Xinhui Zhou, Maureen Stone, Carol Espy-Wilson
17:00 Dysperiodicity analysis of perceptually assessed synthetic stimuli
Ali Alpan, Francis Grenez, Jean Schoentgen
17:20 Is the perception of voice quality language-dependent? A comparison of French and Italian listeners and dysphonic speakers
Alain Ghio, Frédérique Weisz, Giovanna Baracca, Giovanna Cantarella, Danièle Robert, Virginie Woisard, Franco Fussi, Antoine Giovanni
17:40 Automatic Selection of Acoustic and Non-linear Dynamic Features in Voice Signals for Hypernasality Detection
Juan Rafael Orozco, Santiago Murillo, Andres Marino Alvarez, Julian David Arias, Edilson Delgado, Jesus Francisco Vargas, Cesar German Castellanos

Mon-Ses3-O4:
Source Separation

Time: Monday 16:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Tomohiro Nakatani

16:00 Frequency Oriented PCA for Blind Speech Separation of Convolutive Mixtures in Multiple Environments
Yasmina Benabderrahmane, Sid Ahmed Selouani, Douglas O'Shaughnessy
16:20 Blind Speech Separation in Time-Domain Using Block-Toeplitz Structure of Reconstructed Signal Matrices
Zbynek Koldovsky, Petr Tichavsky, Jiri Malek
16:40 Generalized method for solving the permutation problem in frequency-domain blind source separation of convolved speech signals
Auxiliadora Sarmiento, Iván Durán, Sergio Cruces, Pablo Aguilera
17:00 Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
Emad M. Grais, Hakan Erdogan
17:20 An Informed Source Separation System for Speech Signals
Shuhua Zhang, Laurent Girin
17:40 Adaptive Blocking Beamforming for Speech Separation
Ngoc Thuy Tran, William Cowley, Andre Pollok

Mon-Ses3-O5:
Multimodal Signal Processing

Time: Monday 16:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Keikichi Hirose

16:00 Asynchronous Multimodal Text Entry using Speech and Gesture Keyboards
Per Ola Kristensson, Keith Vertanen
16:20 Robust Bimodal Person Identification Using Face and Speech with Limited Training Data and Corruption of Both Modalities
Niall McLaughlin, Ji Ming, Danny Crookes
16:40 Toward a multi-speaker visual articulatory feedback system
Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly
17:00 Statistical Mapping between Articulatory and Acoustic Data for an Ultrasound-based Silent Speech Interface
Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet
17:20 Unsupervised geometry calibration of acoustic sensor networks using source correspondences
Joerg Schmalenstroeer, Florian Jacob, Reinhold Haeb-Umbach, Marius H. Hennecke, Gernot A. Fink
17:40 Investigations on Speaking Mode Discrepancies in EMG-based Speech Recognition
Michael Wand, Matthias Janke, Tanja Schultz

Mon-Ses3-P1:
Pitch Processing - Singing Voice Analysis

Time: Monday 16:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Thomas Drugman

#1 Fundamental Frequency Estimation Using Modified Higher Order Moments And Multiple Windows
Alipah Pawi, Saeed Vaseghi, Ben Milner, Seyed Ghorshi
#2 EM-based Gain Adaptation for Probabilistic Multipitch Tracking
Michael Wohlmayr, Franz Pernkopf
#3 Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics
Thomas Drugman, Abeer Alwan
#4 Epoch Extraction in High Pass Filtered Speech using Hilbert Envelope
Govind D, Prasanna S R Mahadeva, Debadatta Pati
#5 Robust HNR-based Closed-loop Pitch and Harmonic Parameters Estimation
Alexander Pavlovets, Alexander Petrovsky
#6 Exploring Bessel Features for Detection of Glottal Closure Instants
Chetana Prakash, Dhananjaya Nagaraje Gowda, Suryakanth V. Gangashetty
#7 Evaluation of Glottal Epoch Detection Algorithms on Different Voice Types
Joao Paulo Cabral, John Kane, Christer Gobl, Julie Carson-Berndsen
#8 A divide et impera algorithm for optimal pitch stylization
Antonio Origlia, Giovanni Abete, Francesco Cutugno, Iolanda Alfano, Renata Savy, Bogdan Ludusan
#9 Singing Voice Analysis Using Relative Harmonic Delays
Ricardo Sousa, Aníbal Ferreira
#10 Singing voice synthesis: Singer-dependent vibrato modeling and coherent processing of spectral envelope
Siu Wa Lee, Minghui Dong
#11 Chorus Digitalis: experiments in chironomic choir singing
Sylvain Le Beux, Lionel Feugère, Christophe d'Alessandro

Mon-Ses3-P2:
Prosodic Modeling

Time: Monday 16:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Hiroya Fujisaki

#1 Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection
Kun Li, Shuang Zhang, Mingxing Li, Wai-Kit Lo, Helen Meng
#2 Hierarchical Stress Modeling in Mandarin Text-to-Speech
Ya Li, Jianhua Tao, Xiaoying Xu
#3 Automatic Prosodic Events Detection by Using Syllable-based Acoustic, Lexical and Syntactic Features
Chong-Jia Ni, Wen-Ju Liu, Bo Xu
#4 Using Dynamic Time Warping to compute prosodic similarity measures
Albert Rilliard, Alexandre Allauzen, Philippe Boula de Mareüil
#5 Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese
Plinio Barbosa, Hansjörg Mixdorff, Sandra Madureira
#6 Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations
Nicolas Obin, Anne Lacheret, Xavier Rodet
#7 Toward a Continuous Modeling of French Prosodic Structure: Using Acoustic Features to Predict Prominence Location and Prominence Degree
Mathieu Avanzi, Nicolas Obin, Anne Lacheret, Bernard Victorri
#8 Optimal models of prosodic prominence using the Bayesian information criterion
Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Margaret Fleck, Mark Hasegawa-Johnson, Jennifer Cole
#9 Quantitative Analysis of Tone Coarticulation in Mandarin
Hussein Hussein, Hansjörg Mixdorff
#10 Tracking pitch contours using minimum jerk trajectories
Daniel Neiberg, G Ananthakrishnan, Joakim Gustafson

Mon-Ses3-P3:
Discourse and Dialogue

Time: Monday 16:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Patrick Ehlen

#1 On the use of linguistic features in an automatic system for speech analytics of telephone conversations
Benjamin Maza, Marc El-Beze, Georges Linares, Renato De Mori
#2 Determining What Questions To Ask, with the Help of Spectral Graph Theory
Abe Kazemzadeh, Sungbok Lee, Panayiotis Georgiou, Shrikanth Narayanan
#3 'Are you sure you're paying attention?' -- 'Uh-huh'. Communicating understanding as a marker of attentiveness
Hendrik Buschmeier, Zofia Malisz, Marcin Wlodarczak, Stefan Kopp, Petra Wagner
#4 Projectability of Transition-relevance Places using Prosodic Features in Japanese Spontaneous Conversation
Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida
#5 Measuring Final Lengthening for Speaker-Change Prediction
Anna Hjalmarsson, Kornel Laskowski
#6 Incremental Learning and Forgetting in Stochastic Turn-Taking Models
Kornel Laskowski, Jens Edlund, Mattias Heldner
#7 Reinforcement Learning of Argumentation Dialogue Policies in Negotiation
Kallirroi Georgila, David Traum
#8 Topic Switching Strategies for Spoken Dialogue Systems
Tobias Heinroth, Savina Koleva, Wolfgang Minker
#9 Unsupervised Clustering of Utterances using Non-parametric Bayesian Methods
Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki

Mon-Ses3-P4:
SLP for Speech Translation, Information Extraction and Retrieval

Time: Monday 16:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Dekai Wu

#1 OOV Sensitive Named-Entity Recognition in Speech
Carolina Parada, Frederick Jelinek
#2 Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction
Markus Saers, Dekai Wu, Chi-Kiu Lo, Karteek Addanki
#3 An Efficient Unified Extraction Algorithm for Bilingual Data
Christoph Tillmann, Sanjika Hewavitharana
#4 Using Features from Topic Models to Alleviate Over-generation in Hierarchical Phrase-based Translation
Songfang Huang, Bowen Zhou
#5 An Empirical Study on Improving Hierarchical Phrase-based Translation Using Alignment Features
Songfang Huang, Bowen Zhou
#6 Robust Speech Translation by Domain Adaptation
Xiaodong He, Li Deng
#7 Enhancements to the Training Process of Classifier-based Speech Translator via Topic Modeling
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan
#8 A scalable approach for building a parallel corpus from the Web
Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore
#9 Spoken Term Detection Results using Plural Subword Models by Estimating Detection Performance for Each Query
Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee
#10 SpeechForms - From Web to Speech and Back
Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio, Amanda Stent
#11 Image Processing Filters for Line Detection-based Spoken Term Detection
Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi
#12 Using Latent Topic Features for Named Entity Extraction in Search Queries
Joseph Polifroni, Francois Mairesse
#13 Language model expansion using webdata for spoken document retrieval
Ryo Masumura, Seongjun Hahm, Akinori Ito
#14 Effects of Query Expansion for Spoken Document Passage Retrieval
Tomoyosi Akiba, Koichiro Honda
#15 Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition
Chun-an Chan, Lin-shan Lee
#16 Topic Identification from Audio Recordings using Rich Recognition Results and Neural Network based Classifiers
Roberto Gemello, Franco Mana, Pier Domenico Batzu

Tue-Ses1-O1:
ASR - Language Models II

Time: Tuesday 10:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Stephan Kanthak

10:00 Empirical Evaluation and Combination of Advanced Language Modeling Techniques
Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Cernocky
10:20 Personalizing Model M for Voice-search
Geoffrey Zweig, Shuangyu Chang
10:40 Sentence Selection by Direct Likelihood Maximization for Language Model Adaptation
Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh
11:00 Feature Combination Approaches for Discriminative Language Models
Ebru Arisoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo
11:20 On-line Language Model Biasing for Multi-Pass Automatic Speech Recognition
Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Prem Natarajan
11:40 Mandarin word-character hybrid-input Neural Network Language Model
Moonyoung Kang, Tim Ng, Long Nguyen

Tue-Ses1-O3:
Voice Conversion

Time: Tuesday 10:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Junichi Yamagishi

10:00 One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space
Daisuke Saito, Keisuke Yamamoto, Nobuaki Minematsu, Keikichi Hirose
10:20 A Study on Bag of Gaussian Model with Application to Voice Conversion
Yu Qiao, Tong Tong, Nobuaki Minematsu
10:40 A Bayesian Approach to Voice Conversion Based on GMMs Using Multiple Model Structures
Lei Li, Yoshihiko Nankaku, Keiichi Tokuda
11:00 Quality Improvement of Voice Conversion Systems Based on Trellis Structured VQ
Mahdi Eslami, Hamid Sheikhzadeh, Abolghasem Sayadiyan
11:20 Voice Conversion using GMM with Enhanced Global Variance
Hadas Benisty, David Malah
11:40 Spectral Envelope Transformation using DFW and Amplitude Scaling for Voice Conversion with Parallel or Nonparallel Corpora
Elizabeth Godoy, Olivier Rosec, Thierry Chonavel

Tue-Ses1-P5:
Speech Audio Analysis and Classification

Time: Tuesday 10:00
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Olivier Rosec

#1 Stop Consonant Recognition by Temporal Fine Structure of Burst
Seppo Fagerlund, Unto K. Laine
#2 Phonetic Classification Using Controlled Random Walks
Katrin Kirchhoff, Andrei Alexandrescu
#3 Keyphrase Cloud Generation of Broadcast News
Luís Marujo, Márcio Viveiros, João P. Neto
#4 Optimized Feature Extraction and HMMs in Subword Detectors
Alfonso M. Canterla, Magne H. Johnsen
#5 Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs
Ziqiang Shi, Jiqing Han, Tieran Zheng
#6 Privacy Preserving Speaker Verification using Adapted GMMs
Manas Pathak, Bhiksha Raj
#7 Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters
Eva Szekely, Joao Cabral, Peter Cahill, Julie Carson-Berndsen
#8 On the use of the rhythmogram for automatic syllabic prominence detection
Bogdan Ludusan, Antonio Origlia, Francesco Cutugno
#9 Speech Modulation Features for Robust Nonnative Speech Accent Detection
Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Eng Siong Chng
#10 Frame-Level Vocal Effort Likelihood Space Modeling for Improved Whisper-Island Detection
Chi Zhang, John Hansen
#11 Speaker Identification for Whispered Speech Using A Training Feature Transformation From Neutral To Whisper
Xing Fan, John Hansen
#12 An Accurate and Robust Gender Identification Algorithm
Andrea DeMarco, Stephen J. Cox
#13 Deep Belief Networks for Automatic Music Genre Classification
Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang
#14 Image Representation of the Subband Power Distribution for Robust Sound Classification
Jonathan William Dennis, Huy Dat Tran, Haizhou Li
#15 Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions
Bo Xiao, Viktor Rozgic, Athanasios Katsamanis, Brian Baucom, Panayiotis Georgiou, Shrikanth Narayanan

Tue-Ses1-O2:
Phonology and Phonetics

Time: Tuesday 10:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Mark Hasegawa-Johnson

10:00 Laryngealization and Breathiness in Persian
Vahid Sadeghi
10:20 Age-dependent differences in the neutralization of the intervocalic voicing contrast: Evidence from an apparent-time study on East Franconian
Viola Müller, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold
10:40 Comparing syllable frequencies in corpora of written and spoken language
Barbara Samlowski, Bernd Möbius, Petra Wagner
11:00 Sylli: Automatic Phonological Syllabification for Italian
Luca Iacoponi, Renata Savy
11:20 A preliminary study on the production of signs in Brazilian Sign Language when one of the manual articulators is unavailable
André Nogueira Xavier, Plinio Almeida Barbosa
11:40 Electroglottograph and Acoustic Cues for Phonation Contrasts in Taiwan Min Falling Tones
Ho-hsien Pan, Mao-hsu Chen, Shao-ren Lyu

Tue-Ses1-O4:
Robust Speech Recognition III

Time: Tuesday 10:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Richard Stern

10:00 Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge
Pejman Mowlaee, Rahim Saeidi, Zheng-Hua Tan, Mads Græsbøll Christensen, Tomi Kinnunen, Søren Holdt Jensen, Pasi Fränti
10:20 Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
Cemil Demir, Murat Saraçlar, Ali Taylan Cemgil
10:40 A Level-dependent Auditory Filter-bank for Speech Recognition in Reverberant Environments
HariKrishna Maganti, Marco Matassoni
11:00 A Multichannel Feature-Based Processing for Robust Speech Recognition
Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
11:20 Feature Normalization Using Structured Full Transforms for Robust Speech Recognition
Xiong Xiao, Jinyu Li, Eng Siong Chng, Haizhou Li
11:40 A Robust Estimation Method of Noise Mixture Model for Noise Suppression
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

Tue-Ses1-O5:
Spoken Language Understanding

Time: Tuesday 10:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Ruhi Sarikaya

10:00 Multi-Task Learning for Spoken Language Understanding with Shared Slots
Xiao Li, Ye-Yi Wang, Gokhan Tur
10:20 Learning Weighted Entity Lists from Web Click Logs for Spoken Language Understanding
Dustin Hillard, Asli Celikyilmaz, Dilek Hakkani-Tur, Gokhan Tur
10:40 Bootstrapping Domain Detection Using Query Click Logs for New Domains
Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Elizabeth Shriberg
11:00 Multi-Domain Spoken Language Understanding with Approximate Inference
Asli Celikyilmaz, Dilek Hakkani-Tur, Gokhan Tur
11:20 Speech Indexing Using Semantic Context Inference
Chien-Lin Huang, Bin Ma, Haizhou Li, Chung-Hsien Wu
11:40 Automatically Optimizing Utterance Classification Performance without Human in the Loop
Yun-Cheng Ju, Jasha Droppo

Tue-Ses1-P1:
Human Speech and Sound Perception I

Time: Tuesday 10:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Denis Burnham

#1 Parallels in infants’ attention to speech articulation and to physical changes in speech-unrelated objects
Eeva Klintfors, Ellen Marklund, Francisco Lacerda
#2 Speech events are recoverable from unlabeled articulatory data: Using an unsupervised clustering approach on data obtained from Electromagnetic Midsagittal Articulography (EMA)
Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze
#3 Children’s recognition of their own voice: influence of phonological impairment
Sofia Strömbergsson
#4 Evaluation of Bone-conducted Ultrasonic Hearing-aid Regarding Transmission of Speaker Discrimination Information
Takayuki Kagomiya, Seiji Nakagawa
#5 Impact of Different Feedback Mechanisms in EMG-based Speech Recognition
Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz
#6 Phonotactic constraints and the segmentation of Cantonese speech
Michael C. W. Yip
#7 Reaction time and decision difficulty in the perception of intonation
Katrin Schneider, Grzegorz Dogil, Bernd Möbius
#8 Processing of stress related acoustic cues as indexed by ERPs
Ferenc Honbolygó, Valéria Csépe
#9 On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech
Marijt J. Witteman, Andrea Weber, James M. McQueen
#10 Perception Boundary between Single and Geminate Stops in 3- and 4-mora Japanese Words
Shigeaki Amano, Yukari Hirata
#11 Correlation Analysis of Acoustic Features with Perceptual Voice Quality Similarity for Similar Speaker Selection
Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno

Tue-Ses1-P2:
Multilingual and Multimodal Approaches to Spoken Language

Time: Tuesday 10:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Michael Johnston

#1 Can Audio-Visual Speech Recognition outperform Acoustically Enhanced Speech Recognition in Automotive Environment?
Rajitha Navarathna, Tristan Kleinschmidt, David Dean, Sridha Sridharan, Patrick Lucey
#2 A Multimodal Approach to Dictation of Handwritten Historical Documents
Vicent Alabau, Verónica Romero, Antonio-L. Lagarda, Carlos-D. Martínez-Hinarejos
#3 Weight Optimization for Bimodal Unit-Selection Talking Head Synthesis
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte
#4 Modality Selection and Perceived Mental Effort in a mobile Application
Stefan Schaffer, Benjamin Jöckel, Ina Wechsung, Robert Schleicher, Sebastian Möller
#5 A cross-lingual spoken content search system
Jitendra Ajmera, Ashish Verma
#6 NeMo: a Platform for Multilingual News Monitoring
Fabio Brugnara, Daniele Falavigna, Marcello Federico, Christian Girardi, Diego Giuliani, Roberto Gretter
#7 Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification
Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj
#8 Conditioned Hidden Markov Model Fusion for Multimodal Classification
Michael Glodek, Stefan Scherer, Friedhelm Schwenker
#9 Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions
Benjamin Lecouteux, Michel Vacher, François Portet
#10 A Robust Approach to Mining Repeated Sequence in Audio Stream
Jiansong Chen, Lei Zhu, Bailan Feng, Peng Ding, Bo Xu

Tue-Ses1-P3:
ASR - New Paradigms and Other Topics

Time: Tuesday 10:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Chin Hui Lee

#1 Accelerated Parallelizable Neural Network Learning Algorithm for Speech Recognition
Dong Yu, Li Deng
#2 Deep Convex Network: A Scalable Architecture for Deep Learning
Li Deng, Dong Yu
#3 Modeling Broad Context for Tone Recognition with Conditional Random Fields
Siwei Wang, Gina-Anne Levow
#4 Improved Tonal Language Speech Recognition by Integrating Spectro-temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units
Shang-wen Li, Yow-bang Wang, Liang-che Sun, Lin-shan Lee
#5 Kullback-Leibler divergence-based ASR training data selection
Evandro Gouvea, Marelie Davel
#6 Articulatory Feature Classification Using Nearest Neighbors
Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar
#7 Continuous episodic memory based speech recognition using articulatory dynamics
Sébastien Demange, Slim Ouni
#8 Graphone Model Interpolation and Arabic Pronunciation Generation
T. Li, P. C. Woodland, F. Diehl, M. J. F. Gales
#9 Grapheme-to-Phoneme Conversion using Conditional Random Fields
Irina Illina, Dominique Fohr, Denis Jouvet
#10 Bilingual Acoustic Model Adaptation by Unit Merging on Different Levels and Cross-level Integration
Ching-Feng Yeh, Chao-Yu Huang, Lin-Shan Lee
#11 A qualitative evaluation of phoneme-to-phoneme technology
Marijn Schraagen, Gerrit Bloothooft
#12 Cheap Bootstrap of Multi-Lingual Hidden Markov Models
Daniele Falavigna, Roberto Gretter
#13 Adaptive Stream Fusion in Multistream Recognition of Speech
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
#14 Unsupervised Audio Patterns Discovery using HMM-based Self-Organized Units
Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan
#15 Nearest Neighbors with Learned Distances for Phonetic Frame Classification
John Labiak, Karen Livescu

Tue-Ses1-P4:
Speaker Recognition - Modeling, Automatic Procedures, Analysis III

Time: Tuesday 10:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Daniel Garcia-Romero

#1 i-vector Based Speaker Recognition on Short Utterances
Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason
#2 Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition
Hanwu Sun, Bin Ma
#3 Super-Dirichlet Mixture Models using Differential Line Spectral Frequencies for Text-Independent Speaker Identification
Zhanyu Ma, Arne Leijon
#4 Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation
Hon-Bill Yu, Man-Wai Mak
#5 Eigen-Voice Based Anchor Modeling System for Speaker Identification using MLLR Super-Vector
Achintya Kumar Sarkar, S. Umesh
#6 Automatic Detection of Speaker Attributes Based on Utterance Text
Wen Wang, Andreas Kathol, Harry Bratt
#7 Comparison of Speaker Recognition Approaches for Real Applications
Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis
#8 Modeling Speaker Personality using Voice
Tim Polzehl, Sebastian Möller, Florian Metze
#9 Structural Joint Factor Analysis for Speaker Recognition
Marc Ferras, Koichi Shinoda, Sadaoki Furui
#10 Acoustic Forest for SMAP-based Speaker Verification
Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui
#11 Mixture of Auto-Associative Neural Networks for Speaker Verification
Sivaram Garimella, Samuel Thomas, Hynek Hermansky

Tue-Ses2-O1:
Dialect and Accent Identification

Time: Tuesday 13:30
Place: Auditorium - Pala Congressi
Type: Oral
Chair: David Martínez

13:30 In search of cues discriminating West-African accents in French
Philippe Boula de Mareüil, Jean-Luc Rouas, Manuela Yapomo
13:50 Computer and Human Recognition of Regional Accents of British English
Abualsoud Hanani, Martin J. Russell, Michael J. Carey
14:10 Target-aware Lattice Rescoring for Dialect Recognition
Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng
14:30 Effective Arabic Dialect Classification Using Diverse Phonotactic Models
Murat Akbacak, Dimitra Vergyri, Andreas Stolcke, Nicolas Scheffer, Arindam Mandal
14:50 Characterizing Deletion Transformations across Dialects using a Sophisticated Tying Mechanism
Nancy Chen, Wade Shen, Joe Campbell
15:10 Dialect and Accent Recognition using Phonetic-Segmentation Supervectors
Fadi Biadsy, Julia Hirschberg, Daniel Ellis

Tue-Ses2-O3:
ASR - Acoustic Models III

Time: Tuesday 13:30
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Ralf Schlueter

13:30 Generalized Baum-Welch Algorithm and Its Implication to a New Extended Baum-Welch Algorithm
Roger Hsiao, Tanja Schultz
13:50 Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems
Frank Diehl, Mark Gales, Andrew Liu, Marcus Tomalin, Phil Woodland
14:10 A Fully Automated Derivation of State-based Eigentriphones for Triphone Modeling with No Tied States using Regularization
Tom Ko, Brian Mak
14:30 Reducing Computational Complexities of Exemplar-Based Sparse Representations With Applications to Large Vocabulary Speech Recognition
Tara Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky
14:50 An i-Vector based Approach to Training Data Clustering for Improved Speech Recognition
Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo
15:10 Rapid Training of Acoustic Models using Graphics Processing Units
Senaka Buthpitiya, Ian Lane, Jike Chong

Tue-Ses2-S1:
Show & Tell Demonstration - Mobility and Web-services

Time: Tuesday 13:30
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Mazin Gilbert

#1 Making an automatic speech recognition service freely available on the web
Stuart Nicholas Wrigley, Thomas Hain
#2 AT&T VoiceBuilder: A Cloud-based Text-To-Speech Voice Builder Tool
Yeon-Jun Kim, Thomas Okken, Alistair Conkie, Giuseppe Di Fabbrizio
#3 Extending Audio Notetaker to Browse WebASR Transcriptions
Roger Tucker, Dan Fry, Vincent Wan, Stuart Wrigley, Thomas Hain
#4 A Web-Based Tool for Developing Multilingual Pronunciation Lexicons
Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa
#5 Speak4it and the Multimodal Semantic Interpretation System
Michael Johnston, Patrick Ehlen
#6 TSAB -- Web Interface for Transcribed Speech Collections
Tanel Alumäe, Ahti Kitsik
#7 Visual Voice Mail to Text on the iPhone/iPad
Andrej Ljolje, Vincent Goffin, Diamantino Caseiro, Taniya Mishra, Mazin Gilbert
#8 Percy - an HTML5 framework for media rich web experiments on mobile devices
Christoph Draxler
#9 The KLAIR toolkit for recording interactive dialogues with a virtual infant
Mark Huckvale
#10 Real-time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition
Francesco Nesta, Marco Matassoni, Hari Krishna Maganti

Tue-Ses2-O2:
First Language Acquisition

Time: Tuesday 13:30
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Cinzia Avesani

13:30 The Multi Timescale Phoneme Acquisition Model of the Self-Organizing Based on the Dynamic Features
Kouki Mizawa, Hideaki Miura, Hideaki Kikuchi, Reiko Mazuka
13:50 The time-course of talker-specificity effects for newly-learned pseudowords: Evidence for a hybrid model of lexical representation
Helen Brown, M. Gareth Gaskell
14:10 A parametric approach to intonation acquisition research: Validation on child-directed speech data
Britta Lintfert, Antje Schweitzer, Bernd Möbius
14:30 Modelling Novelty Preference in Word Learning
Maarten Versteegh, Louis ten Bosch, Lou Boves
14:50 Using Imitation to learn Infant-Adult Acoustic Mappings
G Ananthakrishnan, Giampiero Salvi
15:10 Thresholding word activations for response scoring - Modelling psycholinguistic data
Christina Bergmann, Louis ten Bosch, Lou Boves

Tue-Ses2-O4:
Spoken Dialogue Systems I

Time: Tuesday 13:30
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Olivier Pietquin

13:30 User Study of Spoken Decision Support System
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura
13:50 Efficient Probabilistic Tracking of User Goal and Dialog History for Spoken Dialog Systems
Antoine Raux, Yi Ma
14:10 Tackling a Shilly-Shally Classifier for Predicting Task Success in Spoken Dialogue Interaction
Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker
14:30 Evaluation of Listening-oriented Dialogue Control Rules based on the Analysis of HMMs
Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, Kohji Dohsaka
14:50 Large-Scale Experiments on Data-Driven Design of Commercial Spoken Dialog Systems
David Suendermann, Jackson Liscombe, Jonathan Bloom, Grace Li, Roberto Pieraccini
15:10 Comparing system-driven and free dialogue in in-vehicle interaction
Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson

Tue-Ses2-O5:
Spoken Language Resources, Evaluation and Standardization II

Time: Tuesday 13:30
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Paolo Baggia

13:30 Rapid Evaluation of Speech Representations for Spoken Term Discovery
Michael Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky
13:50 Phonemic Similarity Metrics to Compare Pronunciation Methods
Ben Hixon, Eric Schneider, Susan L. Epstein
14:10 Investigating the effect of number of interlocutors on the quality of experience for multi-party audio conferencing
Janto Skowronek, Alexander Raake
14:30 On Development of Consistently Punctuated Speech Corpora
Jachym Kolar, Lori Lamel
14:50 A Multimodal Real-Time MRI Articulatory Corpus for Speech Research
Shrikanth Narayanan, Erik Bresch, Prasanta Ghosh, Louis Goldstein, Athanasios Katsamanis, Yoon Kim, Adam Lammert, Michael Proctor, Vikram Ramanarayanan, Yinghua Zhu
15:10 Building an audio-visual corpus of Australian English: large corpus collection with an economical portable and replicable Black Box
Denis Burnham, Dominique Estival, Steven Fazio, Felicity Cox, Robert Dale, Jette Viethen, Steve Cassidy, Julien Epps, Roberto Togneri, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Marc Onslow, Trent Lewis, Andy Butcher, John Hajek, Michael Wagner

Tue-Ses2-S1-O:
Spoken Language Processing of Human-Human Conversations I

Time: Tuesday 13:30
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Oral
Chair: Dilek Hakkani-Tur

13:30 Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus
Fabio Valente, Alessandro Vinciarelli
13:50 Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions
Rivka Levitan, Julia Hirschberg
14:10 Automatic Call Quality Monitoring Using Cost-Sensitive Classification
Youngja Park

Tue-Ses2-P1:
Human Speech and Sound Perception II

Time: Tuesday 13:30
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Holger Mitterer

#1 Pointing Gestures do not Influence the Perception of Lexical Stress
Alexandra Jesse, Holger Mitterer
#2 Relationships between Phonetic Features and Speech Perception
Ian Cushing, Francis Li, Ken Worrall, Tim Jackson
#3 The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
Guy Brown, Tim Jurgens, Ray Meddis, Matthew Robertson, Nicholas Clark
#4 An Automatic Voice Pleasantness Classification System based on Prosodic and Acoustic Patterns of Voice Preference
Luis Pinto-Coelho, Daniela Braga, Miguel Sales-Dias, Carmen Garcia-Mateo
#5 Contributions of F1 and F2 (F2’) to the perception of plosive consonants
René Carré, Pierre Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet-Son Nguyen
#6 Auditory speech processing is affected by visual speech in the periphery
Jeesun Kim, Chris Davis
#7 Visual Speech Speeds Up Auditory Identification Responses
Tim Paris, Jeesun Kim, Chris Davis
#8 Agglomerative Hierarchical Clustering of Emotions in Speech Based on Subjective Relative Similarity
Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura
#9 Optimal Syllabic Rates and Processing Units in Perceiving Mandarin Spoken Sentences
Guangting Mai, Gang Peng
#10 Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech
Mirjam Wester, Hui Liang

Tue-Ses2-P2:
Speech Audio Analysis

Time: Tuesday 13:30
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Toshiaki Fukada

#1 Robust Audio Fingerprinting Based on Local Spectral Luminance Maxima Scheme
Yong-Zhe Shi, Wei-Qiang Zhang, Jia Liu
#2 Entropy Driven Inference of Stochastic Grammars
Unto Kalervo Laine
#3 An Efficient Pre-processing Scheme to Improve the Sound Source Localization System in Noisy Environment
Sheng-Chieh Lee, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu, Min-Jian Liao
#4 A study on auditory feature spaces for speech-driven lip animation
Guylaine Le-Jan, Yannick Benezeth, Guillaume Gravier, Frédéric Bimbot
#5 Phase-only Speech Reconstruction Using Very Short Frames
Erfan Loweimi, Seyed Mohammad Ahadi, Hamid Sheikhzadeh
#6 Frequency-Warped and Stabilized Time-Varying Cepstral Coefficients
Trond Skogstad, Torbjørn Svendsen
#7 Using Human Perception for Automatic Accent Assessment
Freddy William, Abhijeet Sangwan, John H.L. Hansen
#8 A study of the effectiveness of articulatory strokes for phonemic recognition
Carlos Molina, Sungbok Lee, Shrikanth Narayanan, Néstor Becerra Yoma
#9 Auditory Filterbank Improves Voice Morphing
Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara
#10 Monaural Sound Localization
Anna Katharina Fuchs, Christian Feldbauer, Michael Stark

Tue-Ses2-P3:
Speech Coding

Time: Tuesday 13:30
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Alan McCree

#1 Dual-mode AVQ Coding Based on Spectral Masking and Sparseness Detection for ITU-T G.711.1/G.722 Super-wideband Extensions
Masahiro Fukui, Shigeaki Sasaki, Yusuke Hiwasaki, Sachiko Kurihara, Yoichi Haneda
#2 Phone Impact Based Speech Transmission Technique for Reliable Speech Recognition in Poor Wireless Network Conditions
Azar Taufique, Kumaran Vijayasankar, Wooil Kim, John H.L. Hansen, Marco Tacca, Andrea Fumagalli
#3 Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings
Jingting Zhou, Daniel Garcia-Romero, Carol Espy-Wilson
#4 A hybrid quasi-harmonic/CELP wideband speech coding scheme for unit selection TTS synthesis
Chang-Heon Lee, Olivier Rosec, Yannis Stylianou
#5 Voice Quality Characterization of IETF Opus Codec
Anssi Rämö, Henri Toukomaa
#6 Leja ordering LSFs for accurate estimation of predictor coefficients
Christian Fischer Pedersen
#7 Improved Quality for Conversational VoIP using Path Diversity
Qipeng Gong, Peter Kabal
#8 Tree Encoding for the ITU-T G.711.1 Speech Coder
Abdul Hannan Khan, Peter Kabal
#9 Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition
Dong Wang, Ravichander Vipperla, Nicholas Evans
#10 A New Model-based Mandarin-speech Coding System
Chen-Yu Chiang, Jyh-Her Yang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horn Chen

Tue-Ses2-P4:
Robustness and Adaptation for ASR

Time: Tuesday 13:30
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Vivek Kumar

#1 Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives
Petr Cerva, Karel Palecek, Jan Silovsky, Jan Nouza
#2 Online Speaker Adaptation with Pre-computed FMLLR Transformations
Volker Fischer, Siegfried Kunzmann
#3 Instantaneous Speaker Adaptation through Selection and Combination of fMLLR Transformation Matrices
Diego Giuliani, Fabio Brugnara
#4 Joint Bilinear Transformation Space Based Maximum a Posteriori Linear Regression Adaptation using Prior with Variance Function
Hwa Jeon Song, Yunkeun Lee, Hyung Soon Kim
#5 A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition
Rama Sanand Doddipatla, Mikko Kurimo
#6 Incorporating Regional Information to Enhance MAP-based Stochastic Feature Compensation for Robust Speech Recognition
Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai
#7 A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in comparison to MFCC
Shweta Ghai, Rohit Sinha
#8 About Handling Boundary Uncertainty in a Speaking Rate Dependent Modeling Approach
Denis Jouvet, Dominique Fohr, Irina Illina
#9 An Active Learning Approach to Task Adaptation
Ji Wu, Zhiyang He, Ping Lv
#10 Efficient Speaker and Noise Normalization for Robust Speech Recognition
Vikas Joshi, Raghavendra Bilgi, Umesh S, Carmen Benitez, Luz García Martínez
#11 How Realistic is Artificially Added Noise?
Thomas Winkler

Tue-Ses2-S1-P:
Spoken Language Processing of Human-Human Conversations II

Time: Tuesday 14:30
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Poster
Chair: Dilek Hakkani-Tur

#1 Learning Influences from Word Use in Polylogue
Tomoharu Iwata, Shinji Watanabe
#2 Identifying Agreement/Disagreement in Conversational Speech: A Cross-lingual Study
Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond
#3 A Dual Channel Coupled Decoder for Fillers and Feedback
Daniel Neiberg, Joakim Gustafson
#4 An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions
Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Tue-Ses3-O1:
Language Identification

Time: Tuesday 16:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Philippe Boula de Mareüil

16:00 Data-driven UBM Generation via Tied Gaussians for GMM-Supervector Based Accent Identification
Rong Zheng, Ce Zhang, Bo Xu
16:20 I3A Language Recognition System for Albayzin 2010 LRE
David Martínez, Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
16:40 Dimensionality Reduction for Using High-Order n-grams in SVM-Based Phonotactic Language Recognition
Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel
17:00 Language Recognition via Ivectors and Dimensionality Reduction
Najim Dehak, Pedro A. Torres Carrasquillo, Douglas Reynolds, Reda Dehak
17:20 Language Recognition in iVectors Space
David Martínez, Oldrich Plchot, Lukas Burget, Ondrej Glembek, Pavel Matejka

Tue-Ses3-O3:
ASR - Search, Keyword Spotting and Confidence Measures II

Time: Tuesday 16:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Geoffrey Zweig

16:00 A Template Based Voice Trigger System Using Bhattacharyya Edit Distance
Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George
16:20 Acoustic Look-Ahead for More Efficient Decoding in LVCSR
David Nolden, Ralf Schlüter, Hermann Ney
16:40 A new Epsilon Filter for Efficient Composition of Weighted Finite-State Transducers
Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann
17:00 A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines
Sabato Marco Siniscalchi, Torbjorn Svendsen, Chin-Hui Lee
17:20 Combining Information Sources for Confidence Estimation with CRF Models
Matthew Stephen Seigel, Philip Woodland
17:40 Evaluation of Fast Spoken Term Detection Using a Suffix Array
Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta

Tue-Ses3-O2:
Second Language Acquisition, Development and Learning II

Time: Tuesday 16:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Robert Fox

16:00 On Mispronunciation Lexicon Generation using Joint-sequence Multigrams in Computer-Aided Pronunciation Training
Xiaojun Qian, Helen Meng, Frank Soong
16:20 Validating a second language perception model for classroom context. A longitudinal study within the Perceptual Assimilation Model
Bianca Sisinni, Mirko Grimaldi
16:40 The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast
Makiko Sadakata, James M. McQueen
17:00 Fluency Changes with General Progress in L2 Proficiency
Jared Bernstein, Jian Cheng, Masanori Suzuki
17:20 Tongue Gestures Awareness and Pronunciation Training
Slim Ouni
17:40 Impact of speaker variability on speech perception in non-native listeners
Wim A. van Dommelen, Valerie Hazan

Tue-Ses3-O4:
SLP for Information Extraction and Retrieval I

Time: Tuesday 16:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Haizhou Li

16:00 Latent Topic Modeling for Audio Corpus Summarization
Timothy J. Hazen
16:20 Investigation of Spontaneous Speech Characterization Applied to Speaker Role Recognition
Richard Dufour, Yannick Estève, Paul Deléglise
16:40 Zero-resource audio-only spoken term detection based on a combination of template matching techniques
Armando Muscariello, Guillaume Gravier, Frédéric Bimbot
17:00 Automatic Learning in Content Indexing Service using Phonetic Alignment
Yeon-Jun Kim, Dave C. Gibbon
17:20 Leveraging Relevance Cues for Improved Spoken Document Retrieval
Pei-Ning Chen, Kuan-Yu Chen, Berlin Chen
17:40 Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms
Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, Lin-Shan Lee

Tue-Ses3-S1-O:
Speech and Audio Processing for Human-Robot Interaction I

Time: Tuesday 16:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Oral
Chair: Laurence Devillers

16:00 Using Prominence Detection to Generate Acoustic Feedback in Tutoring Situations
Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing
16:20 Bayesian Extension of MUSIC for Sound Source Localization and Tracking
Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno
16:40 Speech-based Non-prototypical Affect Recognition for Child-Robot Interaction in Reverberated Environments
Martin Woellmer, Felix Weninger, Bjoern Schuller

Tue-Ses3-P1:
Voice Activity Detection

Time: Tuesday 16:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Abeer Alwan

#1 Voice activity detection in MTF-based power envelope restoration
Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Ruediger Hoffmann
#2 Using Spectral Fluctuation of Speech in multi-feature HMM-based voice activity detection
Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama
#3 Linear Dynamic Models for Voice Activity Detection
Kannu Mehta, Chau Khoa Pham, Eng Siong Chng
#4 Detection of Shouted Speech in the Presence of Ambient Noise
Jouni Pohjalainen, Tuomo Raitio, Paavo Alku
#5 Breath-detection-based Telephony Speech Phrasing
Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
#6 Multi-channel voice activity detection based on conic constraints
Gibak Kim
#7 Multi-Sensor Voice Activity Detection based on Multiple Observation Hypothesis Testing
Theodoros Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad
#8 Online Speech Activity Detection in Broadcast News
Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan
#9 A Real-Time Speech Command Detector for a Smart Control Room
Daniel Reich, Felix Putze, Dominic Heger, Joris Ijsselmuiden, Rainer Stiefelhagen, Tanja Schultz
#10 Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation frequency
Ekapol Chuangsuwanich, James Glass
#11 On Noise Robust Voice Activity detection
Tomas Dekens, Werner Verhelst
#12 Adaptive regularization framework for robust voice activity detection
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Tue-Ses3-P2:
Human Speech Production I

Time: Tuesday 16:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Francesco Cutugno

#1 On the use of extended context for HMM-based spontaneous conversational speech synthesis
Tomoki Koriyama, Takashi Nose, Takao Kobayashi
#2 Predicting Tongue Positions from Acoustics and Facial Features
Asterios Toutios, Slim Ouni
#3 Assessing acoustic reduction: Exploiting local structure in speech
Louis ten Bosch, Annika Hämäläinen, Mirjam Ernestus
#4 The “Fortis-Lenis” Distinction in Bulgarian and German
Bistra Andreeva, Magdalena Wolska
#5 Acoustic Correlates of Glottal Gaps
Gang Chen, Jody Kreiman, Yen-Liang Shue, Abeer Alwan
#6 Using a Genetic Algorithm to Estimate Parameters of a Coarticulation Model
Brian Bush, John-Paul Hosom, Alexander Kain, Akiko Amano-Kusumoto
#7 Synthesis of breathy, normal, and pressed phonation using a two-mass model with a triangular glottis
Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube
#8 Analysis of inter-articulator correlation in acoustic-to-articulatory inversion using generalized smoothness criterion
Prasanta Ghosh, Shrikanth Narayanan
#9 Frequency-domain representation of source-filter coupling and its effect in the production of voice
Tokihiko Kaburagi
#10 Method for speech inversion with large scale statistical evaluation
Heikki Rasilo, Unto K. Laine, Okko Räsänen, Toomas Altosaar
#11 Italian in the no-man's land between stress-timing and syllable-timing? Speakers are more stress-timed than listeners
Bettina Braun, Sabine Geiselmann
#12 The Lombard Effect in Spontaneous Dialog Speech
Laura Folk, Florian Schiel

Tue-Ses3-P3:
Speaker Recognition - Analysis and Statistics III

Time: Tuesday 16:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Pierre-Michel Bousquet

#1 Variational Bayesian Model Selection for GMM-Speaker Verification using Universal Background Model
Timur Pekhovsky, Alexandra Lokhanova
#2 To Weight or not to Weight: Source-Normalised LDA for Speaker Recognition using i-vectors
Mitchell McLaren, David van Leeuwen
#3 Maximum Entropy based Data Selection for Speaker Recognition
Chien-Lin Huang, Bin Ma
#4 Addressing the Data-Imbalance Problem in Kernel-based Speaker Verification via Utterance Partitioning and Speaker Comparison
Wei Rao, Man-Wai Mak
#5 Single-channel Head Orientation Estimation Based on Discrimination of Acoustic Transfer Function
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
#6 Maximum Likelihood i-vector Space Using PCA for Speaker Verification
Zhenchun Lei, Yingchun Yang
#7 Speaker Verification using Sparse Representations on Total Variability I-Vectors
Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan
#8 Robust Speaker Recognition in Non-Stationary Room Environments Based on Empirical Mode Decomposition
Taufiq Hasan, John Hansen
#9 Range based multi microphone array fusion for speaker activity detection in small meetings
Jani Even, Panikos Heracleous, Carlos Ishi, Norihiro Hagita
#10 Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization
Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi
#11 Regularized Logistic Regression Fusion for Speaker Verification
Ville Marko Hautamaki, Kong Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li
#12 A Longest Matching Segment Approach with Bayesian Adaptation - Application to Noise-Robust Speaker Recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ming Ji
#13 Data Selection with Kurtosis and Nasality features for Speaker Recognition
Howard Lei, Nikki Mirghafori
#14 Use of The Harmonic Phase in Speaker Recognition
Inma Hernaez, Ibon Saratxaga, Jon Sanchez, Eva Navas, Iker Luengo

Tue-Ses3-P4:
Voice Conversion and Speech Synthesis

Time: Tuesday 16:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Alan Black

#1 Gaussian Process Experts for Voice Conversion
Nicholas Pilkington, Heiga Zen, Mark Gales
#2 Intonation Conversion From Neutral to Expressive Speech
Christophe Veaux, Xavier Rodet
#3 Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation
Nobuhiko Hattori, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano
#4 Adding Glottal Source Information to Intra-lingual Voice Conversion
Javier Pérez, Antonio Bonafonte
#6 Formant-controlled HMM-based Speech Synthesis
Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai
#7 Analysis of HMM-Based Lombard Speech Synthesis
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku
#8 Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation
Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
#9 Factored MLLR Adaptation For Singing Voice Generation
June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim
#11 Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency
Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu
#12 Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu
#13 Rapid Adaptation of Foreign-accented HMM-based Speech Synthesis
Reima Karhila, Mirjam Wester
#14 The Effects of Phoneme Errors in Speaker Adaptation for HMM Speech Synthesis
Bálint Tóth, Tibor Fegyó, Géza Németh

Tue-Ses3-S1-P:
Speech and Audio Processing for Human-Robot Interaction II

Time: Tuesday 17:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Poster
Chair: Alex Rudnicky

#1 Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs
Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim
#2 Audio-Visual Voice Activity Detection in Dynamically Changing Environments
Takami Yoshida, Keisuke Nakamura, Kazuhiro Nakadai
#3 Emotion detection from speech in human-robot interaction
Marie Tahon, Agnès Delaborde, Laurence Devillers
#4 Weighted Ordered Classes - Nearest Neighbors: A New Framework for Automatic Emotion Recognition from Speech
Yazid Attabi, Pierre Dumouchel
#5 Prosodic Analysis of a Corpus of Tales
David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro
#6 Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
#7 Robust intonation pattern classification in human robot interaction
Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima
#8 ASR for human-symbiotic robot "EMIEW2" with Mechanical Noise and Floor-Level Noise Reduction
Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi

Wed-Ses1-O1:
Speaker Diarization I

Time: Wednesday 10:00
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Janez Zibert

10:00 Speaker Diarization Using A Priori Acoustic Information
Hagai Aronowitz
10:20 Improved Overlapped Speech Handling for Speaker Diarization
Kofi Boakye, Oriol Vinyals, Gerald Friedland
10:40 Exploiting Intra-Conversation Variability for Speaker Diarization
Stephen Shum, Najim Dehak, Ekapol Chuangsuwanich, Douglas Reynolds, Jim Glass
11:00 Speaker Clustering Based on Non-negative Matrix Factorization
Masafumi Nishida, Seiichi Yamamoto
11:20 Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings
Sree Harsha Yella, Fabio Valente
11:40 Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models
David Wang, Robert Vogt, Sridha Sridharan, David Dean

Wed-Ses1-O3:
ASR - New Paradigms

Time: Wednesday 10:00
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: John Hansen

10:00 New Methods for Template Selection and Compression in Continuous Speech Recognition
Xie Sun, Yunxin Zhao
10:20 Structured Support Vector Machines for Noise Robust Continuous Speech Recognition
Shi-Xiong Zhang, M.J.F. Gales
10:40 Continuous Digits Recognition Leveraging Invariant Structure
Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu
11:00 Convergence of Line Search A-Function methods
Dimitri Kanevsky, David Nahamoo, Tara Sainath, Bhuvana Ramabhadran
11:20 Hidden Boosted MMI and Hierarchical State Posterior Feature for Automatic Speech Recognition based on Hidden Conditional Neural Fields
Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa
11:40 Recognition and Real Time Performances of a Lightweight Ultrasound Based Silent Speech Interface Employing a Language Model
Jun Cai, Bruce Denby, Pierre Roussel, Gerard Dreyfus, Lise Crevier-Buchman

Wed-Ses1-S3:
Speech Processing Tools

Time: Wednesday 10:00
Place: Donatello (Room Onice) - Pala Congressi - Ground Floor
Type: Poster
Chair: Christoph Draxler

#1 Speech Processing Tools - An Introduction to Interoperability
Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Liberman, Peter Wittenburg
#2 EasyAlign: an automatic phonetic alignment tool under Praat
Jean-Philippe Goldman
#3 MTRANS: A multi-channel, multi-tier speech annotation tool
Julián Villegas, Martin Cooke, Vincent Aubanel, Marco A. Piccolino-Boniforti
#4 The JSafran platform for semi-automatic speech processing
Christophe Cerisara, Claire Gardent
#5 The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition
Johannes Wagner, Florian Lingenfelser, Elisabeth Andre
#6 ELAN – aspects of interoperability and functionality
Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram
#7 Open source voice creation toolkit for the MARY TTS Platform
Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner
#8 Java Visual Speech Components for Rapid Application Development of GUI based Speech Processing Applications
Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth
#9 mTalk - A Multimodal Browser for Mobile Services
Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek
#10 Web-based automatic speech recognition service - webASR
Stuart Nicholas Wrigley, Thomas Hain
#11 A Web based Speech Transcription Workplace
Markus Klehr, Andreas Ratzka, Thomas Ross
#12 WinPitch, a multimodal tool for speech analysis of endangered languages
Philippe Martin
#13 Recording caregiver interactions for machine acquisition of spoken language using the KLAIR virtual infant
Mark Huckvale

Wed-Ses1-O2:
Prosody I

Time: Wednesday 10:00
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Gérard Bailly

10:00 A quantitative investigation of the prosody of Verum Focus in Italian
Giuseppina Turco, Michele Gubian, Jessamyn Schertz
10:20 Effects of focus on f0 and duration in Irish (Gaelic) declaratives
Amelie Dorn, Ailbhe Ní Chasaide
10:40 The phonology and phonetics of perceived prosody: What do listeners imitate?
Jennifer Cole, Stefanie Shattuck-Hufnagel
11:00 Uncovering the effect of imitation on tonal patterns of French Accentual Phrases
Amandine Michelas, Noël Nguyen
11:20 Crossmodal prosodic and gestural contribution to the perception of contrastive focus
Pilar Prieto, Cecilia Pugliesi, Joan Borràs-Comes, Ernesto Arroyo, Josep Blat
11:40 Temporal relationship between auditory and visual prosodic cues
Erin Cvejic, Jeesun Kim, Chris Davis

Wed-Ses1-O4:
Spoken Dialogue Systems II

Time: Wednesday 10:00
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Steve Young

10:00 Optimizing Situated Dialogue Management in Unknown Environments
Heriberto Cuayahuitl, Nina Dethlefs
10:20 Acoustic-similarity based technique to improve concept recognition
Om D Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret
10:40 Dialog Methods for Improved Alphanumeric String Capture
Doug Peters, Peter Stubley
11:00 Detecting the Status of a Predictive Incremental Speech Understanding Model for Real-Time Decision-Making in a Spoken Dialogue System
David DeVault, Kenji Sagae, David Traum
11:20 User Simulation in Dialogue Systems using Inverse Reinforcement Learning
Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefevre, Olivier Pietquin
11:40 Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems
Paul A. Crook, Oliver Lemon

Wed-Ses1-S1:
Speaker State Challenge - Intoxication and Sleepiness I

Time: Wednesday 10:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Bjoern Schuller

10:00 The INTERSPEECH 2011 Speaker State Challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski
10:20 Combining Multiple Phoneme-based Classifiers with Audio Feature-based Classifier for the Detection of Alcohol Intoxication
Claude Montacié, Marie-José Caraty
10:40 Intoxication Detection using Phonetic, Phonotactic and Prosodic Cues
Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg
11:00 Drink and Speak: On the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features
Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth
11:20 Intoxicated Speech Detection Using Hierarchical Features and Iterative Speaker Normalization
Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth S. Narayanan
11:40 Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk... and can Automatic Classifiers Compete?
Stefan Ultes, Alexander Schmitt, Wolfgang Minker
12:00 Does it Groove or Does it Stumble - Automatic Classification of Alcoholic Intoxication Using Prosodic Features
Florian Hönig, Anton Batliner

Wed-Ses1-S2-O:
Speech Technology for Under-Resourced Languages I

Time: Wednesday 10:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Oral
Chairs: Alexey Karpov, Laurent Besacier

10:00 Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training
Ngoc Thang Vu, Franziska Kraus, Tanja Schultz
10:20 Places and Manner of Articulation of Bangla Consonants: A EPG based study
Shyamal Kr Das Mandal, Somnath Chandra, Swaran Lata, Ashoke Kumar Datta
10:40 Efficient harvesting of Internet audio for resource-scarce ASR
Marelie Hattingh Davel, Charl van Heerden, Neil Kleynhans

Wed-Ses1-P1:
Human Speech Production II

Time: Wednesday 10:00
Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Francis Grenez

#2 Articulatory Reduction in Mandarin Chinese Words
Jeffrey Berry, Sunjing Ji, Ian Fasel, Diana Archangeli
#3 Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact
Adam Lammert, Michael Proctor, Athanasios Katsamanis, Shrikanth Narayanan
#4 Analysis and automatic estimation of children's subglottal resonances
Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell S. Sommers
#5 Acceleration Sensor Based Estimates of Subglottal Resonances: Short vs. Long Vowels
Wolfgang Wokurek, Andreas Madsack
#6 Comparison of nasalance measurements from accelerometers and microphones and preliminary development of novel features
Nicolas Audibert, Angélique Amelot
#7 The effect of seeing the interlocutor on speech production in different noise types
Michael Fitzpatrick, Jeesun Kim, Chris Davis
#8 Conversing in the presence of a competing conversation: effects on speech production
Vincent Aubanel, Martin Cooke, Julian Villegas, Maria Luisa Garcia Lecumberri
#9 Very short utterances and timing in turn-taking
Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski
#10 Validating rt-MRI based articulatory representations via articulatory recognition
Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth Narayanan
#11 An Electropalatographic and Acoustic Study on Anticipatory Coarticulation in V1#C2V2 Sequences in Standard Chinese
Yinghao Li, Jiangping Kong
#12 Final /t/ reduction in Dutch past-participles: the role of word predictability and morphological decomposability
Iris Hanique, Mirjam Ernestus
#13 Parametrising Degree of Articulator Movement from Dynamic MRI Data
Zeynab Raeesy, Ladan Baghai-Ravary, John Coleman

Wed-Ses1-P2:
Systems for LVCSR and rich transcription

Time: Wednesday 10:00
Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Diego Giuliani

#1 Improving LVCSR System Combination Using Neural Network Language Model Cross Adaptation
Xunying Liu, Mark Gales, Phil Woodland
#2 Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones
Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou
#3 Deploying Google Search by Voice in Cantonese
Yun-Hsuan Sung, Martin Jansche, Pedro Moreno
#4 An Investigation on Speech Recognition for Colloquial Arabic
Sarah Al-Shareef, Thomas Hain
#5 A multithreaded implementation of Viterbi decoding on Recursive Transition Networks
Fabio Brugnara
#6 Recurrent Neural Network based Language Modeling in Meeting Recognition
Stefan Kombrink, Tomas Mikolov, Martin Karafiat, Lukas Burget
#7 Ad-Hoc Meeting Transcription on Clusters of Mobile Devices
Michele Cossalter, Priya Sundararajan, Ian Lane
#8 ROVER Enhancement with Automatic Error Detection
Kacem Abida, Fakhri Karray
#9 Automatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations
Yuya Akita, Tatsuya Kawahara

Wed-Ses1-P3:
Language, Dialect Identification and Speaker Diarization

Time: Wednesday 10:00
Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Nancy Chen

#1 Study on the Relevance Factor of Maximum a Posteriori with GMM for Language Recognition
Chang Huai You, Haizhou Li, Kong Aik Lee
#2 Improving Multiband Position Pitch Algorithm for Localization and Tracking of Multiple Concurrent Speakers by using a Frequency Selective Criterion
Tania Habib, Harald Romsdorfer
#3 On the Use of Lattices of Time-Synchronous Cross-Decoder Phone Co-occurrences in a SVM-Phonotactic Language Recognition System
Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel
#4 Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model
Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
#5 PLDA-based Clustering for Speaker Diarization of Broadcast Streams
Jan Silovsky, Jan Prazak, Petr Cerva, Jindrich Zdansky, Jan Nouza
#6 iVector Approach to Phonotactic Language Recognition
Mehdi Soufifar, Marcel Kockmann, Lukas Burget, Olda Plchot, Ondrej Glembek, Torbjørn Svendsen
#7 Discriminative Features For Language Identification
Christopher Alberti, Michiel Bacchiani
#8 Perceptual sensitivity to dialectal and generational variations in vowels
Robert Allen Fox, Ewa Jacewicz
#9 Investigation of Cross-show Speaker Diarization
Qian Yang, Tanja Schultz, Qin Jin
#10 Language Identification for Text Chats
Vesa Siivola, Bryan Pellom, Meagan Sills
#11 Spoken Language Recognition in the Latent Topic Simplex
Kong Aik Lee, Chang Huai You, Ville Hautamäki, Anthony Larcher, Haizhou Li

Wed-Ses1-P4:
Paralinguistic Information - Analysis and Tools

Time: Wednesday 10:00
Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery)
Type: Poster
Chair: Shri Narayanan

#1 Investigating Robustness of Spectral Moments on Normal- and High-Effort Speech
Frederike Gottsmann, Corinna Harwardt
#2 Comparing the Impact of Raised Vocal Effort on Various Spectral Parameters
Corinna Harwardt
#4 Vowel Context and Speaker Interactions Influencing Glottal Open Quotient and Formant Frequency Shifts in Physical Task Stress
Keith W. Godin, John H. L. Hansen
#5 Prosodic Correlates of Individual Physiological Response to Stress
Serguei Pakhomov, Michael Kotlyar
#6 The vocal effort of dominance in scenario meetings
Marcela Charfuelan, Marc Schröder
#7 A Preliminary Model of Emotional Prosody using Multidimensional Scaling
Sona Patel, Rahul Shrivastav
#8 An Exploratory Study of the Relations between Perceived Emotion Strength and Articulatory Kinematics
Jangwon Kim, Sungbok Lee, Shrikanth Narayanan
#9 Improved Acoustic Characterization of Breathy and Whispery Voices
Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
#10 Neutral to Target Emotion Conversion Using Source and Suprasegmental Information
Govind D, Prasanna S R Mahadeva, Yegnanarayana B
#11 A multimodal analysis of vocal and visual backchannels in spontaneous dialogs
Khiet P. Truong, Ronald Poppe, Iwan de Kok, Dirk Heylen
#12 Kernel models for affective lexicon creation
Nikos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan

Wed-Ses1-S2-P:
Speech Technology for Under-Resourced Languages II

Time: Wednesday 11:00
Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor
Type: Poster
Chairs: Laurent Besacier, Alexey Karpov

#1 Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
Milan Sečujski, Darko Pekar, Nikša Jakovljević
#2 Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis
Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin
#3 Cross-language phone recognition when the target language phoneme inventory is not known
Timothy Kempton, Roger Moore, Thomas Hain
#4 A Paradigm for Small Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets
Sourish Chaudhuri, Bhiksha Raj
#5 GorUp: an ontology-driven Audio Information Retrieval system that suits the requirements of under-resourced languages
Nora Barroso, Karmele López de Ipiña, Aitzol Ezeiza, Carmen Hernández, Nerea Ezeiza, Odei Barroso, Unai Susperregi, Barroso Simeon
#6 Woefzela - An open-source platform for ASR data collection in the developing world
Nic De Vries, Jaco Badenhorst, Marelie Davel, Etienne Barnard, Alta De Waal
#7 A Study on the Perception of Tone and Intonation in Sesotho
Hansjörg Mixdorff, Lehlohonolo Mohasi, 'Malillo Machobane, Thomas Niesler
#8 Developing a broadband automatic speech recognition system for Afrikaans
Febe de Wet, Alta de Waal, Gerhard van Huyssteen
#9 Multi-accent speech recognition of Afrikaans, Black and White varieties of South African English
Herman Kamper, Thomas Niesler
#10 Perceptual Representation of Consonant Sounds in Thai
Charturong Tantibundhit, Chutamanee Onsuwan, Tanawan Saimai, Nantaporn Saimai, Sumonmas Thatphithakkul, P. Chootrakool, Krit Kosawat, Nattanun Thatphithakkul
#11 A cross-lingual approach to the development of an HMM-based speech synthesis system for Malay
Mumtaz Begum Mustafa, Ainon Raja Noor, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles

Wed-Ses2-O1:
Speaker Diarization II

Time: Wednesday 13:30
Place: Auditorium - Pala Congressi
Type: Oral
Chair: Hagai Aronowitz

13:30 Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
Janez Zibert, France Mihelic
13:50 Diarization-based Speaker Retrieval for Broadcast Television Archives
Marijn Huijbregts, David van Leeuwen
14:10 The detection of overlapping speech with prosodic features for speaker diarization
Martin Zelenák, Javier Hernando
14:30 LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization
Sree Hari Krishnan Parthasarathi, Herve Bourlard, Daniel Gatica-Perez
14:50 Extending the Task of Diarization to Speaker Attribution
Houman Ghaemmaghami, David Dean, Robbie Vogt, Sridha Sridharan
15:10 Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
Viet-Anh Tran, Viet Bac Le, Claude Barras, Lori Lamel

Wed-Ses2-O3:
Adaptation for ASR

Time: Wednesday 13:30
Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor
Type: Oral
Chair: Phil Woodland

13:30 Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution
Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang
13:50 Integrated Online Speaker Clustering and Adaptation
Catherine Breslin, KK Chin, Mark Gales, Kate Knill
14:10 A study on speaker normalized MLP features in LVCSR
Zoltán Tüske, Christian Plahl, Ralf Schlüter
14:30 Matrix-Variate Distribution of Training Models for Robust Speaker Adaptation
Yongwon Jeong, Young Kuk Kim
14:50 Separating Speaker and Environmental Variability Using Factored Transforms
Michael Seltzer, Alex Acero
15:10 Your Mobile Virtual Assistant Just Got Smarter!
Mazin Gilbert, Iker Arizmendi, Enrico Bocchieri, Diamantino Caseiro, Vincent Goffin, Andrej Ljolje, Mike Philips, Chao Wang, Jay Wilpon

Wed-Ses2-O2:
Prosody II

Time: Wednesday 13:30
Place: Leonardo - Pala Affari - Ground Floor
Type: Oral
Chair: Pilar Prieto

13:30 Analysing the correspondence between automatic prosodic segmentation and syntactic structure
Gyorgy Szaszak, Katalin Nagy, Andras Beke
13:50 Long-distance rhythmic dependencies and their application to automatic language identification
Joseph Tepperman, Emily Nava
14:10 Symbolic and Direct Sequential Modeling of Prosody for Classification of Speaking-Style and Nativeness
Andrew Rosenberg
14:30 Prosodic Analysis and Perception of Mandarin Utterances Conveying Attitudes
Wentao Gu, Ting Zhang, Hiroya Fujisaki
14:50 Predicting Taiwan Mandarin tone shapes from their duration
Chierh Cheng, Michele Gubian
15:10 Variation of Accent Type and of Context – Influences on Pragmatic Focus Interpretation
Charlotte Wollermann, Ulrich Schade, Bernhard Schröder

Wed-Ses2-O4:
SLP for Information Extraction and Retrieval II

Time: Wednesday 13:30
Place: Michelangelo - Pala Affari - 2nd Floor
Type: Oral
Chair: Pascale Fung

13:30 Topic Segmentation of TV-streams by mathematical morphology and vectorization
Vincent Claveau, Sébastien Lefèvre
13:50 Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation
Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
14:10 Hybrid Speech Recognition for Voice Search: a Comparative Study
Evandro Gouvea
14:30 A New Phonetic Candidate Generator for Improving Search Query Efficiency
Bo Peng, Yao Qian, Frank Soong, Bo Zhang
14:50 Towards Voice-Input Symbolic Pattern Retrieval using Parameter-Based Search
Yukiko Suzuki, Kiyoaki Aikawa
15:10 A Language Independent Approach to Audio Search
Vikram Gupta, Jitendra Ajmera, Arun Kumar, Ashish Verma

Wed-Ses2-S1:
Speaker State Challenge - Intoxication and Sleepiness II

Time: Wednesday 13:30
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Anton Batliner

13:30 Perception of Alcoholic Intoxication in Speech
Florian Schiel
13:50 Detecting sleepiness by fusing classifiers trained with novel acoustic features
Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H.L. Hansen, Carlos Busso
14:10 An HMM-Based Approach to the INTERSPEECH 2011 Speaker State Challenge
Albino Nogueiras
14:30 RANSAC-based Training Data Selection for Speaker State Recognition
Elif Bozkurt, Engin Erzin, Cigdem Eroglu Erdem, Arif Tanju Erdem
14:50 University of Ljubljana System for Interspeech 2011 Speaker State Challenge
Rok Gajšek, Simon Dobrišek, France Mihelič
15:10 Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines
Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang