|
12th Annual Conference of the
International Speech Communication Association
Interspeech 2011 Florence
Interspeech 2011 Technical Programme
Sun-Ses2-O1: Speaker Recognition - Modeling
Time: Sunday 13:30 Place: Auditorium - Pala Congressi Type: Oral Chair: Andrea Paoloni
| 13:30 | Skew Gaussian mixture models for speaker recognition
| | Avi Matza, Yuval Bistritz
|
| |
| 13:50 | Towards Goat Detection in Text-Dependent Speaker Verification
| | Orith Toledo-Ronen, Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
|
| |
| 14:10 | Speaker modeling using local binary decisions
| | Jean-Francois Bonastre, Xavier Anguera, Gabriel H. Sierra, Pierre-Michel Bousquet
|
| |
| 14:30 | New Developments in Voice Biometrics for User Authentication
| | Hagai Aronowitz, Ron Hoory, Jason Pelecanos, David Nahamoo
|
| |
| 14:50 | Evaluation of i-vector Speaker Recognition Systems for Forensic Application
| | Miranti Indar Mandasari, Mitchell McLaren, David van Leeuwen
|
| |
| 15:10 | Mixture of PLDA Models in I-Vector Space for Gender-Independent Speaker Recognition
| | Mohammed Senoussaoui, Patrick Kenny, Niko Brümmer, Edward De Villiers, Pierre Dumouchel
|
| |
Sun-Ses2-O3: Speech Representation and Modelling
Time: Sunday 13:30 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Yannis Stylianou
| 13:30 | A Long-Term Harmonic plus Noise Model for Speech Signals
| | Faten Ben Ali, Laurent Girin, Sonia Djaziri Larbi
|
| |
| 13:50 | A Frequency Domain Approach to ARX-LF Voiced Speech Parameterization and Synthesis
| | Alan O Cinneide, David Dorran, Mikel Gainza, Eugene Coyle
|
| |
| 14:10 | Automatic Data-Driven Learning of Articulatory Primitives from Real-Time MRI Data using Convolutive NMF with Sparseness Constraints
| | Vikram Ramanarayanan, Athanasios Katsamanis, Shrikanth Narayanan
|
| |
| 14:30 | Online Pattern Learning for Non-Negative Convolutive Sparse Coding
| | Dong Wang, Ravichander Vipperla, Nicholas Evans
|
| |
| 14:50 | Sinewave Representations of Nonmodality
| | Nicolas Malyska, Thomas F. Quatieri, Robert Dunn
|
| |
| 15:10 | Time-Varying Signal Adaptive transform and IHT recovery of compressive sensed speech
| | Srikanth Raj Ch, Sreenivas T. V.
|
| |
Sun-Ses2-O2: Speech Perception - Speech Intelligibility
Time: Sunday 13:30 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Anne Cutler
| 13:30 | Segregation of whispered speech interleaved with noise or speech maskers
| | Nandini Iyer, Douglas S. Brungart, Brian D. Simpson
|
| |
| 13:50 | Monaural Azimuth Localization Using Spectral Dynamics of Speech
| | Roi Kliper, Hendrik Kayser, Daphna Weinshall, Israel Nelken, Jörn Anemüller
|
| |
| 14:10 | Prediction of binaural intelligibility level differences in reverberation
| | Jan Rennies, Thomas Brand, Birger Kollmeier
|
| |
| 14:30 | Let’s all speak together! Exploring the impact of various languages on the comprehension of speech in multi-linguistic babble.
| | Aurore Gautreau, Michel Hoen, Fanny Meunier
|
| |
| 14:50 | Cross-Rate Variation in the Intelligibility of Dual-Rate Gated Speech in Older Listeners
| | Valeriy Shafiro, Stanley Sheft, Robert Risley
|
| |
| 15:10 | An Efferent-Inspired Auditory Model Front-End for Speech Recognition
| | Chia-ying Lee, James Glass, Oded Ghitza
|
| |
Sun-Ses2-O4: Emotion, Speaking Style, and Social Behavior
Time: Sunday 13:30 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Anton Batliner
| 13:30 | Acoustic-Linguistic Recognition of Interest in Speech with Bottleneck-BLSTM Nets
| | Martin Woellmer, Felix Weninger, Florian Eyben, Bjoern Schuller
|
| |
| 13:50 | Automatic Detection of Anger in Human-Human Call Center Dialogs
| | Mustafa Erden, Levent M. Arslan
|
| |
| 14:10 | Improved Classification of Speaking Styles for Mental Health Monitoring using Phoneme Dynamics
| | Keng-hao Chang, Howard Lei, John Canny
|
| |
| 14:30 | "You made me do it": Classification of Blame in Married Couples' Interactions by Fusing Automatically Derived Speech and Language Information
| | Matthew P. Black, Panayiotis G. Georgiou, Athanasios Katsamanis, Brian R. Baucom, Shrikanth S. Narayanan
|
| |
| 14:50 | Context and priming effects in the recognition of emotion in old and young listeners
| | Martijn Goudbeek, Marie Nilsenová
|
| |
| 15:10 | Acoustic and Prosodic Correlates of Social Behavior
| | Agustin Gravano, Rivka Levitan, Laura Willson, Stefan Benus, Julia Hirschberg, Ani Nenkova
|
| |
Sun-Ses2-O5: HMM-based Speech Synthesis I
Time: Sunday 13:30 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Keiichi Tokuda
| 13:30 | Decision Tree-based Clustering with Outlier Detection for HMM-based Speech Synthesis
| | Kyung Hwan Oh, June Sig Sung, Doo Hwa Hong, Nam Soo Kim
|
| |
| 13:50 | Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis
| | Hanna Silén, Elina Helander, Moncef Gabbouj
|
| |
| 14:10 | A Perceptual Expressivity Modeling Technique for Speech Synthesis Based on Multiple-Regression HSMM
| | Takashi Nose, Takao Kobayashi
|
| |
| 14:30 | Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis
| | Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
|
| |
| 14:50 | Feature-Space Transform Tying in Unified Acoustic-Articulatory Modelling for Articulatory Control of HMM-based Speech Synthesis
| | Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi
|
| |
| 15:10 | The Effect of Using Normalized Models in Statistical Speech Synthesis
| | Matt Shannon, Heiga Zen, William Byrne
|
| |
Sun-Ses2-S1-O: Speech and Language Processing-Based Assistive Technologies and Health Applications
Time: Sunday 13:30 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Oral Chairs: Tobias Bocklet, Gokhan Tur
| 13:30 | Automatic Detection of Depression in Speech using Gaussian Mixture Modeling with Factor Analysis
| | Douglas Sturim, Pedro Torres-Carrasquillo, Thomas Quatieri, Nicolas Malyska, Alan McCree
|
| |
| 13:50 | Utterance Verification for automating the Hearing In Noise Test (HINT)
| | H. Timothy Bunnell, Jason Lilley, Sigfrid Soli, Ivan Pal
|
| |
| 14:10 | Analyzing the Nature of ECA Interactions in Children with Autism
| | Emily Mower, Chi-Chun Lee, James Gibson, Theodora Chaspari, Marian Williams, Shrikanth Narayanan
|
| |
Sun-Ses2-P1: Second Language Acquisition, Development and Learning I
Time: Sunday 13:30 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster
| #1 | Acquisition of Timing Patterns in Second Language
| | Mikhail Ordin, Leona Polyanskaya
|
| |
| #2 | Context-dependent Duration Modeling with Backoff Strategy and Look-up Tables for Pronunciation Assessment and Mispronunciation Detection
| | Hongyan Li, Shen Huang, Shijin Wang, Bo Xu
|
| |
| #3 | Perceptual training of vowel length contrast of Japanese by L2 listeners: Effects of an isolated word versus a word embedded in sentences
| | Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka
|
| |
| #4 | Similar Vowels in L1/L2 Production: Confused or Discerned in Early L2 English Learners with Different amount of Exposure
| | E-Chin Wu
|
| |
| #5 | Production and perception of Estonian vowels by native and non-native speakers
| | Lya Meister, Einar Meister
|
| |
| #6 | New feature parameters for pronunciation evaluation in English presentations at international conferences
| | Hiroshi Kibishi, Seiichi Nakagawa
|
| |
| #7 | Synchronous reading: learning French orthography by audiovisual training
| | Gérard Bailly, William-Seamus Barbour
|
| |
| #8 | Phoneme Level Non-Native Pronunciation Analysis by an Auditory Model-based Native Assessment Scheme
| | Christos Koniaris, Olov Engwall
|
| |
| #9 | The open front vowel /æ/ in the production and perception of Czech students of English
| | Pavel Šturm, Radek Skarnitzl
|
| |
| #10 | Error selection for ASR-based English pronunciation training in 'My Pronunciation Coach'
| | Catia Cucchiarini, Henk van den Heuvel, Eric Sanders, Helmer Strik
|
| |
| #11 | An Experimental Analysis of Pitch Patterns in Japanese Speakers of English with Verification by Speech Re-synthesis
| | Tomoko Nariai, Kazuyo Tanaka
|
| |
| #12 | An Analysis of Word Duration in Native Speakers and Japanese Speakers of English
| | Tomoko Nariai, Kazuyo Tanaka
|
| |
Sun-Ses2-P2: Speech Enhancement
Time: Sunday 13:30 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Dietrich Klakow
| #1 | Evaluating artificial bandwidth extension by conversational tests in car using mobile devices with integrated hands-free functionality
| | Laura Laaksonen, Ville Myllylä, Riitta Niemistö
|
| |
| #2 | Low-Frequency Bandwidth Extension of Telephone Speech Using Sinusoidal Synthesis and Gaussian Mixture Model
| | Hannu Pulakka, Ulpu Remes, Santeri Yrttiaho, Kalle Palomäki, Mikko Kurimo, Paavo Alku
|
| |
| #3 | Memory-Based Approximation of the Gaussian Mixture Model Framework for Bandwidth Extension of Narrowband Speech
| | Amr Nour-Eldin, Peter Kabal
|
| |
| #4 | Speech enhancement by reconstruction from cleaned acoustic features
| | Philip Harding, Ben Milner
|
| |
| #5 | A Soft Decision-based Speech Enhancement using Acoustic Noise Classification
| | Jae-Hun Choi, Sang-Kyun Kim, Joon-Hyuk Chang
|
| |
| #6 | A Noise Estimation Method Based on Speech Presence Probability and Spectral Sparseness
| | Chao Li, Wenju Liu
|
| |
| #7 | Improved a posteriori Speech Presence Probability Estimation Based on Cepstro-Temporal Smoothing and Time-Frequency Correlation
| | Chao Li, Wenju Liu
|
| |
| #8 | A Rapid Adaptation Algorithm for Tracking Highly Non-Stationary Noises Based on Bayesian Inference for On-Line Spectral Change Point Detection
| | Md Foezur Rahman Chowdhury, Sid-Ahmed Selouani, Douglas O'Shaughnessy
|
| |
| #9 | Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum
| | Kuldip Paliwal, Belinda Schwerin, Kamil Wojcicki
|
| |
| #10 | Speech Enhancement Using Masking Properties in Adverse Environments
| | Atanu Saha, Tetsuya Shimamura
|
| |
| #11 | Phoneme-dependent NMF for speech enhancement in monaural mixtures
| | Bhiksha Raj, Rita Singh, Tuomas Virtanen
|
| |
| #12 | Kernel PCA for Speech Enhancement
| | Christina Leitner, Franz Pernkopf, Gernot Kubin
|
| |
| #13 | Objective Intelligibility Prediction of Speech by Combining Correlation and Distortion based Techniques
| | Angel Gomez, Belinda Schwerin, Kuldip Paliwal
|
| |
Sun-Ses2-P3: ASR - Feature Extraction I
Time: Sunday 13:30 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Fabio Brugnara
| #1 | Integrating recent MLP feature extraction techniques into TRAP architecture
| | Frantisek Grezl, Martin Karafiat
|
| |
| #2 | Feature Frame Stacking in RNN-based Tandem ASR Systems - Learned vs. Predefined Context
| | Martin Woellmer, Bjoern Schuller, Gerhard Rigoll
|
| |
| #3 | Improved Acoustic Feature Combination for LVCSR by Neural Networks
| | Christian Plahl, Ralf Schlüter, Hermann Ney
|
| |
| #4 | Hierarchical Tandem Features for ASR in Mandarin
| | Joel Pinto, Mathew Magimai.-Doss, Herve Bourlard
|
| |
| #5 | Analysis and Comparison of Recent MLP Features for LVCSR Systems
| | Fabio Valente, Mathew Magimai Doss, Wen Wang
|
| |
| #6 | Deep Learning of Speech Features for Improved Phonetic Recognition
| | Jaehyung Lee, Soo-Young Lee
|
| |
| #7 | Globality-Locality Consistent Discriminant Analysis for Phone Classification
| | Heyun Huang, Yang Liu, Jort Gemmeke, Louis ten Bosch, Bert Cranen, Lou Boves
|
| |
| #8 | Front-End Compensation Methods for LVCSR Under Lombard Effect
| | Hynek Boril, Frantisek Grezl, John H.L. Hansen
|
| |
| #9 | Classification of Fricatives Using Feature Extrapolation of Acoustic-Phonetic Features in Telephone Speech
| | Jung-Won Lee, Jeung-Yoon Choi, Hong-Goo Kang
|
| |
| #10 | Noise Robust Feature Extraction Based on Extended Weighted Linear Prediction in LVCSR
| | Sami Keronen, Jouni Pohjalainen, Paavo Alku, Mikko Kurimo
|
| |
| #11 | Comparing Different Flavors of Spectro-Temporal Features for ASR
| | Bernd T. Meyer, Suman V. Ravuri, Marc René Schädler, Nelson Morgan
|
| |
| #12 | VTLN in the MFCC domain: band-limited versus local interpolation
| | Ehsan Variani, Thomas Schaaf
|
| |
| #13 | Multistream Bandpass Modulation Features for Robust Speech Recognition
| | Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
|
| |
| #14 | An Analysis of Automatic Speech Recognition with Multiple Microphones
| | Davide Marino, Thomas Hain
|
| |
Sun-Ses2-P4: Spoken Dialogue & Spoken Language Understanding Systems
Time: Sunday 13:30 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Steve Renals
| #1 | Multi-view approach for speaker turn role labeling in TV Broadcast News shows
| | Geraldine Damnati, Delphine Charlet
|
| |
| #2 | Evaluation of an Integrated Authoring Tool for Building Advanced Question-Answering Characters
| | Sudeep Gandhe, Michael Rushforth, Priti Aggarwal, David Traum
|
| |
| #3 | Towards Unsupervised Spoken Language Understanding: Exploiting Query Click Logs for Slot Filling
| | Gokhan Tur, Dilek Hakkani-Tür, Dustin Hillard, Asli Celikyilmaz
|
| |
| #4 | Web-enhanced Contents Retrieval for Information Access Dialogue System
| | Donghyeon Lee, Cheongjae Lee, Minwoo Jeong, Kyungduk Kim, Seokhwan Kim, Junhwi Choi, Gary Geunbae Lee
|
| |
| #5 | Uncertainty management for on-line optimisation of a POMDP-based large-scale spoken dialogue system
| | Lucie Daubigney, Milica Gasic, Senthilkumar Chandramohan, Matthieu Geist, Olivier Pietquin, Steve Young
|
| |
| #6 | Detection of task-incomplete dialogs based on utterance-and-behavior tag N-gram for spoken dialog systems
| | Sunao Hara, Norihide Kitaoka, Kazuya Takeda
|
| |
| #7 | Shrinkage Based Features for Natural Language Call-Routing
| | Ruhi Sarikaya, Stanley F. Chen, Bhuvana Ramabhadran
|
| |
| #8 | Clustering with modified cosine distance learned from constraints
| | Leonid Rachevsky, Dimitri Kanevsky, Ruhi Sarikaya, Bhuvana Ramabhadran
|
| |
| #9 | Using Speaker ID to Discover Repeat Callers to a Spoken Dialog System
| | Andrew Fandrianto, Brian Langner, Alan W Black
|
| |
| #10 | Semantic graph clustering for POMDP-based spoken dialog systems
| | Florian Pinault, Fabrice Lefèvre
|
| |
| #11 | Learning Place-Names from Spoken Utterances and Localization Results by Mobile Robot
| | Ryo Taguchi, Yuji Yamada, Koosuke Hattori, Taizo Umezaki, Masahiro Hoguro, Naoto Iwahashi, Kotaro Funakoshi, Mikio Nakano
|
| |
| #12 | Active Learning for Dialogue Act Classification
| | Björn Gambäck, Fredrik Olsson, Oscar Täckström
|
| |
| #13 | Speaker Role Recognition using question detection and characterization
| | Thierry Bazillon, Benjamin Maza, Mickael Rouvier, Frederic Bechet, Alexis Nasr
|
| |
| #14 | Learning Score Structure from Spoken Language for A Tennis Game
| | Qiang Huang, Stephen Cox
|
| |
| #15 | Semi-automated classifier adaptation for natural language call routing
| | Silke M. Witt
|
| |
| #16 | Interactional Style Detection for Versatile Dialogue Response Using Prosodic and Semantic Features
| | Wei-Bin Liang, Chung-Hsien Wu, Chih-Hung Wang, Jhing-Fa Wang
|
| |
| #17 | Quality aspects of multimodal dialog systems: identity, stimulation and success
| | Christine Kuehnel, Benjamin Weiss, Matthias Schulz, Sebastian Moeller
|
| |
Sun-Ses2-S1-P: Speech and Language Processing-Based Assistive Technologies and Health Applications
Time: Sunday 14:30 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Poster Chairs: Shri Narayanan, Elmar Noeth
| #1 | Incorporating Speech Recognition Engine Into an Intelligent Assistive Reading System for Dyslexic Students
| | Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Evmorfia N. Argyriou, Antonis Symvonis
|
| |
| #2 | An Investigation of Depressed Speech Detection: Features and Normalization
| | Nicholas Cummins, Julien Epps, Michael Breakspear, Roland Goecke
|
| |
| #3 | Using Prosodic and Spectral Features in Detecting Depression in Elderly Males
| | Michelle Hewlett Sanchez, Dimitra Vergyri, Luciana Ferrer, Colleen Richey, Pablo Garcia, Bruce Knoth, William Jarrold
|
| |
| #4 | Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
| | Catherine Middag, Tobias Bocklet, Jean-Pierre Martens, Elmar Nöth
|
| |
| #5 | Speech Synthesis Parameter Generation for the Assistive Silent Speech Interface MVOCA
| | Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko
|
| |
| #6 | Computer-Assisted Disfluency Counts for Stuttered Speech
| | Peter A. Heeman, Andy McMillin, J. Scott Yaruss
|
| |
| #7 | Spectral Features for Automatic Blind Intelligibility Estimation of Spastic Dysarthric Speech
| | Richard Hummel, Wai-Yip Chan, Tiago Falk
|
| |
| #8 | Extraction of narrative recall patterns for neuropsychological assessment
| | Emily Prud'hommeaux, Brian Roark
|
| |
| #9 | Gesture Design of Hand-to-Speech Converter derived from Speech-to-Hand Converter based on Probabilistic Integration Model
| | Aki Kunikoshi, Yu Qiao, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
|
| |
| #10 | Powered Wheelchair Control Using Acoustic-Based Recognition of Head Gesture Accompanying Speech
| | Akira Sasou
|
| |
| #11 | Analyzing training dependencies and posterior fusion in discriminant classification of apnea patients based on sustained and connected speech
| | Jose Luis Blanco, Ruben Fernandez, Doroteo Torre, Francisco Javier Caminero, Eduardo Lopez
|
| |
Sun-Ses3-O1: Speaker Recognition - Modeling, Automatic Procedures, Analysis I
Time: Sunday 16:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Luciano Romito
| 16:00 | Restoring the Residual Speaker Information in Total Variability Modeling for Speaker Verification
| | Ce Zhang, Rong Zheng, Bo Xu
|
| |
| 16:20 | New Developments in Joint Factor Analysis for Speaker Verification
| | Hagai Aronowitz, Oren Barkan
|
| |
| 16:40 | Speaker recognition using temporal contours in linguistic units: the case of formant and formant-bandwidth trajectories
| | Joaquin Gonzalez-Rodriguez
|
| |
| 17:00 | Discriminatively Trained i-vector Extractor for Speaker Verification
| | Ondrej Glembek, Lukas Burget, Niko Brümmer, Oldrich Plchot, Pavel Matejka
|
| |
| 17:20 | Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training
| | Michelle Hewlett Sanchez, Luciana Ferrer, Elizabeth Shriberg, Andreas Stolcke
|
| |
| 17:40 | A New Perspective on GMM Subspace Compensation Based on PPCA and Wiener Filtering
| | Alan McCree, Doug Sturim, Doug Reynolds
|
| |
Sun-Ses3-O3: Speech Analysis
Time: Sunday 16:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Thomas F. Quatieri
| 16:00 | Adaptive Estimation of Zeros of Time-Varying Z-Transforms
| | Christian Fischer Pedersen, Ove Andersen, Paul Dalsgaard
|
| |
| 16:20 | Identifying regions of non-modal phonation using features of the wavelet transform
| | John Kane, Christer Gobl
|
| |
| 16:40 | Acoustic Analysis of Whispered Speech for Phoneme and Speaker Dependency
| | Xing Fan, Keith Godin, John Hansen
|
| |
| 17:00 | Multi-party Speech Recovery Exploiting Structured Sparsity Models
| | Afsaneh Asaei, Mohammad Javad Taghizadeh, Hervé Bourlard, Volkan Cevher
|
| |
| 17:20 | Modulation spectrum analysis for recognition of reverberant speech
| | Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky
|
| |
| 17:40 | Discrete Choice Models for Non-Intrusive Quality Assessment
| | Petko N. Petkov, W. Bastiaan Kleijn, Bert de Vries
|
| |
Sun-Ses3-O2: Speech Perception - Perceptual Learning and Cross-Language Perception
Time: Sunday 16:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Catia Cucchiarini
| 16:00 | Perceptual learning of liquids
| | Odette Scharenborg, Holger Mitterer, James M. McQueen
|
| |
| 16:20 | The Efficiency of Cross-dialectal Word Recognition
| | Annelie Tuinman, Holger Mitterer, Anne Cutler
|
| |
| 16:40 | Estimation of Perceptual Spaces for Speaker Identities Based on the Cross-Lingual Discrimination Task
| | Minoru Tsuzaki, Keiichi Tokuda, Hisashi Kawai, Jinfu Ni
|
| |
| 17:00 | The relation between perception and production in L2 phonological processing
| | Sharon Peperkamp, Camillia Bouchon
|
| |
| 17:20 | The Role of Word-Initial Glottal Stops in Recognizing English Words
| | Maria Paola Bissiri, María Luisa Lecumberri, Martin Cooke, Jan Volín
|
| |
| 17:40 | Effect of language experience on the categorical perception of Cantonese vowel duration
| | Caicai Zhang, Gang Peng, William S-Y. Wang
|
| |
Sun-Ses3-O4: Speech Enhancement and Dereverberation
Time: Sunday 16:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Peter Kabal
| 16:00 | Single channel dereverberation using example-based speech enhancement with uncertainty decoding technique
| | Keisuke Kinoshita, Mehrez Souden, Marc Delcroix, Tomohiro Nakatani
|
| |
| 16:20 | A statistical room impulse response model with frequency dependent reverberation time for single-microphone late reverberation suppression
| | Jan Erkelens, Richard Heusdens
|
| |
| 16:40 | An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation
| | Chenxi Zheng, Tiago Falk, Wai-Yip Chan
|
| |
| 17:00 | Perceptual Improvement of a Two-Stage Algorithm for Speech Dereverberation
| | Thiago Prego, Amaro de Lima, Sergio Netto
|
| |
| 17:20 | A Model-Based Spectral Envelope Wiener Filter for Perceptually Motivated Speech Enhancement
| | Najib Hadir, Friedrich Faubel, Dietrich Klakow
|
| |
| 17:40 | Binaural Noise-Reduction Method based on Blind Source Separation and Perceptual post processing
| | Jorge Marin-Hurtado, Devangi Parikh, David Anderson
|
| |
Sun-Ses3-O5: ASR - Feature Extraction II
Time: Sunday 16:00 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Dong Yu
| 16:00 | Region Dependent Transform on MLP Features for Speech Recognition
| | Tim Ng, Bing Zhang, Spyros Matsoukas, Long Nguyen
|
| |
| 16:20 | Discriminant Sub-Space Projection of Spectro-Temporal Speech Features based on Maximizing Mutual Information
| | Martin Heckmann, Claudius Gläser
|
| |
| 16:40 | Combining feature space discriminative training with long-term spectro-temporal features for noise-robust speech recognition
| | Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
|
| |
| 17:00 | Combining Frame and Segment Level Processing via Temporal Pooling for Phonetic Classification
| | Sumit Chopra, Patrick Haffner, Dimitrios Dimitriadis
|
| |
| 17:20 | Improved Bottleneck Features Using Pretrained Deep Neural Networks
| | Dong Yu, Michael L. Seltzer
|
| |
| 17:40 | Minimum Classification Error Based Spectro-Temporal Feature Extraction for Robust Audio Event Classification
| | Yuan-Fu Liao
|
| |
Sun-Ses3-S1-O: Crowdsourcing for Speech Processing I
Time: Sunday 16:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Oral Chairs: Maxine Eskenazi, David Suendermann, Gina-Anne Levow
| 16:00 | Speaking to the Crowd: looking at past achievements in using crowdsourcing for speech and predicting future challenges
| | Gabriel Parent, Maxine Eskenazi
|
| |
Sun-Ses3-P1: Prosodic Structure
Time: Sunday 16:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Elizabeth Shriberg
| #1 | Where should pitch accents and phrase breaks go? A syntax tree transducer solution
| | Joseph Tepperman, Emily Nava
|
| |
| #2 | Phrasal prominences do not need pitch movements: postfocal phrasal heads in Italian
| | Giuliano Bocci, Cinzia Avesani
|
| |
| #3 | Intonation of left dislocated topics in Modern Greek
| | David Le Gac, Hiyon Yoo
|
| |
| #4 | Phrases, pitch and perceived prominence in Māori
| | Laura Thompson, Catherine I. Watson, Ray Harlow, Jeanette King, Margaret Maclagan, Helen Charters, Peter Keegan
|
| |
| #5 | Perceptual sensitivity to prenuclear and nuclear intonational patterns
| | Tomáš Duběda
|
| |
| #6 | Tonal Alignment Defined: the case of Southern Irish English
| | Raya Kalaldeh
|
| |
| #7 | Using Mutual Information to Identify Regions of Analysis for Prosodic Analysis
| | Andrew Rosenberg
|
| |
| #8 | Prosodic highlights in Mandarin continuous speech—Cross-genre attributes and implications
| | Chiu-yu Tseng, Zhao-yu Su, Chi-Feng Huang
|
| |
| #9 | When two newly-acquired words are one: New words differing in stress alone are not automatically represented differently
| | Simone Sulpizio, James McQueen
|
| |
| #10 | Automatic Determination of the Standard Chinese Prosodic Phrase Boundaries by $F_0$ Generation Model
| | Shehui Bu, Zhenjie Zhuo, Lingling Yang, Shuichi Itahashi
|
| |
| #11 | Measuring speakers’ similarity in speech by means of prosodic cues: methods and potential
| | Celine De Looze, Stephane Rauzy
|
| |
| #12 | Tonal Variations in Mandarin: New Evidence from Spontaneous and Read Speech
| | Li-chiung Yang
|
| |
Sun-Ses3-P2: Language Processing
Time: Sunday 16:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Frederic Bechet
| #1 | Accounting for prosodic information to improve ASR-based topic tracking for TV Broadcast News
| | Camille Guinaudeau, Julia Hirschberg
|
| |
| #2 | Morpheme Conversion for Connecting Speech Recognizer and Language Analyzers in Unsegmented Languages
| | Kenji Imamura, Tomoko Izumi, Kugatsu Sadamitsu, Kuniko Saito, Satoshi Kobashikawa, Hirokazu Masataki
|
| |
| #3 | Emotion Detection Based on Concept Inference and Spoken Sentence Analysis for Customer Service
| | Ren-Ying Fang, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu
|
| |
| #4 | Commas recovery with syntactic features in French and in Czech
| | Christophe Cerisara, Pavel Král, Claire Gardent
|
| |
| #5 | Redundancy Reduction in ASR of Spontaneous Speech through Statistical Machine Translation
| | Daniele Falavigna
|
| |
| #6 | From Interview to News Text: A Study of Taiwan TV Political Interviews in Newspaper Reports
| | Chin-Chih Chiang
|
| |
Sun-Ses3-P3: ASR - Language Models I
Time: Sunday 16:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Michael Riley
| #1 | Unary Data Structures for Language Models
| | Jeffrey Sorensen, Cyril Allauzen
|
| |
| #2 | Bayesian Language Model Interpolation for Mobile Speech Input
| | Cyril Allauzen, Michael Riley
|
| |
| #3 | On the Estimation of Discount Parameters for Language Model Smoothing
| | Martin Sundermeyer, Ralf Schlüter, Hermann Ney
|
| |
| #4 | N-grams for Conditional Random Fields or a Failure-transition Posterior for Acyclic FSTs
| | Patrick Lehnen, Stefan Hahn, Hermann Ney
|
| |
| #5 | Hybrid Language Models Using Mixed Types of Sub-lexical Units for Open Vocabulary German LVCSR
| | M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlueter, Hermann Ney
|
| |
| #6 | Morpheme Based Factored Language Models for German LVCSR
| | Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlueter, Hermann Ney
|
| |
| #7 | Compound Word Recombination for German LVCSR
| | Markus Nußbaum-Thom, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
|
| |
| #8 | Lattice-Based Risk Minimization Training for Unsupervised Language Model Adaptation
| | Akio Kobayashi, Takahiro Oku, Shinichi Homma, Toru Imai, Seiichi Nakagawa
|
| |
| #9 | Similarity language model
| | Christian Gillot, Christophe Cerisara
|
| |
| #10 | Data Sampling and Dimensionality Reduction Approaches for Reranking ASR Outputs Using Discriminative Language Models
| | Erinc Dikici, Murat Semerci, Murat Saraclar, Ethem Alpaydin
|
| |
| #11 | Training a language model using webdata for large vocabulary Japanese spontaneous speech recognition
| | Ryo Masumura, Seongjun Hahm, Akinori Ito
|
| |
| #12 | Large Vocabulary SOUL Neural Network Language Models
| | Hai-Son Le, Ilya Oparin, Abdel Messaoudi, Alexandre Allauzen, Jean-Luc Gauvain, Francois Yvon
|
| |
| #13 | Improved Spoken Query Transcription using Co-occurrence Information
| | Jonathan Mamou, Abhinav Sethy, Bhuvana Ramabhadran, Ron Hoory, Paul Vozila
|
| |
| #14 | Unsupervised Latent Speaker Language Modeling
| | Yik-Cheung Tam, Paul Vozila
|
| |
Sun-Ses3-P4: Spoken Language Resources, Evaluation and Standardization I
Time: Sunday 16:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Sebastian Moeller
| #1 | Measurement of Objective Intelligibility of Japanese Accented English Using ERJ (English Read by Japanese) Database
| | Nobuaki Minematsu, Koji Okabe, Keisuke Ogaki, Keikichi Hirose
|
| |
| #2 | From Single-Call to Multi-Call Quality: A Study on Long-term Quality Integration in Audio-Visual Speech Communication
| | Sebastian Möller, Chihuy Bang, Teele Tamme, Markus Vaalgamaa, Benjamin Weiss
|
| |
| #3 | Optimal Selection of Limited Vocabulary Speech Corpora
| | Hui Lin, Jeff Bilmes
|
| |
| #4 | Open Source Multi-Language Audio Database for Spoken Language Processing Applications
| | Stephen Zahorian, Jiang Wu, Montri Karnjanadecha, Chandra Vootkuri, Brian Wong, Andrew Hwang, Eldar Tokhtamyshev
|
| |
| #5 | The USC CARE Corpus: Child-Psychologist Interactions of Children with Autism Spectrum Disorders
| | Matthew P. Black, Daniel Bone, Marian E. Williams, Phillip Gorrindo, Pat Levitt, Shrikanth S. Narayanan
|
| |
| #6 | Towards A Versatile Multi-Layered Description of Speech Corpora Using Algebraic Relations
| | Nelly Barbot, Vincent Barreaud, Olivier Boeffard, Laure Charonnat, Arnaud Delhay, Sebastien Le Maguer, Damien Lolive
|
| |
| #7 | Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus
| | Korin Richmond, Phil Hoole, Simon King
|
| |
| #8 | A Pitch Tracking Corpus with Evaluation on Multipitch Tracking Scenario
| | Gregor Pirker, Michael Wohlmayr, Stefan Petrik, Franz Pernkopf
|
| |
| #9 | On building and evaluating a broadcast-news audio segmentation system
| | Taras Butko, Climent Nadeu
|
| |
| #10 | Time- and Acoustic-Mediated Alignment Algorithms for Speech Recognition Evaluation
| | Simon Dobrišek, France Mihelič
|
| |
| #11 | Effects of Shortening Speech Prompts of In-Car Voice User Interfaces on Users' Mental Models
| | Julia Niemann, Kati Schulz, Ina Wechsung
|
| |
| #12 | Speech Transcript Evaluation for Information Retrieval
| | Laurens van der Werff, Wessel Kraaij, Franciska de Jong
|
| |
| #13 | The Albayzin 2010 Language Recognition Evaluation
| | Luis Javier Rodriguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, German Bordel
|
| |
| #14 | Progress and Prospects for Speech Technology: Results from Three Sexennial Surveys
| | Roger Moore
|
| |
| #15 | Painless WFST cascade construction for LVCSR - Transducersaurus
| | Josef Robert Novak, Nobuaki Minematsu, Keikichi Hirose
|
| |
Sun-Ses3-S1-P: Crowdsourcing for Speech Processing II
Time: Sunday 17:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Poster Chairs: Maxine Eskenazi, David Suendermann, Gina-Anne Levow
| #1 | A Transcription Task for Crowdsourcing with Automatic Quality Control
| | Chia-ying Lee, James Glass
|
| |
| #2 | Reliability-Weighted Acoustic Model Adaptation Using Crowd Sourced Transcriptions
| | Kartik Audhkhasi, Panayiotis G. Georgiou, Shrikanth S. Narayanan
|
| |
| #3 | Crowdsourcing for word recognition in noise
| | Martin Cooke, Jon Barker, Maria Luisa Garcia Lecumberri, Krzysztof Wasilewski
|
| |
| #4 | Crowdsourcing preference tests, and how to detect cheating
| | Sabine Buchholz, Javier Latorre
|
| |
| #5 | Growing a Spoken Language Interface on Amazon Mechanical Turk
| | Ian McGraw, James Glass, Stephanie Seneff
|
| |
| #6 | Real user evaluation of spoken dialogue systems using Amazon Mechanical Turk
| | Filip Jurčíček, Simon Keizer, Milica Gasic, Francois Mairesse, Blaise Thomson, Kai Yu, Steve Young
|
| |
| #7 | Quality assessment of crowdsourcing transcriptions for African languages
| | Hadrien Gelas, Solomon Teferra Abate, Laurent Besacier, François Pellegrino
|
| |
| #8 | Using crowdsourcing to provide prosodic annotations for non-native speech
| | Keelan Evanini, Klaus Zechner
|
| |
| #9 | PodCastle: Recent Advances of a Spoken Document Retrieval Service Improved by Anonymous User Contributions
| | Masataka Goto, Jun Ogata
|
| |
Mon-Ses1-O1: Speaker Recognition - Modeling, Automatic Procedures, Analysis II
Time: Monday 10:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Kornel Laskowski
| 10:00 | Data-driven Gaussian Component Selection for Fast GMM-Based Speaker Verification
| | Ce Zhang, Rong Zheng, Bo Xu
|
| |
| 10:20 | Analysis of i-vector Length Normalization in Speaker Recognition Systems
| | Daniel Garcia-Romero, Carol Y. Espy-Wilson
|
| |
| 10:40 | An Analysis Framework based on Random Subspace Sampling for Speaker Verification
| | Weiwu Jiang, Zhifeng Li, Helen Meng
|
| |
| 11:00 | Factor analysis back ends for MLLR transforms in speaker recognition
| | Nicolas Scheffer, Yun Lei, Luciana Ferrer
|
| |
| 11:20 | Report on Performance Results in the NIST 2010 Speaker Recognition Evaluation
| | Craig S. Greenberg, Alvin F. Martin, Bradford N. Barr, George R. Doddington
|
| |
| 11:40 | iVector Fusion of Prosodic and Cepstral Features for Speaker Verification
| | Marcel Kockmann, Luciana Ferrer, Lukas Burget, Jan Cernocky
|
| |
Mon-Ses1-O3: Acoustic Event Detection
Time: Monday 10:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Dirk van Compernolle
| 10:00 | Learning new acoustic events in an HMM-based system using MAP adaptation
| | Jürgen Thomas Geiger, Mohamed Anouar Lakhal, Björn Schuller, Gerhard Rigoll
|
| |
| 10:20 | Alternative Frequency Scale Cepstral Coefficient for Robust Sound Event Recognition
| | Yiren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li
|
| |
| 10:40 | Evaluation of Abnormal Sound Detection using Multi-stage GMM in Various Environments
| | Akinori Ito, Akihito Aiba, Masashi Ito, Shozo Makino
|
| |
| 11:00 | Unsupervised learning of acoustic events using dynamic time warping and hierarchical K-means++ clustering
| | Joerg Schmalenstroeer, Markus Bartek, Reinhold Haeb-Umbach
|
| |
| 11:20 | Feature Extraction Assessment for an Acoustic-Event Classification Task using the Entropy Triangle
| | David Mejía-Navarrete, Ascensión Gallardo-Antolín, Carmen Peláez-Moreno, Francisco J. Valverde-Albacete
|
| |
| 11:40 | Unsupervised Audio Analysis for Categorizing Heterogeneous Consumer Domain Videos
| | Pradeep Natarajan, Stavros Tsakalidis, Vasant Manohar, Rohit Prasad, Prem Natarajan
|
| |
Mon-Ses1-O2: Speech Production - Articulatory Measurements
Time: Monday 10:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Paavo Alku
| 10:00 | Visualization of vocal tract shape using interleaved real-time MRI of multiple scan planes
| | Yoon-Chul Kim, Michael I. Proctor, Shrikanth S. Narayanan, Krishna S. Nayak
|
| |
| 10:20 | Biomechanical Tongue Models: An Approach to Studying Inter-speaker Variability
| | Ralf Winkler, Susanne Fuchs, Pascal Perrier, Mark Tiede
|
| |
| 10:40 | Quantifying Articulatory Distinctiveness of Vowels
| | Jun Wang, Jordan R. Green, Ashok Samal, David B. Marx
|
| |
| 11:00 | Direct Estimation of Articulatory Kinematics from Real-time Magnetic Resonance Image Sequences
| | Michael Proctor, Adam Lammert, Athanasios Katsamanis, Louis Goldstein, Christina Hagedorn, Shrikanth Narayanan
|
| |
| 11:20 | Combined optical distance sensing and electropalatography to measure articulation
| | Peter Birkholz, Christiane Neuschaefer-Rube
|
| |
| 11:40 | Simulating Post-L F0 Bouncing by Modeling Articulatory Dynamics
| | Santitham Prom-on, Yi Xu, Fang Liu
|
| |
Mon-Ses1-O4: Speech Synthesis - Unit Selection and Hybrid approaches
Time: Monday 10:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Junichi Yamagishi
| 10:00 | Enriching text-to-speech synthesis using automatic dialog act tags
| | Vivek Kumar Rangarajan Sridhar, Alistair Conkie, Ann Syrdal, Srinivas Bangalore
|
| |
| 10:20 | Joint Target and Join Cost Weight Training for Unit Selection Synthesis
| | Lukas Latacz, Wesley Mattheyses, Werner Verhelst
|
| |
| 10:40 | Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis
| | Andreas Windmann, Igor Jauk, Fabio Tamburini, Petra Wagner
|
| |
| 11:00 | Evaluating the meaning of synthesized listener vocalizations
| | Sathish Pammi, Marc Schröder
|
| |
| 11:20 | A Hybrid TTS Approach for Prosody and Acoustic Modules
| | Iñaki Sainz, Daniel Erro, Eva Navas, Inma Hernáez
|
| |
| 11:40 | Uniform Speech Parameterization for Multi-form Segment Synthesis
| | Alexander Sorin, Slava Shechtman, Vincent Pollet
|
| |
Mon-Ses1-O5: Speech Enhancement Analysis and Evaluation
Time: Monday 10:00 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Doug O'Shaughnessy
| 10:00 | Theoretical analysis of musical noise and speech distortion in structure-generalized parametric blind spatial subtraction array
| | Ryoichi Miyazaki, Hiroshi Saruwatari, Kiyohiro Shikano
|
| |
| 10:20 | Subjective and objective evaluation of speech intelligibility enhancement under constant energy and duration constraints
| | Yan Tang, Martin Cooke
|
| |
| 10:40 | A Risk-Estimation-Based Comparison of Mean Square Error and Itakura-Saito Distortion Measures for Speech Enhancement
| | Nagarjuna Reddy Muraka, Chandra Sekhar Seelamantula
|
| |
| 11:00 | On Noise Tracking for Noise Floor Estimation
| | Mahdi Triki
|
| |
| 11:20 | Maximum a posteriori estimation of noise from non-acoustic reference signals in very low signal-to-noise ratio environments
| | Ben Milner
|
| |
| 11:40 | Blind speech prior estimation for generalized minimum mean-square error short-time spectral amplitude estimator
| | Ryo Wakisaka, Hiroshi Saruwatari, Kiyohiro Shikano, Tomoya Takatani
|
| |
Mon-Ses1-P1: Paralinguistic Information - Classification and Detection
Time: Monday 10:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Julia Hirschberg
| #1 | On the use of multimodal cues for the prediction of degrees of involvement in spontaneous conversation
| | Catharine Oertel, Stefan Scherer, Nick Campbell
|
| |
| #2 | Anger Recognition in Spoken Dialog Using Linguistic and Para-Linguistic Information
| | Narichika Nomoto, Masafumi Tamoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
|
| |
| #3 | Recognition of Personality Traits from Human Spoken Conversations
| | Alexei V. Ivanov, Giuseppe Riccardi, Adam J. Sporka, Jakub Franc
|
| |
| #4 | Using Multiple Databases for Training in Emotion Recognition: To Unite or to Vote?
| | Björn Schuller, Zixing Zhang, Felix Weninger, Gerhard Rigoll
|
| |
| #5 | “Would You Buy A Car From Me?” – On the Likability of Telephone Voices
| | Felix Burkhardt, Björn Schuller, Benjamin Weiss, Felix Weninger
|
| |
| #6 | Automatic Identification of Salient Acoustic Instances in Couples' Behavioral Interactions using Diverse Density Support Vector Machines
| | James Gibson, Athanasios Katsamanis, Matthew Black, Shrikanth Narayanan
|
| |
| #7 | Predicting Speaker Changes and Listener Responses With And Without Eye-contact
| | Daniel Neiberg, Joakim Gustafson
|
| |
| #8 | Emotion Classification Using Inter- and Intra-Subband Energy Variation
| | Senaka Amarakeerthi, Tin Lay Nwe, C De Silva Liyanage, Michael Cohen
|
| |
| #9 | Emotion Classification of Infants’ Cries using Duration Ratios of Acoustic Segments
| | Kazuki Kitahara, Shinzi Michiwaki, Miku Sato, Shoichi Matsunaga, Masaru Yamashita, Kazuyuki Shinohara
|
| |
| #10 | Vowels formants analysis allows straightforward detection of high arousal acted and spontaneous emotions
| | Bogdan Vlasenko, Dmytro Prylipko, David Philippou-Hübner, Andreas Wendemuth
|
| |
| #11 | Intra-, Inter-, and Cross-cultural Classification of Vocal Affect
| | Daniel Neiberg, Petri Laukka, Hillary Anger Elfenbein
|
| |
Mon-Ses1-P2: Applications for Learning, Education, Aged and Handicapped Persons
Time: Monday 10:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Roberto Gretter
| #1 | Verifying Human Users in Speech-Based Interactions
| | Sajad Shirali-Shahreza, Yashar Ganjali, Ravin Balakrishnan
|
| |
| #2 | Automatic Assessment of Prosody in High-Stakes English Tests
| | Jian Cheng
|
| |
| #3 | Improvement of Segmental Mispronunciation Detection with Prior Knowledge Extracted from Large L2 Speech Corpus
| | Dean Luo, Xuesong Yang, Lan Wang
|
| |
| #4 | Off-Topic Detection in Automated Speech Assessment Applications
| | Jian Cheng, Jianqiang Shen
|
| |
| #5 | Towards Context-dependent Phonetic Spelling Error Correction in Children’s Freely Composed Text for Diagnostic and Pedagogical Purposes
| | Sebastian Stüker, Johanna Fay, Kay Berkling
|
| |
| #6 | Factored Translation Models for improving a Speech into Sign Language Translation System
| | Verónica López-Ludeña, Rubén San-Segundo, Ricardo Cordoba, Javier Ferreiros, Juan Manuel Montero, José Manuel Pardo
|
| |
| #7 | Formant maps in Hungarian vowels – online data inventory for research, and education
| | Kálmán Abari, Zsuzsanna Zsófia Rácz, Gábor Olaszy
|
| |
| #8 | Automatic Subtitling of the Basque Parliament Plenary Sessions Videos
| | Germán Bordel, Silvia Nieto, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona
|
| |
| #9 | Generating Animated Pronunciation from Speech through Articulatory Feature Extraction
| | Yurie Iribe, Silasak Manosavanh, Kouichi Katsurada, Ryoko Hayashi, Chunyue Zhu, Tsuneo Nitta
|
| |
| #10 | A Tale of Two Tasks: Detecting Children’s Off-Task Speech in a Reading Tutor
| | Wei Chen, Jack Mostow
|
| |
| #11 | The problems encountered by Japanese EL2 with English short vowels as illustrated on the 3D Vowel Chart
| | Toshiko Isei-Jaakkola, Takatoshi Naka, Keikichi Hirose
|
| |
| #12 | Automatic generation of listening comprehension learning material in European Portuguese
| | Thomas Pellegrini, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede
|
| |
| #13 | Candidate Generation for ASR Output Error Correction Using a Context-Dependent Syllable Cluster-Based Confusion Matrix
| | Chao-Hong Liu, Chung-Hsien Wu, David Sarwono, Jhing-Fa Wang
|
| |
| #14 | Semi-Supervised Tree Support Vector Machine for Online Cough Recognition
| | Thai Hoa Huynh, Vu An Tran, Huy Dat Tran
|
| |
Mon-Ses1-P3: Robust Speech Recognition I
Time: Monday 10:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Pietro Laface
| #1 | A versatile Gaussian splitting approach to non-linear state estimation and its application to noise-robust ASR
| | Volker Leutnant, Alexander Krueger, Reinhold Haeb-Umbach
|
| |
| #2 | Generalized-Log Spectral Mean Normalization for Speech Recognition
| | Hilman Ferdinandus Pardede, Koichi Shinoda
|
| |
| #3 | Zero-Crossing-Based Channel Attentive Weighting of Cepstral Features for Robust Speech Recognition: The ETRI 2011 CHiME Challenge System
| | Young-Ik Kim, Hoon-Young Cho, Sang-Hoon Kim
|
| |
| #4 | Feature Compensation for Speech Recognition in Severely Adverse Environments due to Background Noise and Channel Distortion
| | Wooil Kim, John H. L. Hansen
|
| |
| #5 | Binaural cues for fragment-based speech recognition in reverberant multisource environments
| | Ning Ma, Jon Barker, Heidi Christensen, Phil Green
|
| |
| #6 | Sub-band level Histogram Equalization for Robust Speech Recognition
| | Vikas Joshi, Raghvendra Biligi, Umesh S, Luz Garcia, Carmen Benitez
|
| |
| #7 | GMM-based missing-feature reconstruction on multi-frame windows
| | Ulpu Remes, Yoshihiko Nankaku, Keiichi Tokuda
|
| |
| #8 | Improvements of a dual-input DBN for noise robust ASR
| | Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves
|
| |
| #9 | Denoising Using Optimized Wavelet Filtering for Automatic Speech Recognition
| | Randy Gomez, Tatsuya Kawahara
|
| |
| #10 | Noise Robust Speaker-Independent Speech Recognition with Invariant-Integration Features Using Power-Bias Subtraction
| | Florian Müller, Alfred Mertins
|
| |
Mon-Ses1-P4: ASR - Acoustic Models I
Time: Monday 10:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Lori Lamel
| #1 | Semi-automatic acoustic model generation from large unsynchronized audio and text chunks
| | Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti
|
| |
| #2 | Unsupervised Testing Strategies for ASR
| | Brian Strope, Doug Beeferman, Alexander Gruenstein, Xin Lei
|
| |
| #3 | Acoustic Model Training with Detecting Transcription Errors in the Training Data
| | Gakuto Kurata, Nobuyasu Itoh, Masafumi Nishimura
|
| |
| #4 | Towards Unsupervised Training of Speaker Independent Acoustic Models
| | Aren Jansen, Kenneth Church
|
| |
| #5 | Acoustic Modeling with Bootstrap and Restructuring Based on Full Covariance
| | Xiaodong Cui, Xin Chen, Jian Xue, Peder A. Olsen, John R. Hershey, Bowen Zhou
|
| |
| #6 | An i-Vector based Approach to Acoustic Sniffing for Irrelevant Variability Normalization based Acoustic Model Training and Speech Recognition
| | Jian Xu, Yu Zhang, Zhi-Jie Yan, Qiang Huo
|
| |
| #7 | Log-linear Optimization of Second-order Polynomial Features with Subsequent Dimension Reduction for Speech Recognition
| | Muhammad Ali Tahir, Ralf Schlueter, Hermann Ney
|
| |
| #8 | Genre Categorization and Modeling for Broadcast Speech Transcription
| | Qingqing Zhang, Lori Lamel, Jean-Luc Gauvain
|
| |
| #9 | Individual Error Minimization Learning Framework and its Applications to Speech Recognition and Utterance Verification
| | Sunghwan Shin, Ho-Young Jung, Biing-Hwang Juang
|
| |
| #10 | Effective Triphone Mapping for Acoustic Modeling in Speech Recognition
| | Sakhia Darjaa, Miloš Cerňak, Marián Trnka, Milan Rusko, Róbert Sabo
|
| |
| #11 | Analysis of Dialectal Influence in Pan-Arabic ASR
| | Udhyakumar Nallasamy, Michael Garbus, Florian Metze, Qin Jin, Thomas Schaaf, Tanja Schultz
|
| |
| #12 | Connected Digit Recognition by Means of Reservoir Computing
| | Azarakhsh Jalalvand, Fabian Triefenbach, David Verstraeten, Jean-Pierre Martens
|
| |
| #13 | Large Margin - Minimum Classification Error Using Sum of Shifted Sigmoids as the Loss Function
| | Madhavi Ratnagiri, Biing-Hwang Juang, Lawrence Rabiner
|
| |
| #14 | Representing Phonological features through a two-level finite state model
| | Javier Mikel Olaso, María Inés Torres, Raquel Justo
|
| |
| #15 | Optimization of the Gaussian Mixture Model Evaluation on GPU
| | Jan Vanek, Jan Trmal, Josef V. Psutka, Josef Psutka
|
| |
Mon-Ses2-O1: Speaker Recognition - Analysis and Statistics I
Time: Monday 13:30 Place: Auditorium - Pala Congressi Type: Oral Chair: David Van Leeuwen
| 13:30 | Harmonic Structure Transform for Speaker Recognition
| | Kornel Laskowski, Qin Jin
|
| |
| 13:50 | Combining Evidence from Spectral and Source-like Features for Person Recognition from Humming
| | Hemant Patil, Maulik Madhavi, Keshab Parhi
|
| |
| 14:10 | Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model
| | Yanhua Long, Zhi-Jie Yan, Frank K. Soong, Lirong Dai, Wu Guo
|
| |
| 14:30 | Implicit Segmentation in Two-Wire Speaker Recognition
| | Yosef Solewicz, Hagai Aronowitz
|
| |
| 14:50 | Boosting Speaker Recognition Performance with Compact Representations
| | Sibel Yaman, Jason Pelecanos, Mohamed K. Omar
|
| |
| 15:10 | Partitioning of Two-Speaker Conversation Datasets
| | Carlos Vaquero, Alfonso Ortega, Eduardo Lleida
|
| |
Mon-Ses2-O3: Speech Segmentation
Time: Monday 13:30 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Daniele Falavigna
| 13:30 | A Two-stage Sample-based Phone Boundary Detector using Segmental Similarity Features
| | Yih-Ru Wang
|
| |
| 13:50 | Iterative Improvement of Speaker Segmentation in A Noisy Environment Using High-level Knowledge
| | Qiang Huang, Stephen Cox
|
| |
| 14:10 | Hierarchical Audio Segmentation with HMM and Factor Analysis in Broadcast News Domain
| | Diego Castan, Carlos Vaquero, Alfonso Ortega, David Martinez, Jesus Villalba, Eduardo Lleida
|
| |
| 14:30 | Syllable Segmentation of Continuous Speech Using Auditory Attention Cues
| | Ozlem Kalinli
|
| |
| 14:50 | Exploiting phone-class specific landmarks for refinement of segment boundaries in TTS databases
| | Vijayaditya Peddinti, Kishore Prahallad
|
| |
| 15:10 | Phoneme-Level Text to Audio Synchronization on Speech Signals with Background Music
| | Agnes Pedone, Juan Jose Burred, Simon Maller, Pierre Leveau
|
| |
Mon-Ses2-S1: Show & Tell Demonstration - Speech Systems and Applications
Time: Monday 13:30 Place: Donatello (Room Onice) - Pala Congressi - Ground Floor Type: Poster Chair: Dimitrios Dimitriadis
| #1 | An Affective Spoken Storyteller
| | Felix Burkhardt
|
| |
| #2 | Text Driven 3D Photo-Realistic Talking Head
| | Lijuan Wang, Frank Soong, Wei Han, Qiang Huo
|
| |
| #3 | Physical Models Producing Vowels with Pitch Variation
| | Arai Takayuki
|
| |
| #4 | An Engine-Independent Text-to-Speech Workplace
| | Margot Mieskes
|
| |
| #5 | An application to test the emotion conveyed by vocal and musical signals
| | Simone Carcone, Carlo Giovannella
|
| |
| #6 | Automatic Speech Recognition System Dedicated for Polish
| | Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Jadczyk, Skurzok Dawid, Mąsior Mariusz
|
| |
| #7 | Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home
| | Kong Aik Lee, Anthony Larcher, Helen Thai, Bin Ma, Haizhou Li
|
| |
| #8 | Adding a Speech Cursor to a Multimodal Dialogue System
| | Staffan Larsson, Alexander Berman, Jessica Villing
|
| |
| #9 | Prosody Toolkit: Integrating HTK, Praat and WEKA
| | Scott Thomas Christie, Serguei Pakhomov
|
| |
| #10 | Collecting life logs for experience-based corpora
| | Fabiano Francesconi, Arindam Ghosh, Giuseppe Riccardi, Marco Ronchetti, Alex Vagin
|
| |
Mon-Ses2-O2: Speech Production - Coarticulation and Speech Timing
Time: Monday 13:30 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Wim van Dommelen
| 13:30 | Jaw movement in vowels and liquids forming the syllable nucleus
| | Štefan Beňuš, Marianne Pouplier
|
| |
| 13:50 | Coarticulation across prosodic domains in Italian: An ultrasound investigation
| | Barbara Gili Fivela, Antonio Stella, Sonia D'Apolito, Francesco Sigona
|
| |
| 14:10 | Investigating the stability of intergestural timing relations
| | Juraj Simko, Fred Cummins, Štefan Beňuš
|
| |
| 14:30 | Speech timing organization for the phonological length contrast in Italian consonants
| | Claudio Zmarich, Barbara Gili Fivela, Pascal Perrier, Christophe Savariaux, Graziano Tisato
|
| |
| 14:50 | Timing in Italian VNC sequences at different speech rates
| | Chiara Celata, Silvia Calamai
|
| |
| 15:10 | Automatic Analysis of Singleton and Geminate Consonant Articulation Using Real-time Magnetic Resonance Imaging
| | Christina Hagedorn, Michael Proctor, Louis Goldstein
|
| |
Mon-Ses2-O4: ASR - Acoustic Models II
Time: Monday 13:30 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Frank Seide
| 13:30 | Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
| | Frank Seide, Gang Li, Dong Yu
|
| |
| 13:50 | Sequential Classification Criteria for NNs in Automatic Speech Recognition
| | Guangsen Wang, Khe Chai Sim
|
| |
| 14:10 | Grapheme-Based Automatic Speech Recognition Using KL-HMM
| | Mathew Magimai.-Doss, Ramya Rasipuram, Guillermo Aradilla, Herve Bourlard
|
| |
| 14:30 | Direct Error Rate Minimization of Hidden Markov Models
| | Joseph Keshet, Chih-Chieh Cheng, Mark Stoehr, David McAllester
|
| |
| 14:50 | On the Effectiveness of Statistical Modeling based Template Matching Approach for Continuous Speech Recognition
| | Xie Sun, Xin Chen, Yunxin Zhao
|
| |
| 15:10 | Comparison of Smoothing Techniques for Robust Context Dependent Acoustic Modelling in Hybrid NN/HMM Systems
| | Guangsen Wang, Khe Chai Sim
|
| |
Mon-Ses2-O5: Robust Speech Recognition II
Time: Monday 13:30 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Maurizio Omologo
| 13:30 | Propagation of Uncertainty through Multilayer Perceptrons for Robust Automatic Speech Recognition
| | Ramón Fernandez Astudillo, Joao Paulo da Silva Neto
|
| |
| 13:50 | Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition
| | Katariina Mahkonen, Antti Hurmalainen, Tuomas Virtanen, Jort Gemmeke
|
| |
| 14:10 | Uncertainty measures for improving exemplar-based source separation
| | Heikki Kallasjoki, Ulpu Remes, Jort F. Gemmeke, Tuomas Virtanen, Kalle J. Palomäki
|
| |
| 14:30 | Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition
| | Hsien-Cheng Liao, Yuan-Fu Liao, Chin-Hui Lee
|
| |
| 14:50 | A Performance Monitoring Approach to Fusing Enhanced Spectrogram Channels in Robust Speech Recognition
| | Shirin Badiezadegan, Richard Rose
|
| |
| 15:10 | Generalized Variable Parameter HMMs for Noise Robust Speech Recognition
| | Ning Cheng, Xunying Liu, Lan Wang
|
| |
Mon-Ses2-P1: Source Separation and Speech Enhancement
Time: Monday 13:30 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Marco Matassoni
| #1 | Monaural Voiced Speech Segregation Based on Pitch and Comb Filter
| | Xueliang Zhang, Wenju Liu
|
| |
| #2 | Fast and simple iterative algorithm of Lp-norm minimization for under-determined speech separation
| | Yasuharu Hirasawa, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno
|
| |
| #3 | Monaural Speech Separation Based on a 2D Processing and Harmonic Analysis
| | Azam Rabiee, Saeed Setayeshi, Soo-Young Lee
|
| |
| #4 | Underdetermined Blind Source Separation with Fuzzy Clustering for Arbitrarily Arranged Sensors
| | Ingrid Jafari, Serajul Haque, Roberto Togneri, Sven Nordholm
|
| |
| #5 | On Initial Seed Selection for Frequency Domain Blind Speech Separation
| | Dang Hai Tran Vu, Reinhold Haeb-Umbach
|
| |
| #6 | Spatial filter calibration based on minimization of modified LSD
| | Nobuaki Tanaka, Tetsuji Ogawa, Tetsunori Kobayashi
|
| |
| #7 | Probabilistic Spectrum Envelope: Categorized Audio-features Representation for NMF-based Sound Decomposition
| | Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki
|
| |
| #8 | A high resolution multiple source localization by generalized cumulant structure (GCS) matrix
| | Jinho Choi, Chang D. Yoo
|
| |
| #9 | Single channel speech music separation using nonnegative matrix factorization with sliding window and spectral masks
| | Emad M. Grais, Hakan Erdogan
|
| |
| #10 | Perceptually-inspired Processing for Multichannel Wiener Filter
| | Jorge I. Marin, David V. Anderson
|
| |
| #11 | Speech recognition in mixed sound of speech and music based on vector quantization and non-negative matrix factorization
| | Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
|
| |
| #12 | Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise
| | Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto
|
| |
| #13 | Voice processing by dynamic glottal models with applications to speech enhancement
| | Carlo Drioli, Andrea Calanca
|
| |
| #14 | Supervised Sparse Coding Strategy in Cochlear Implants
| | Jinqiu Sang, Guoping Li, Hongmei Hu, Mark E Lutman, Stefan Bleeck
|
| |
Mon-Ses2-P2: HMM-based Speech Synthesis II
Time: Monday 13:30 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Tomoki Toda
| #1 | Continuous Control of the Degree of Articulation in HMM-based Speech Synthesis
| | Benjamin Picart, Thomas Drugman, Thierry Dutoit
|
| |
| #2 | Estimation of Window Coefficients for Dynamic Feature Extraction for HMM based Speech Synthesis
| | Ling-Hui Chen, Yoshihiko Nankaku, Heiga Zen, Keiichi Tokuda, Zhen-Hua Ling, Li-Rong Dai
|
| |
| #3 | Inverse Filtering Based Harmonic plus Noise Excitation Model for HMM-based Speech Synthesis
| | Zhengqi Wen, Jianhua Tao
|
| |
| #4 | Improved HNM-based Vocoder for Statistical Synthesizers
| | Daniel Erro, Iñaki Sainz, Eva Navas, Inma Hernaez
|
| |
| #5 | A Statistical Phrase/Accent Model for Intonation Modeling
| | Gopala Krishna Anumanchipalli, Luís C. Oliveira, Alan W Black
|
| |
| #6 | Intermediate-State HMMs to Capture Continuously-Changing Signal Features
| | Gustav Eje Henter, W. Bastiaan Kleijn
|
| |
| #7 | Automatic sentence selection from speech corpora including diverse speech for improved HMM-TTS synthesis quality
| | Norbert Braunschweiler, Sabine Buchholz
|
| |
| #8 | Phonological Knowledge Guided HMM State Mapping for Cross-Lingual Speaker Adaptation
| | Hui Liang, John Dines
|
| |
| #9 | Reformulating Prosodic Break Model into Segmental HMMs and Information Fusion
| | Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
|
| |
| #9 | Multipulse Sequences for Residual Signal Modeling
| | Ranniery Maia, Heiga Zen, Kate Knill, Mark Gales, Sabine Buchholz
|
| |
| #10 | Can Objective Measures Predict the Intelligibility of Modified HMM-based Synthetic Speech in Noise?
| | Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King
|
| |
| #11 | Speech Synthesis based on Articulatory-Movement HMMs with Voice-source Codebook
| | Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada
|
| |
| #12 | Large-scale Subjective Evaluations of Speech Rate Control Methods for HMM-based Speech Synthesizers
| | Tsuneo Kato, Makoto Yamada, Nobuyuki Nishizawa, Keiichiro Oura, Keiichi Tokuda
|
| |
| #13 | HMM-Based Emphatic Speech Synthesis Using Unsupervised Context Labeling
| | Yu Maeno, Takashi Nose, Takao Kobayashi, Yusuke Ijima, Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka
|
| |
Mon-Ses2-P3: Phonetics and Phonology, Stress, Accent, Rhythm
Time: Monday 13:30 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Bernd Möbius
| #1 | Chinese and Italian Speech Rhythm. Normalization and the CCI Algorithm.
| | Chiara Bertini, Pier Marco Bertinetto, Na Zhi
|
| |
| #2 | Rhythm metrics on syllables and feet do not work as expected
| | Paolo Mairano, Antonio Romano
|
| |
| #3 | Applying Rhythm Features to Automatically Assess Non-Native Speech
| | Lei Chen, Klaus Zechner
|
| |
| #4 | Prosodic Synchrony in Co-operative Task-based Dialogues: A Measure of Agreement and Disagreement
| | Brian Vaughan
|
| |
| #5 | Low and High, Short and Long by Crook or by Hook?
| | Oliver Niebuhr, Astrid Wolf
|
| |
| #6 | Estimating Speaking Rate by Means of Rhythmicity Parameters
| | Christian Heinrich, Florian Schiel
|
| |
| #7 | Comparing word and syllable prominence rated by naive listeners
| | Denis Arnold, Bernd Möbius, Petra Wagner
|
| |
| #8 | L1 / L2 perception of lexical stress with F0 peak-delay: effect of an extra syllable added
| | Shinichi Tokuma, Yi Xu
|
| |
| #9 | Letter-to-Phoneme Conversion based on Two-Stage Neural Network focusing on Letter and Phoneme Contexts
| | Seng Kheang, Yurie Iribe, Tsuneo Nitta
|
| |
| #10 | An international English speech corpus for longitudinal study of accent development
| | Rosemary Orr, Hugo Quene, Roeland van Beek, Thari Diefenbach, David van Leeuwen, Marijn Huijbregts
|
| |
| #11 | A Corpus-Based Study of English Pronunciation Variations
| | Sunhee Kim, Kyuwhan Lee, Minhwa Chung
|
| |
| #12 | Long term average speech spectra in Yolngu Matha and Pitjantjatjara speaking females and males
| | Hywel Stoakes, Andrew Butcher, Janet Fletcher, Marija Tabain
|
| |
| #13 | Context and speaker dependency in the relation of vowel formants and subglottal resonances – Evidence from Hungarian
| | Tekla Etelka Gráczi, Steven M. Lulich, Tamás Gábor Csapó, András Beke
|
| |
Mon-Ses2-P4: ASR - Search, Keyword Spotting and Confidence Measures I
Time: Monday 13:30 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Mark Gales
| #1 | Event Selection from Phone Posteriorgrams Using Matched Filters
| | Keith Kintzley, Aren Jansen, Hynek Hermansky
|
| |
| #2 | A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-based Dynamic Time Warping
| | Yaodong Zhang, James Glass
|
| |
| #3 | OOV Detection and Recovery using Hybrid Models with Different Fragments
| | Long Qin, Ming Sun, Alexander Rudnicky
|
| |
| #4 | AUC Optimization Based Confidence Measure for Keyword Spotting
| | Haiyang Li, Jiqing Han, Tieran Zheng
|
| |
| #5 | An Empirical Study of Multilingual Spoken Term Detection
| | Zejun Ma, Xiaorui Wang, Bo Xu
|
| |
| #6 | Fusing Multiple Confidence Measures for Chinese Spoken Term Detection
| | Zejun Ma, Xiaorui Wang, Bo Xu
|
| |
| #7 | Response Probability Based Decoding Algorithm for Large Vocabulary Continuous Speech Recognition
| | Zhanlei Yang, Hao Chao, Wenju Liu
|
| |
| #8 | Combining Lattice-Based Language Dependent and Independent Approaches for Out-of-Language Detection in LVCSR
| | Yuxiang Shan, Yan Deng, Jia Liu
|
| |
| #9 | Evaluation of tree-trellis based decoding in over-million LVCSR
| | Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
|
| |
| #10 | Lattice Based Discriminative Model Combination Using Automatically Induced Phonetic Contexts
| | Hao Huang, Bing Hu Li
|
| |
| #11 | Predicting Human Perceived Accuracy of ASR Systems
| | Taniya Mishra, Andrej Ljolje, Mazin Gilbert
|
| |
| #12 | Cross-lingual study of ASR errors: on the role of the context in human perception of near homophones
| | Ioana Vasilescu, Dahbia Yahia, Natalie Snoeren, Martine Adda-Decker, Lori Lamel
|
| |
| #13 | Performance Prediction of Speech Recognition Using Average-Voice-Based Speech Synthesis
| | Tatsuhiko Saito, Takashi Nose, Takao Kobayashi, Yohei Okato
|
| |
| #14 | Confidence Measures For Turkish Call Center Conversations
| | Ali Haznedaroglu, Levent M. Arslan
|
| |
| #15 | Spoken Document Confidence Estimation Using Contextual Coherence
| | Taichi Asami, Narichika Nomoto, Satoshi Kobashikawa, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi
|
| |
Mon-Ses3-O1: Speaker Recognition - Analysis and Statistics II
Time: Monday 16:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Mohammed Senoussaoui
| 16:00 | Intersession compensation and scoring methods in the i-vectors space for speaker recognition
| | Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre
|
| |
| 16:20 | Kernel alignment maximization for speaker recognition based on high-level features
| | Szymon Drgas, Adam Dabrowski
|
| |
| 16:40 | Kernel partial least squares for speaker recognition
| | Balaji Vasan Srinivasan, Daniel Garcia-Romero, Dmitry N. Zotkin, Ramani Duraiswami
|
| |
| 17:00 | Conversational-Side-Specific Inter-Session Variability Compensation
| | Mohamed Omar, Jason Pelecanos
|
| |
| 17:20 | A speaker line-up for the Likelihood Ratio
| | David Van Leeuwen, Niko Brümmer
|
| |
| 17:40 | Towards Fully Bayesian Speaker Recognition: Integrating Out the Between-Speaker Covariance
| | Jesús Antonio Villalba López, Niko Brümmer
|
| |
Mon-Ses3-O3: ASR - Lexical, Prosodic and Multi-Lingual Models
Time: Monday 16:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Murat Saraclar
| 16:00 | Learning from Mistakes: Expanding Pronunciation Lexicons using Word Recognition Errors
| | Sravana Reddy, Evandro Gouvea
|
| |
| 16:20 | Improving non-native ASR through stochastic multilingual phoneme space transformations
| | David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, Mathew Magimai Doss
|
| |
| 16:40 | Unsupervised Arabic Dialect Adaptation with Self-Training
| | Scott Novotney, Rich Schwartz, Sanjeev Khudanpur
|
| |
| 17:00 | Template-based Automatic Speech Recognition meets Prosody
| | Dino Seppi, Kris Demuynck, Dirk Van Compernolle
|
| |
| 17:20 | Pronunciation Learning from Continuous Speech
| | Ibrahim Badr, Ian McGraw, James Glass
|
| |
| 17:40 | State-Level Data Borrowing for Low-Resource Speech Recognition based on Subspace GMMs
| | Yanmin Qian, Daniel Povey, Jia Liu
|
| |
Mon-Ses3-P5: Speech Synthesis - Selected Topics
Time: Monday 16:00 Place: Donatello (Room Onice) - Pala Congressi - Ground Floor Type: Poster Chair: Enrico Zovato
| #1 | A Grammar Based Approach to Style Specific Phrase Prediction
| | Alok Parlikar, Alan W Black
|
| |
| #2 | Unsupervised features from text for speech synthesis in a speech-to-speech translation system
| | Oliver Watts, Bowen Zhou
|
| |
| #3 | Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger
| | Oliver Watts, Junichi Yamagishi, Simon King
|
| |
| #4 | Albayzín 2010: a Spanish text to speech evaluation
| | Francisco Campillo, Francisco Méndez, Montserrat Arza, Laura Docío, Antonio Bonafonte, Eva Navas, Iñaki Sainz
|
| |
| #5 | Combining Active and Semi-supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis
| | Binbin Shen, Zhiyong Wu, Yongxin Wang, Lianhong Cai
|
| |
| #6 | Automatically Creating a Diphone Set from a Speech Database
| | Thomas Ewender, Beat Pfister
|
| |
| #7 | Automatic Viseme Clustering for Audiovisual Speech Synthesis
| | Wesley Mattheyses, Lukas Latacz, Werner Verhelst
|
| |
| #8 | Perceptual Quality Dimensions of Text-to-Speech Systems
| | Florian Hinterleitner, Sebastian Möller, Christoph Norrenbrock, Ulrich Heute
|
| |
| #10 | A Pointwise Approach to Pronunciation Estimation for a TTS Front-end
| | Shinsuke Mori, Graham Neubig
|
| |
| #11 | Correlating Text with Prosody
| | Mohamed Abou-Zleikha, Julie Carson-Berndsen
|
| |
| #12 | "What is... Dengue Fever?" Modeling and Predicting Pronunciation Errors in a Text-to-Speech System
| | Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran
|
| |
| #13 | Aperiodicity Analysis for Quality Estimation of Text-To-Speech Signals
| | Christoph Norrenbrock, Ulrich Heute, Florian Hinterleitner, Sebastian Möller
|
| |
Mon-Ses3-O2: Physiology and Pathology of Spoken Language
Time: Monday 16:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Tim Bunnell
| 16:00 | Novel VTEO Based Mel Cepstral Features for Classification of Normal and Pathological Voices
| | Hemant Patil, Pallavi Baljekar
|
| |
| 16:20 | Temporal Performance of Dysarthric Patients in Speech and Tapping Tasks
| | Eiji Shimura, Kazuhiko Kakehi
|
| |
| 16:40 | A comparative acoustic study on speech of glossectomy patients and normal subjects
| | Xinhui Zhou, Maureen Stone, Carol Espy-Wilson
|
| |
| 17:00 | Dysperiodicity analysis of perceptually assessed synthetic stimuli
| | Ali Alpan, Francis Grenez, Jean Schoentgen
|
| |
| 17:20 | Is the perception of voice quality language-dependant? A comparison of French and Italian listeners and dysphonic speakers
| | Alain Ghio, Frédérique Weisz, Giovanna Baracca, Giovanna Cantarella, Danièle Robert, Virginie Woisard, Franco Fussi, Antoine Giovanni
|
| |
| 17:40 | Automatic Selection of Acoustic and Non-linear Dynamic Features in Voice Signals for Hypernasality Detection
| | Juan Rafael Orozco, Santiago Murillo, Andres Marino Alvarez, Julian David Arias, Edilson Delgado, Jesus Francisco Vargas, Cesar German Castellanos
|
| |
Mon-Ses3-O4: Source Separation
Time: Monday 16:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Tomohiro Nakatani
| 16:00 | Frequency Oriented PCA for Blind Speech Separation of Convolutive Mixtures in Multiple Environments
| | Yasmina Benabderrahmane, Sid Ahmed Selouani, Douglas O'Shaughnessy
|
| |
| 16:20 | Blind Speech Separation in Time-Domain Using Block-Toeplitz Structure of Reconstructed Signal Matrices
| | Zbynek Koldovsky, Petr Tichavsky, Jiri Malek
|
| |
| 16:40 | Generalized method for solving the permutation problem in frequency-domain blind source separation of convolved speech signals
| | Auxiliadora Sarmiento, Iván Durán, Sergio Cruces, Pablo Aguilera
|
| |
| 17:00 | Adaptation of speaker-specific bases in non-negative matrix factorization for single channel speech-music separation
| | Emad M. Grais, Hakan Erdogan
|
| |
| 17:20 | An Informed Source Separation System for Speech Signals
| | Shuhua Zhang, Laurent Girin
|
| |
| 17:40 | Adaptive Blocking Beamforming for Speech Separation
| | Ngoc Thuy Tran, William Cowley, Andre Pollok
|
| |
Mon-Ses3-O5: Multimodal Signal Processing
Time: Monday 16:00 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Keikichi Hirose
| 16:00 | Asynchronous Multimodal Text Entry using Speech and Gesture Keyboards
| | Per Ola Kristensson, Keith Vertanen
|
| |
| 16:20 | Robust Bimodal Person Identification Using Face and Speech with Limited Training Data and Corruption of Both Modalities
| | Niall McLaughlin, Ji Ming, Danny Crookes
|
| |
| 16:40 | Toward a multi-speaker visual articulatory feedback system
| | Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly
|
| |
| 17:00 | Statistical Mapping between Articulatory and Acoustic Data for an Ultrasound-based Silent Speech Interface
| | Thomas Hueber, Elie-Laurent Benaroya, Bruce Denby, Gérard Chollet
|
| |
| 17:20 | Unsupervised geometry calibration of acoustic sensor networks using source correspondences
| | Joerg Schmalenstroeer, Florian Jacob, Reinhold Haeb-Umbach, Marius H. Hennecke, Gernot A. Fink
|
| |
| 17:40 | Investigations on Speaking Mode Discrepancies in EMG-based Speech Recognition
| | Michael Wand, Matthias Janke, Tanja Schultz
|
| |
Mon-Ses3-P1: Pitch Processing - Singing Voice Analysis
Time: Monday 16:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Thomas Drugman
| #1 | Fundamental Frequency Estimation Using Modified Higher Order Moments And Multiple Windows
| | Alipah Pawi, Saeed Vaseghi, Ben Milner, Seyed Ghorshi
|
| |
| #2 | EM-based Gain Adaptation for Probabilistic Multipitch Tracking
| | Michael Wohlmayr, Franz Pernkopf
|
| |
| #3 | Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics
| | Thomas Drugman, Abeer Alwan
|
| |
| #4 | Epoch Extraction in High Pass Filtered Speech using Hilbert Envelope
| | Govind D, Prasanna S R Mahadeva, Debadatta Pati
|
| |
| #5 | Robust HNR-based Closed-loop Pitch and Harmonic Parameters Estimation
| | Alexander Pavlovets, Alexander Petrovsky
|
| |
| #6 | Exploring Bessel Features for Detection of Glottal Closure Instants
| | Chetana Prakash, Dhananjaya Nagaraje Gowda, Suryakanth V. Gangashetty
|
| |
| #7 | Evaluation of Glottal Epoch Detection Algorithms on Different Voice Types
| | Joao Paulo Cabral, John Kane, Christer Gobl, Julie Carson-Berndsen
|
| |
| #8 | A divide et impera algorithm for optimal pitch stylization
| | Antonio Origlia, Giovanni Abete, Francesco Cutugno, Iolanda Alfano, Renata Savy, Bogdan Ludusan
|
| |
| #9 | Singing Voice Analysis Using Relative Harmonic Delays
| | Ricardo Sousa, Aníbal Ferreira
|
| |
| #10 | Singing voice synthesis: Singer-dependent vibrato modeling and coherent processing of spectral envelope
| | Siu Wa Lee, Minghui Dong
|
| |
| #11 | Chorus Digitalis: experiments in chironomic choir singing
| | Sylvain Le Beux, Lionel Feugère, Christophe d'Alessandro
|
| |
Mon-Ses3-P2: Prosodic Modeling
Time: Monday 16:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Hiroya Fujisaki
| #1 | Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection
| | Kun Li, Shuang Zhang, Mingxing Li, Wai-Kit Lo, Helen Meng
|
| |
| #2 | Hierarchical Stress Modeling in Mandarin Text-to-Speech
| | Ya Li, Jianhua Tao, Xiaoying Xu
|
| |
| #3 | Automatic Prosodic Events Detection by Using Syllable-based Acoustic, Lexical and Syntactic Features
| | Chong-Jia Ni, Wen-Ju Liu, Bo Xu
|
| |
| #4 | Using Dynamic Time Warping to compute prosodic similarity measures
| | Albert Rilliard, Alexandre Allauzen, Philippe Boula de Mareüil
|
| |
| #5 | Applying the quantitative target approximation model (qTA) to German and Brazilian Portuguese
| | Plinio Barbosa, Hansjörg Mixdorff, Sandra Madureira
|
| |
| #6 | Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations
| | Nicolas Obin, Anne Lacheret, Xavier Rodet
|
| |
| #7 | Toward a Continuous Modeling of French Prosodic Structure: Using Acoustic Features to Predict Prominence Location and Prominence Degree
| | Mathieu Avanzi, Nicolas Obin, Anne Lacheret, Bernard Victorri
|
| |
| #8 | Optimal models of prosodic prominence using the Bayesian information criterion
| | Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Margaret Fleck, Mark Hasegawa-Johnson, Jennifer Cole
|
| |
| #9 | Quantitative Analysis of Tone Coarticulation in Mandarin
| | Hussein Hussein, Hansjörg Mixdorff
|
| |
| #10 | Tracking pitch contours using minimum jerk trajectories
| | Daniel Neiberg, G Ananthakrishnan, Joakim Gustafson
|
| |
Mon-Ses3-P3: Discourse and Dialogue
Time: Monday 16:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Patrick Ehlen
| #1 | On the use of linguistic features in an automatic system for speech analytics of telephone conversations
| | Benjamin Maza, Marc El-Beze, Georges Linares, Renato De Mori
|
| |
| #2 | Determining What Questions To Ask, with the Help of Spectral Graph Theory
| | Abe Kazemzadeh, Sungbok Lee, Panayiotis Georgiou, Shrikanth Narayanan
|
| |
| #3 | 'Are you sure you're paying attention?' -- 'Uh-huh'. Communicating understanding as a marker of attentiveness
| | Hendrik Buschmeier, Zofia Malisz, Marcin Wlodarczak, Stefan Kopp, Petra Wagner
|
| |
| #4 | Projectability of Transition-relevance Places using Prosodic Features in Japanese Spontaneous Conversation
| | Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida
|
| |
| #5 | Measuring Final Lengthening for Speaker-Change Prediction
| | Anna Hjalmarsson, Kornel Laskowski
|
| |
| #6 | Incremental Learning and Forgetting in Stochastic Turn-Taking Models
| | Kornel Laskowski, Jens Edlund, Mattias Heldner
|
| |
| #7 | Reinforcement Learning of Argumentation Dialogue Policies in Negotiation
| | Kallirroi Georgila, David Traum
|
| |
| #8 | Topic Switching Strategies for Spoken Dialogue Systems
| | Tobias Heinroth, Savina Koleva, Wolfgang Minker
|
| |
| #9 | Unsupervised Clustering of Utterances using Non-parametric Bayesian Methods
| | Ryuichiro Higashinaka, Noriaki Kawamae, Kugatsu Sadamitsu, Yasuhiro Minami, Toyomi Meguro, Kohji Dohsaka, Hirohito Inagaki
|
| |
Mon-Ses3-P4: SLP for Speech Translation, Information Extraction and Retrieval
Time: Monday 16:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Dekai Wu
| #1 | OOV Sensitive Named-Entity Recognition in Speech
| | Carolina Parada, Frederick Jelinek
|
| |
| #2 | Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction
| | Markus Saers, Dekai Wu, Chi-Kiu Lo, Karteek Addanki
|
| |
| #3 | An Efficient Unified Extraction Algorithm for Bilingual Data
| | Christoph Tillmann, Sanjika Hewavitharana
|
| |
| #4 | Using Features from Topic Models to Alleviate Over-generation in Hierarchical Phrase-based Translation
| | Songfang Huang, Bowen Zhou
|
| |
| #5 | An Empirical Study on Improving Hierarchical Phrase-based Translation Using Alignment Features
| | Songfang Huang, Bowen Zhou
|
| |
| #6 | Robust Speech Translation by Domain Adaptation
| | Xiaodong He, Li Deng
|
| |
| #7 | Enhancements to the Training Process of Classifier-based Speech Translator via Topic Modeling
| | Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan
|
| |
| #8 | A scalable approach for building a parallel corpus from the Web
| | Vivek Kumar Rangarajan Sridhar, Luciano Barbosa, Srinivas Bangalore
|
| |
| #9 | Spoken Term Detection Results using Plural Subword Models by Estimating Detection Performance for Each Query
| | Yoshiaki Itoh, Kohei Iwata, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee
|
| |
| #10 | SpeechForms - From Web to Speech and Back
| | Luciano Barbosa, Diamantino Caseiro, Giuseppe Di Fabbrizio, Amanda Stent
|
| |
| #11 | Image Processing Filters for Line Detection-based Spoken Term Detection
| | Kazuyuki Noritake, Hiroaki Nanjo, Takehiko Yoshimi
|
| |
| #12 | Using Latent Topic Features for Named Entity Extraction in Search Queries
| | Joseph Polifroni, Francois Mairesse
|
| |
| #13 | Language model expansion using webdata for spoken document retrieval
| | Ryo Masumura, Seongjun Hahm, Akinori Ito
|
| |
| #14 | Effects of Query Expansion for Spoken Document Passage Retrieval
| | Tomoyosi Akiba, Koichiro Honda
|
| |
| #15 | Unsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition
| | Chun-an Chan, Lin-shan Lee
|
| |
| #16 | Topic Identification from Audio Recordings using Rich Recognition Results and Neural Network based Classifiers
| | Roberto Gemello, Franco Mana, Pier Domenico Batzu
|
| |
Tue-Ses1-O1: ASR - Language Models II
Time: Tuesday 10:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Stephan Kanthak
| 10:00 | Empirical Evaluation and Combination of Advanced Language Modeling Techniques
| | Tomas Mikolov, Anoop Deoras, Stefan Kombrink, Lukas Burget, Jan Cernocky
|
| |
| 10:20 | Personalizing Model M for Voice-search
| | Geoffrey Zweig, Shuangyu Chang
|
| |
| 10:40 | Sentence Selection by Direct Likelihood Maximization for Language Model Adaptation
| | Takahiro Shinozaki, Yu Kubota, Sadaoki Furui, Eiji Utsunomiya, Yasutaka Shindoh
|
| |
| 11:00 | Feature Combination Approaches for Discriminative Language Models
| | Ebru Arisoy, Bhuvana Ramabhadran, Hong-Kwang Jeff Kuo
|
| |
| 11:20 | On-line Language Model Biasing for Multi-Pass Automatic Speech Recognition
| | Sankaranarayanan Ananthakrishnan, Stavros Tsakalidis, Rohit Prasad, Prem Natarajan
|
| |
| 11:40 | Mandarin word-character hybrid-input Neural Network Language Model
| | Moonyoung Kang, Tim Ng, Long Nguyen
|
| |
Tue-Ses1-O3: Voice Conversion
Time: Tuesday 10:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Junichi Yamagishi
| 10:00 | One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space
| | Daisuke Saito, Keisuke Yamamoto, Nobuaki Minematsu, Keikichi Hirose
|
| |
| 10:20 | A Study on Bag of Gaussian Model with Application to Voice Conversion
| | Yu Qiao, Tong Tong, Nobuaki Minematsu
|
| |
| 10:40 | A Bayesian Approach to Voice Conversion Based on GMMs Using Multiple Model Structures
| | Lei Li, Yoshihiko Nankaku, Keiichi Tokuda
|
| |
| 11:00 | Quality Improvement of Voice Conversion Systems Based on Trellis Structured VQ
| | Mahdi Eslami, Hamid Sheikhzadeh, Abolghasem Sayadiyan
|
| |
| 11:20 | Voice Conversion using GMM with Enhanced Global Variance
| | Hadas Benisty, David Malah
|
| |
| 11:40 | Spectral Envelope Transformation using DFW and Amplitude Scaling for Voice Conversion with Parallel or Nonparallel Corpora
| | Elizabeth Godoy, Olivier Rosec, Thierry Chonavel
|
| |
Tue-Ses1-P5: Speech Audio Analysis and Classification
Time: Tuesday 10:00 Place: Donatello (Room Onice) - Pala Congressi - Ground Floor Type: Poster Chair: Olivier Rosec
| #1 | Stop Consonant Recognition by Temporal Fine Structure of Burst
| | Seppo Fagerlund, Unto K. Laine
|
| |
| #2 | Phonetic Classification Using Controlled Random Walks
| | Katrin Kirchhoff, Andrei Alexandrescu
|
| |
| #3 | Keyphrase Cloud Generation of Broadcast News
| | Luís Marujo, Márcio Viveiros, João P. Neto
|
| |
| #4 | Optimized Feature Extraction and HMMs in Subword Detectors
| | Alfonso M. Canterla, Magne H. Johnsen
|
| |
| #5 | Real-World Speech/Non-Speech Audio Classification Based on Sparse Representation Features and GPCs
| | Ziqiang Shi, Jiqing Han, Tieran Zheng
|
| |
| #6 | Privacy Preserving Speaker Verification using Adapted GMMs
| | Manas Pathak, Bhiksha Raj
|
| |
| #7 | Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters
| | Eva Szekely, Joao Cabral, Peter Cahill, Julie Carson-Berndsen
|
| |
| #8 | On the use of the rhythmogram for automatic syllabic prominence detection
| | Bogdan Ludusan, Antonio Origlia, Francesco Cutugno
|
| |
| #9 | Speech Modulation Features for Robust Nonnative Speech Accent Detection
| | Sethserey Sam, Xiong Xiao, Laurent Besacier, Eric Castelli, Haizhou Li, Eng Siong Chng
|
| |
| #10 | Frame-Level Vocal Effort Likelihood Space Modeling for Improved Whisper-Island Detection
| | Chi Zhang, John Hansen
|
| |
| #11 | Speaker Identification for Whispered Speech Using A Training Feature Transformation From Neutral To Whisper
| | Xing Fan, John Hansen
|
| |
| #12 | An Accurate and Robust Gender Identification Algorithm
| | Andrea DeMarco, Stephen J. Cox
|
| |
| #13 | Deep Belief Networks for Automatic Music Genre Classification
| | Xiaohong Yang, Qingcai Chen, Shusen Zhou, Xiaolong Wang
|
| |
| #14 | Image Representation of the Subband Power Distribution for Robust Sound Classification
| | Jonathan William Dennis, Huy Dat Tran, Haizhou Li
|
| |
| #15 | Acoustic and Visual Cues of Turn-Taking Dynamics in Dyadic Interactions
| | Bo Xiao, Viktor Rozgic, Athanasios Katsamanis, Brian Baucom, Panayiotis Georgiou, Shrikanth Narayanan
|
| |
Tue-Ses1-O2: Phonology and Phonetics
Time: Tuesday 10:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Mark Hasegawa-Johnson
| 10:00 | Laryngealization and Breathiness in Persian
| | Vahid Sadeghi
|
| |
| 10:20 | Age-dependent differences in the neutralization of the intervocalic voicing contrast: Evidence from an apparent-time study on East Franconian
| | Viola Müller, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold
|
| |
| 10:40 | Comparing syllable frequencies in corpora of written and spoken language
| | Barbara Samlowski, Bernd Möbius, Petra Wagner
|
| |
| 11:00 | Sylli: Automatic Phonological Syllabification for Italian
| | Luca Iacoponi, Renata Savy
|
| |
| 11:20 | A preliminary study on the production of signs in Brazilian Sign Language when one of the manual articulators is unavailable
| | André Nogueira Xavier, Plinio Almeida Barbosa
|
| |
| 11:40 | Electroglottograph and Acoustic Cues for Phonation Contrasts in Taiwan Min Falling Tones
| | Ho-hsien Pan, Mao-hsu Chen, Shao-ren Lyu
|
| |
Tue-Ses1-O4: Robust Speech Recognition III
Time: Tuesday 10:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Richard Stern
| 10:00 | Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge
| | Pejman Mowlaee, Rahim Saeidi, Zheng-Hua Tan, Mads Græsbøll Christensen, Tomi Kinnunen, Søren Holdt Jensen, Pasi Fränti
|
| |
| 10:20 | Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
| | Cemil Demir, Murat Saraçlar, Ali Taylan Cemgil
|
| |
| 10:40 | A Level-dependent Auditory Filter-bank for Speech Recognition in Reverberant Environments
| | Hari Krishna Maganti, Marco Matassoni
|
| |
| 11:00 | A Multichannel Feature-Based Processing for Robust Speech Recognition
| | Mehrez Souden, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
|
| |
| 11:20 | Feature Normalization Using Structured Full Transforms for Robust Speech Recognition
| | Xiong Xiao, Jinyu Li, Eng Siong Chng, Haizhou Li
|
| |
| 11:40 | A Robust Estimation Method of Noise Mixture Model for Noise Suppression
| | Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani
|
| |
Tue-Ses1-O5: Spoken Language Understanding
Time: Tuesday 10:00 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Ruhi Sarikaya
| 10:00 | Multi-Task Learning for Spoken Language Understanding with Shared Slots
| | Xiao Li, Ye-Yi Wang, Gokhan Tur
|
| |
| 10:20 | Learning Weighted Entity Lists from Web Click Logs for Spoken Language Understanding
| | Dustin Hillard, Asli Celikyilmaz, Dilek Hakkani-Tur, Gokhan Tur
|
| |
| 10:40 | Bootstrapping Domain Detection Using Query Click Logs for New Domains
| | Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Elizabeth Shriberg
|
| |
| 11:00 | Multi-Domain Spoken Language Understanding with Approximate Inference
| | Asli Celikyilmaz, Dilek Hakkani-Tur, Gokhan Tur
|
| |
| 11:20 | Speech Indexing Using Semantic Context Inference
| | Chien-Lin Huang, Bin Ma, Haizhou Li, Chung-Hsien Wu
|
| |
| 11:40 | Automatically Optimizing Utterance Classification Performance without Human in the Loop
| | Yun-Cheng Ju, Jasha Droppo
|
| |
Tue-Ses1-P1: Human Speech and Sound Perception I
Time: Tuesday 10:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Denis Burnham
| #1 | Parallels in infants’ attention to speech articulation and to physical changes in speech-unrelated objects
| | Eeva Klintfors, Ellen Marklund, Francisco Lacerda
|
| |
| #2 | Speech events are recoverable from unlabeled articulatory data: Using an unsupervised clustering approach on data obtained from Electromagnetic Midsagittal Articulography (EMA)
| | Daniel Duran, Jagoda Bruni, Grzegorz Dogil, Hinrich Schütze
|
| |
| #3 | Children’s recognition of their own voice: influence of phonological impairment
| | Sofia Strömbergsson
|
| |
| #4 | Evaluation of Bone-conducted Ultrasonic Hearing-aid Regarding Transmission of Speaker Discrimination Information
| | Takayuki Kagomiya, Seiji Nakagawa
|
| |
| #5 | Impact of Different Feedback Mechanisms in EMG-based Speech Recognition
| | Christian Herff, Matthias Janke, Michael Wand, Tanja Schultz
|
| |
| #6 | Phonotactic constraints and the segmentation of Cantonese speech
| | Michael C. W. Yip
|
| |
| #7 | Reaction time and decision difficulty in the perception of intonation
| | Katrin Schneider, Grzegorz Dogil, Bernd Möbius
|
| |
| #8 | Processing of stress related acoustic cues as indexed by ERPs
| | Ferenc Honbolygó, Valéria Csépe
|
| |
| #9 | On the relationship between perceived accentedness, acoustic similarity, and processing difficulty in foreign-accented speech
| | Marijt J. Witteman, Andrea Weber, James M. McQueen
|
| |
| #10 | Perception Boundary between Single and Geminate Stops in 3- and 4-mora Japanese Words
| | Shigeaki Amano, Yukari Hirata
|
| |
| #11 | Correlation Analysis of Acoustic Features with Perceptual Voice Quality Similarity for Similar Speaker Selection
| | Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno
|
| |
Tue-Ses1-P2: Multilingual and Multimodal Approaches to Spoken Language
Time: Tuesday 10:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Michael Johnston
| #1 | Can Audio-Visual Speech Recognition outperform Acoustically Enhanced Speech Recognition in Automotive Environment?
| | Rajitha Navarathna, Tristan Kleinschmidt, David Dean, Sridha Sridharan, Patrick Lucey
|
| |
| #2 | A Multimodal Approach to Dictation of Handwritten Historical Documents
| | Vicent Alabau, Verónica Romero, Antonio-L. Lagarda, Carlos-D. Martínez-Hinarejos
|
| |
| #3 | Weight Optimization for Bimodal Unit-Selection Talking Head Synthesis
| | Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte
|
| |
| #4 | Modality Selection and Perceived Mental Effort in a mobile Application
| | Stefan Schaffer, Benjamin Jöckel, Ina Wechsung, Robert Schleicher, Sebastian Möller
|
| |
| #5 | A cross-lingual spoken content search system
| | Jitendra Ajmera, Ashish Verma
|
| |
| #6 | NeMo: a Platform for Multilingual News Monitoring
| | Fabio Brugnara, Daniele Falavigna, Marcello Federico, Christian Girardi, Diego Giuliani, Roberto Gretter
|
| |
| #7 | Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification
| | Sourish Chaudhuri, Mark Harvilla, Bhiksha Raj
|
| |
| #8 | Conditioned Hidden Markov Model Fusion for Multimodal Classification
| | Michael Glodek, Stefan Scherer, Friedhelm Schwenker
|
| |
| #9 | Distant Speech Recognition in a Smart Home: Comparison of Several Multisource ASRs in Realistic Conditions
| | Benjamin Lecouteux, Michel Vacher, François Portet
|
| |
| #10 | A Robust Approach to Mining Repeated Sequence in Audio Stream
| | Jiansong Chen, Lei Zhu, Bailan Feng, Peng Ding, Bo Xu
|
| |
Tue-Ses1-P3: ASR - New Paradigms and Other Topics
Time: Tuesday 10:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Chin Hui Lee
| #1 | Accelerated Parallelizable Neural Network Learning Algorithm for Speech Recognition
| | Dong Yu, Li Deng
|
| |
| #2 | Deep Convex Network: A Scalable Architecture for Deep Learning
| | Li Deng, Dong Yu
|
| |
| #3 | Modeling Broad Context for Tone Recognition with Conditional Random Fields
| | Siwei Wang, Gina-Anne Levow
|
| |
| #4 | Improved Tonal Language Speech Recognition by Integrating Spectro-temporal Evidence and Pitch Information with Properly Chosen Tonal Acoustic Units
| | Shang-wen Li, Yow-bang Wang, Liang-che Sun, Lin-shan Lee
|
| |
| #5 | Kullback-Leibler divergence-based ASR training data selection
| | Evandro Gouvea, Marelie Davel
|
| |
| #6 | Articulatory Feature Classification Using Nearest Neighbors
| | Arild Brandrud Næss, Karen Livescu, Rohit Prabhavalkar
|
| |
| #7 | Continuous episodic memory based speech recognition using articulatory dynamics
| | Sébastien Demange, Slim Ouni
|
| |
| #8 | Graphone Model Interpolation and Arabic Pronunciation Generation
| | T. Li, P. C. Woodland, F. Diehl, M. J. F. Gales
|
| |
| #9 | Grapheme-to-Phoneme Conversion using Conditional Random Fields
| | Irina Illina, Dominique Fohr, Denis Jouvet
|
| |
| #10 | Bilingual Acoustic Model Adaptation by Unit Merging on Different Levels and Cross-level Integration
| | Ching-Feng Yeh, Chao-Yu Huang, Lin-Shan Lee
|
| |
| #11 | A qualitative evaluation of phoneme-to-phoneme technology
| | Marijn Schraagen, Gerrit Bloothooft
|
| |
| #12 | Cheap Bootstrap of Multi-Lingual Hidden Markov Models
| | Daniele Falavigna, Roberto Gretter
|
| |
| #13 | Adaptive Stream Fusion in Multistream Recognition of Speech
| | Nima Mesgarani, Samuel Thomas, Hynek Hermansky
|
| |
| #14 | Unsupervised Audio Patterns Discovery using HMM-based Self-Organized Units
| | Man-hung Siu, Herbert Gish, Steve Lowe, Arthur Chan
|
| |
| #15 | Nearest Neighbors with Learned Distances for Phonetic Frame Classification
| | John Labiak, Karen Livescu
|
| |
Tue-Ses1-P4: Speaker Recognition - Modeling, Automatic Procedures, Analysis III
Time: Tuesday 10:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Daniel Garcia-Romero
| #1 | i-vector Based Speaker Recognition on Short Utterances
| | Ahilan Kanagasundaram, Robbie Vogt, David Dean, Sridha Sridharan, Michael Mason
|
| |
| #2 | Study of Overlapped Speech Detection for NIST SRE Summed Channel Speaker Recognition
| | Hanwu Sun, Bin Ma
|
| |
| #3 | Super-Dirichlet Mixture Models using Differential Line Spectral Frequencies for Text-Independent Speaker Identification
| | Zhanyu Ma, Arne Leijon
|
| |
| #4 | Comparison of Voice Activity Detectors for Interview Speech in NIST Speaker Recognition Evaluation
| | Hon-Bill Yu, Man-Wai Mak
|
| |
| #5 | Eigen-Voice Based Anchor Modeling System for Speaker Identification using MLLR Super-Vector
| | Achintya Kumar Sarkar, S. Umesh
|
| |
| #6 | Automatic Detection of Speaker Attributes Based on Utterance Text
| | Wen Wang, Andreas Kathol, Harry Bratt
|
| |
| #7 | Comparison of Speaker Recognition Approaches for Real Applications
| | Sandro Cumani, Pier Domenico Batzu, Daniele Colibro, Claudio Vair, Pietro Laface, Vasileios Vasilakakis
|
| |
| #8 | Modeling Speaker Personality using Voice
| | Tim Polzehl, Sebastian Möller, Florian Metze
|
| |
| #9 | Structural Joint Factor Analysis for Speaker Recognition
| | Marc Ferras, Koichi Shinoda, Sadaoki Furui
|
| |
| #10 | Acoustic Forest for SMAP-based Speaker Verification
| | Sangeeta Biswas, Marc Ferras, Koichi Shinoda, Sadaoki Furui
|
| |
| #11 | Mixture of Auto-Associative Neural Networks for Speaker Verification
| | Sivaram Garimella, Samuel Thomas, Hynek Hermansky
|
| |
Tue-Ses2-O1: Dialect and Accent Identification
Time: Tuesday 13:30 Place: Auditorium - Pala Congressi Type: Oral Chair: David Martínez
| 13:30 | In search of cues discriminating West-African accents in French
| | Philippe Boula de Mareüil, Jean-Luc Rouas, Manuela Yapomo
|
| |
| 13:50 | Computer and Human Recognition of Regional Accents of British English
| | Abualsoud Hanani, Martin J. Russell, Michael J. Carey
|
| |
| 14:10 | Target-aware Lattice Rescoring for Dialect Recognition
| | Rong Tong, Bin Ma, Haizhou Li, Eng Siong Chng
|
| |
| 14:30 | Effective Arabic Dialect Classification Using Diverse Phonotactic Models
| | Murat Akbacak, Dimitra Vergyri, Andreas Stolcke, Nicolas Scheffer, Arindam Mandal
|
| |
| 14:50 | Characterizing Deletion Transformations across Dialects using a Sophisticated Tying Mechanism
| | Nancy Chen, Wade Shen, Joe Campbell
|
| |
| 15:10 | Dialect and Accent Recognition using Phonetic-Segmentation Supervectors
| | Fadi Biadsy, Julia Hirschberg, Daniel Ellis
|
| |
Tue-Ses2-O3: ASR - Acoustic Models III
Time: Tuesday 13:30 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Ralf Schlueter
| 13:30 | Generalized Baum-Welch Algorithm and Its Implication to a New Extended Baum-Welch Algorithm
| | Roger Hsiao, Tanja Schultz
|
| |
| 13:50 | Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems
| | Frank Diehl, Mark Gales, Andrew Liu, Marcus Tomalin, Phil Woodland
|
| |
| 14:10 | A Fully Automated Derivation of State-based Eigentriphones for Triphone Modeling with No Tied States using Regularization
| | Tom Ko, Brian Mak
|
| |
| 14:30 | Reducing Computational Complexities of Exemplar-Based Sparse Representations With Applications to Large Vocabulary Speech Recognition
| | Tara Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky
|
| |
| 14:50 | An i-Vector based Approach to Training Data Clustering for Improved Speech Recognition
| | Yu Zhang, Jian Xu, Zhi-Jie Yan, Qiang Huo
|
| |
| 15:10 | Rapid Training of Acoustic Models using Graphics Processing Units
| | Senaka Buthpitiya, Ian Lane, Jike Chong
|
| |
Tue-Ses2-S1: Show & Tell Demonstration - Mobility and Web-services
Time: Tuesday 13:30 Place: Donatello (Room Onice) - Pala Congressi - Ground Floor Type: Poster Chair: Mazin Gilbert
| #1 | Making an automatic speech recognition service freely available on the web
| | Stuart Nicholas Wrigley, Thomas Hain
|
| |
| #2 | AT&T VoiceBuilder: A Cloud-based Text-To-Speech Voice Builder Tool
| | Yeon-Jun Kim, Thomas Okken, Alistair Conkie, Giuseppe Di Fabbrizio
|
| |
| #3 | Extending Audio Notetaker to Browse WebASR Transcriptions
| | Roger Tucker, Dan Fry, Vincent Wan, Stuart Wrigley, Thomas Hain
|
| |
| #4 | A Web-Based Tool for Developing Multilingual Pronunciation Lexicons
| | Samantha Ainsley, Linne Ha, Martin Jansche, Ara Kim, Masayuki Nanzawa
|
| |
| #5 | Speak4it and the Multimodal Semantic Interpretation System
| | Michael Johnston, Patrick Ehlen
|
| |
| #6 | TSAB -- Web Interface for Transcribed Speech Collections
| | Tanel Alumäe, Ahti Kitsik
|
| |
| #7 | Visual Voice Mail to Text on the iPhone/iPad
| | Andrej Ljolje, Vincent Goffin, Diamantino Caseiro, Taniya Mishra, Mazin Gilbert
|
| |
| #8 | Percy - an HTML5 framework for media rich web experiments on mobile devices
| | Christoph Draxler
|
| |
| #9 | The KLAIR toolkit for recording interactive dialogues with a virtual infant
| | Mark Huckvale
|
| |
| #10 | Real-time Prototype for Integration of Blind Source Extraction and Robust Automatic Speech Recognition
| | Francesco Nesta, Marco Matassoni, Hari Krishna Maganti
|
| |
Tue-Ses2-O2: First Language Acquisition
Time: Tuesday 13:30 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Cinzia Avesani
| 13:30 | The Multi Timescale Phoneme Acquisition Model of the Self-Organizing Based on the Dynamic Features
| | Kouki Mizawa, Hideaki Miura, Hideaki Kikuchi, Reiko Mazuka
|
| |
| 13:50 | The time-course of talker-specificity effects for newly-learned pseudowords: Evidence for a hybrid model of lexical representation
| | Helen Brown, M. Gareth Gaskell
|
| |
| 14:10 | A parametric approach to intonation acquisition research: Validation on child-directed speech data
| | Britta Lintfert, Antje Schweitzer, Bernd Möbius
|
| |
| 14:30 | Modelling Novelty Preference in Word Learning
| | Maarten Versteegh, Louis ten Bosch, Lou Boves
|
| |
| 14:50 | Using Imitation to learn Infant-Adult Acoustic Mappings
| | G Ananthakrishnan, Giampiero Salvi
|
| |
| 15:10 | Thresholding word activations for response scoring - Modelling psycholinguistic data
| | Christina Bergmann, Louis ten Bosch, Lou Boves
|
| |
Tue-Ses2-O4: Spoken Dialogue Systems I
Time: Tuesday 13:30 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Olivier Pietquin
| 13:30 | User Study of Spoken Decision Support System
| | Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hisashi Kawai, Satoshi Nakamura
|
| |
| 13:50 | Efficient Probabilistic Tracking of User Goal and Dialog History for Spoken Dialog Systems
| | Antoine Raux, Yi Ma
|
| |
| 14:10 | Tackling a Shilly-Shally Classifier for Predicting Task Success in Spoken Dialogue Interaction
| | Alexander Schmitt, Alexander Zgorzelski, Wolfgang Minker
|
| |
| 14:30 | Evaluation of Listening-oriented Dialogue Control Rules based on the Analysis of HMMs
| | Toyomi Meguro, Ryuichiro Higashinaka, Yasuhiro Minami, Kohji Dohsaka
|
| |
| 14:50 | Large-Scale Experiments on Data-Driven Design of Commercial Spoken Dialog Systems
| | David Suendermann, Jackson Liscombe, Jonathan Bloom, Grace Li, Roberto Pieraccini
|
| |
| 15:10 | Comparing system-driven and free dialogue in in-vehicle interaction
| | Fredrik Kronlid, Jessica Villing, Alexander Berman, Staffan Larsson
|
| |
Tue-Ses2-O5: Spoken Language Resources, Evaluation and Standardization II
Time: Tuesday 13:30 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Paolo Baggia
| 13:30 | Rapid Evaluation of Speech Representations for Spoken Term Discovery
| | Michael Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky
|
| |
| 13:50 | Phonemic Similarity Metrics to Compare Pronunciation Methods
| | Ben Hixon, Eric Schneider, Susan L. Epstein
|
| |
| 14:10 | Investigating the effect of number of interlocutors on the quality of experience for multi-party audio conferencing
| | Janto Skowronek, Alexander Raake
|
| |
| 14:30 | On Development of Consistently Punctuated Speech Corpora
| | Jachym Kolar, Lori Lamel
|
| |
| 14:50 | A Multimodal Real-Time MRI Articulatory Corpus for Speech Research
| | Shrikanth Narayanan, Erik Bresch, Prasanta Ghosh, Louis Goldstein, Athanasios Katsamanis, Yoon Kim, Adam Lammert, Michael Proctor, Vikram Ramanarayanan, Yinghua Zhu
|
| |
| 15:10 | Building an audio-visual corpus of Australian English: large corpus collection with an economical portable and replicable Black Box
| | Denis Burnham, Dominique Estival, Steven Fazio, Felicity Cox, Robert Dale, Jette Viethen, Steve Cassidy, Julien Epps, Roberto Togneri, Yuko Kinoshita, Roland Göcke, Joanne Arciuli, Marc Onslow, Trent Lewis, Andy Butcher, John Hajek, Michael Wagner
|
| |
Tue-Ses2-S1-O: Spoken Language Processing of Human-Human Conversations I
Time: Tuesday 13:30 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Oral Chair: Dilek Hakkani-Tur
| 13:30 | Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus
| | Fabio Valente, Alessandro Vinciarelli
|
| |
| 13:50 | Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions
| | Rivka Levitan, Julia Hirschberg
|
| |
| 14:10 | Automatic Call Quality Monitoring Using Cost-Sensitive Classification
| | Youngja Park
|
| |
Tue-Ses2-P1: Human Speech and Sound Perception II
Time: Tuesday 13:30 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Holger Mitterer
| #1 | Pointing Gestures do not Influence the Perception of Lexical Stress
| | Alexandra Jesse, Holger Mitterer
|
| |
| #2 | Relationships between Phonetic Features and Speech Perception
| | Ian Cushing, Francis Li, Ken Worrall, Tim Jackson
|
| |
| #3 | The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
| | Guy Brown, Tim Jurgens, Ray Meddis, Matthew Robertson, Nicholas Clark
|
| |
| #4 | An Automatic Voice Pleasantness Classification System based on Prosodic and Acoustic Patterns of Voice Preference
| | Luis Pinto-Coelho, Daniela Braga, Miguel Sales-Dias, Carmen Garcia-Mateo
|
| |
| #5 | Contributions of F1 and F2 (F2’) to the perception of plosive consonants
| | René Carré, Pierre Divenyi, Willy Serniclaes, Emmanuel Ferragne, Egidio Marsico, Viet-Son Nguyen
|
| |
| #6 | Auditory speech processing is affected by visual speech in the periphery
| | Jeesun Kim, Chris Davis
|
| |
| #7 | Visual Speech Speeds Up Auditory Identification Responses
| | Tim Paris, Jeesun Kim, Chris Davis
|
| |
| #8 | Agglomerative Hierarchical Clustering of Emotions in Speech Based on Subjective Relative Similarity
| | Ryoichi Takashima, Tohru Nagano, Ryuki Tachibana, Masafumi Nishimura
|
| |
| #9 | Optimal Syllabic Rates and Processing Units in Perceiving Mandarin Spoken Sentences
| | Guangting Mai, Gang Peng
|
| |
| #10 | Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech
| | Mirjam Wester, Hui Liang
|
| |
Tue-Ses2-P2: Speech Audio Analysis
Time: Tuesday 13:30 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Toshiaki Fukada
| #1 | Robust Audio Fingerprinting Based on Local Spectral Luminance Maxima Scheme
| | Yong-Zhe Shi, Wei-Qiang Zhang, Jia Liu
|
| |
| #2 | Entropy Driven Inference of Stochastic Grammars
| | Unto Kalervo Laine
|
| |
| #3 | An Efficient Pre-processing Scheme to Improve the Sound Source Localization System in Noisy Environment
| | Sheng-Chieh Lee, Bo-Wei Chen, Jhing-Fa Wang, Chung-Hsien Wu, Min-Jian Liao
|
| |
| #4 | A study on auditory feature spaces for speech-driven lip animation
| | Guylaine Le-Jan, Yannick Benezeth, Guillaume Gravier, Frédéric Bimbot
|
| |
| #5 | Phase-only Speech Reconstruction Using Very Short Frames
| | Erfan Loweimi, Seyed Mohammad Ahadi, Hamid Sheikhzadeh
|
| |
| #6 | Frequency-Warped and Stabilized Time-Varying Cepstral Coefficients
| | Trond Skogstad, Torbjørn Svendsen
|
| |
| #7 | Using Human Perception for Automatic Accent Assessment
| | Freddy William, Abhijeet Sangwan, John H.L. Hansen
|
| |
| #8 | A study of the effectiveness of articulatory strokes for phonemic recognition
| | Carlos Molina, Sungbok Lee, Shrikanth Narayanan, Néstor Becerra Yoma
|
| |
| #9 | Auditory Filterbank Improves Voice Morphing
| | Erika Okamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara
|
| |
| #10 | Monaural Sound Localization
| | Anna Katharina Fuchs, Christian Feldbauer, Michael Stark
|
| |
Tue-Ses2-P3: Speech Coding
Time: Tuesday 13:30 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Alan McCree
| #1 | Dual-mode AVQ Coding Based on Spectral Masking and Sparseness Detection for ITU-T G.711.1/G.722 Super-wideband Extensions
| | Masahiro Fukui, Shigeaki Sasaki, Yusuke Hiwasaki, Sachiko Kurihara, Yoichi Haneda
|
| |
| #2 | Phone Impact Based Speech Transmission Technique for Reliable Speech Recognition in Poor Wireless Network Conditions
| | Azar Taufique, Kumaran Vijayasankar, Wooil Kim, John H.L. Hansen, Marco Tacca, Andrea Fumagalli
|
| |
| #3 | Automatic Speech Codec Identification with Applications to Tampering Detection of Speech Recordings
| | Jingting Zhou, Daniel Garcia-Romero, Carol Espy-Wilson
|
| |
| #4 | A hybrid quasi-harmonic/CELP wideband speech coding scheme for unit selection TTS synthesis
| | Chang-Heon Lee, Olivier Rosec, Yannis Stylianou
|
| |
| #5 | Voice Quality Characterization of IETF Opus Codec
| | Anssi Rämö, Henri Toukomaa
|
| |
| #6 | Leja ordering LSFs for accurate estimation of predictor coefficients
| | Christian Fischer Pedersen
|
| |
| #7 | Improved Quality for Conversational VoIP using Path Diversity
| | Qipeng Gong, Peter Kabal
|
| |
| #8 | Tree Encoding for the ITU-T G.711.1 Speech Coder
| | Abdul Hannan Khan, Peter Kabal
|
| |
| #9 | Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition
| | Dong Wang, Ravichander Vipperla, Nicholas Evans
|
| |
| #10 | A New Model-based Mandarin-speech Coding System
| | Chen-Yu Chiang, Jyh-Her Yang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horn Chen
|
| |
Tue-Ses2-P4: Robustness and Adaptation for ASR
Time: Tuesday 13:30 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Vivek Kumar
| #1 | Using Unsupervised Feature-Based Speaker Adaptation for Improved Transcription of Spoken Archives
| | Petr Cerva, Karel Palecek, Jan Silovsky, Jan Nouza
|
| |
| #2 | Online Speaker Adaptation with Pre-computed FMLLR Transformations
| | Volker Fischer, Siegfried Kunzmann
|
| |
| #3 | Instantaneous Speaker Adaptation through Selection and Combination of fMLLR Transformation Matrices
| | Diego Giuliani, Fabio Brugnara
|
| |
| #4 | Joint Bilinear Transformation Space Based Maximum a Posteriori Linear Regression Adaptation using Prior with Variance Function
| | Hwa Jeon Song, Yunkeun Lee, Hyung Soon Kim
|
| |
| #5 | A Study on Combining VTLN and SAT to Improve the Performance of Automatic Speech Recognition
| | Rama Sanand Doddipatla, Mikko Kurimo
|
| |
| #6 | Incorporating Regional Information to Enhance MAP-based Stochastic Feature Compensation for Robust Speech Recognition
| | Yu Tsao, Paul R. Dixon, Chiori Hori, Hisashi Kawai
|
| |
| #7 | A Study on the Effect of Pitch on LPCC and PLPC Features for Children's ASR in comparison to MFCC
| | Shweta Ghai, Rohit Sinha
|
| |
| #8 | About Handling Boundary Uncertainty in a Speaking Rate Dependent Modeling Approach
| | Denis Jouvet, Dominique Fohr, Irina Illina
|
| |
| #9 | An Active Learning Approach to Task Adaptation
| | Ji Wu, Zhiyang He, Ping Lv
|
| |
| #10 | Efficient Speaker and Noise Normalization for Robust Speech Recognition
| | Vikas Joshi, Raghavendra Bilgi, Umesh S, Carmen Benitez, Luz García Martínez
|
| |
| #11 | How Realistic is Artificially Added Noise?
| | Thomas Winkler
|
| |
Tue-Ses2-S1-P: Spoken Language Processing of Human-Human Conversations II
Time: Tuesday 14:30 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Poster Chair: Dilek Hakkani-Tur
| #1 | Learning Influences from Word Use in Polylogue
| | Tomoharu Iwata, Shinji Watanabe
|
| |
| #2 | Identifying Agreement/Disagreement in Conversational Speech: A Cross-lingual Study
| | Wen Wang, Kristin Precoda, Colleen Richey, Geoffrey Raymond
|
| |
| #3 | A Dual Channel Coupled Decoder for Fillers and Feedback
| | Daniel Neiberg, Joakim Gustafson
|
| |
| #4 | An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions
| | Chi-Chun Lee, Athanasios Katsamanis, Matthew P. Black, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan
|
| |
Tue-Ses3-O1: Language Identification
Time: Tuesday 16:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Philippe Boula de Mareüil
| 16:00 | Data-driven UBM Generation via Tied Gaussians for GMM-Supervector Based Accent Identification
| | Rong Zheng, Ce Zhang, Bo Xu
|
| |
| 16:20 | I3A Language Recognition System for Albayzin 2010 LRE
| | David Martínez, Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
|
| |
| 16:40 | Dimensionality Reduction for Using High-Order n-grams in SVM-Based Phonotactic Language Recognition
| | Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel
|
| |
| 17:00 | Language Recognition via Ivectors and Dimensionality Reduction
| | Najim Dehak, Pedro A. Torres Carrasquillo, Douglas Reynolds, Reda Dehak
|
| |
| 17:20 | Language Recognition in iVectors Space
| | David Martínez, Oldrich Plchot, Lukas Burget, Ondrej Glembek, Pavel Matejka
|
| |
Tue-Ses3-O3: ASR - Search, Keyword Spotting and Confidence Measures II
Time: Tuesday 16:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Geoffrey Zweig
| 16:00 | A Template Based Voice Trigger System Using Bhattacharyya Edit Distance
| | Evelyn Kurniawati, Samsudin Ng, Karthik Muralidhar, Sapna George
|
| |
| 16:20 | Acoustic Look-Ahead for More Efficient Decoding in LVCSR
| | David Nolden, Ralf Schlüter, Hermann Ney
|
| |
| 16:40 | A new Epsilon Filter for Efficient Composition of Weighted Finite-State Transducers
| | Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann
|
| |
| 17:00 | A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines
| | Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee
|
| |
| 17:20 | Combining Information Sources for Confidence Estimation with CRF Models
| | Matthew Stephen Seigel, Philip Woodland
|
| |
| 17:40 | Evaluation of Fast Spoken Term Detection Using a Suffix Array
| | Kouichi Katsurada, Shinta Sawada, Shigeki Teshima, Yurie Iribe, Tsuneo Nitta
|
| |
Tue-Ses3-O2: Second Language Acquisition, Development and Learning II
Time: Tuesday 16:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Robert Fox
| 16:00 | On Mispronunciation Lexicon Generation using Joint-sequence Multigrams in Computer-Aided Pronunciation Training
| | Xiaojun Qian, Helen Meng, Frank Soong
|
| |
| 16:20 | Validating a second language perception model for classroom context. A longitudinal study within the Perceptual Assimilation Model
| | Bianca Sisinni, Mirko Grimaldi
|
| |
| 16:40 | The role of variability in non-native perceptual learning of a Japanese geminate-singleton fricative contrast
| | Makiko Sadakata, James M. McQueen
|
| |
| 17:00 | Fluency Changes with General Progress in L2 Proficiency
| | Jared Bernstein, Jian Cheng, Masanori Suzuki
|
| |
| 17:20 | Tongue Gestures Awareness and Pronunciation Training
| | Slim Ouni
|
| |
| 17:40 | Impact of speaker variability on speech perception in non-native listeners
| | Wim A. van Dommelen, Valerie Hazan
|
| |
Tue-Ses3-O4: SLP for Information Extraction and Retrieval I
Time: Tuesday 16:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Haizhou Li
| 16:00 | Latent Topic Modeling for Audio Corpus Summarization
| | Timothy J. Hazen
|
| |
| 16:20 | Investigation of Spontaneous Speech Characterization Applied to Speaker Role Recognition
| | Richard Dufour, Yannick Estève, Paul Deléglise
|
| |
| 16:40 | Zero-resource audio-only spoken term detection based on a combination of template matching techniques
| | Armando Muscariello, Guillaume Gravier, Frédéric Bimbot
|
| |
| 17:00 | Automatic Learning in Content Indexing Service using Phonetic Alignment
| | Yeon-Jun Kim, Dave C. Gibbon
|
| |
| 17:20 | Leveraging Relevance Cues for Improved Spoken Document Retrieval
| | Pei-Ning Chen, Kuan-Yu Chen, Berlin Chen
|
| |
| 17:40 | Spoken Lecture Summarization by Random Walk over a Graph Constructed with Automatically Extracted Key Terms
| | Yun-Nung Chen, Yu Huang, Ching-Feng Yeh, Lin-Shan Lee
|
| |
Tue-Ses3-S1-O: Speech and Audio Processing for Human-Robot Interaction I
Time: Tuesday 16:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Oral Chair: Laurence Devillers
| 16:00 | Using Prominence Detection to Generate Acoustic Feedback in Tutoring Situations
| | Lars Schillingmann, Petra Wagner, Christian Munier, Britta Wrede, Katharina Rohlfing
|
| |
| 16:20 | Bayesian Extension of MUSIC for Sound Source Localization and Tracking
| | Takuma Otsuka, Kazuhiro Nakadai, Tetsuya Ogata, Hiroshi G. Okuno
|
| |
| 16:40 | Speech-based Non-prototypical Affect Recognition for Child-Robot Interaction in Reverberated Environments
| | Martin Woellmer, Felix Weninger, Bjoern Schuller
|
| |
Tue-Ses3-P1: Voice Activity Detection
Time: Tuesday 16:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Abeer Alwan
| #1 | Voice activity detection in MTF-based power envelope restoration
| | Masashi Unoki, Xugang Lu, Rico Petrick, Shota Morita, Masato Akagi, Ruediger Hoffmann
|
| |
| #2 | Using Spectral Fluctuation of Speech in multi-feature HMM-based voice activity detection
| | Miquel Espi, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama
|
| |
| #3 | Linear Dynamic Models for Voice Activity Detection
| | Kannu Mehta, Chau Khoa Pham, Eng Siong Chng
|
| |
| #4 | Detection of Shouted Speech in the Presence of Ambient Noise
| | Jouni Pohjalainen, Tuomo Raitio, Paavo Alku
|
| |
| #5 | Breath-detection-based Telephony Speech Phrasing
| | Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
|
| |
| #6 | Multi-channel voice activity detection based on conic constraints
| | Gibak Kim
|
| |
| #7 | Multi-Sensor Voice Activity Detection based on Multiple Observation Hypothesis Testing
| | Theodoros Petsatodis, Fotios Talantzis, Christos Boukis, Zheng-Hua Tan, Ramjee Prasad
|
| |
| #8 | Online Speech Activity Detection in Broadcast News
| | Chao Gao, Guruprasad Saikumar, Saurabh Khanwalkar, Avi Herscovici, Anoop Kumar, Amit Srivastava, Premkumar Natarajan
|
| |
| #9 | A Real-Time Speech Command Detector for a Smart Control Room
| | Daniel Reich, Felix Putze, Dominic Heger, Joris Ijsselmuiden, Rainer Stiefelhagen, Tanja Schultz
|
| |
| #10 | Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation frequency
| | Ekapol Chuangsuwanich, James Glass
|
| |
| #11 | On Noise Robust Voice Activity detection
| | Tomas Dekens, Werner Verhelst
|
| |
| #12 | Adaptive regularization framework for robust voice activity detection
| | Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura
|
| |
Tue-Ses3-P2: Human Speech Production I
Time: Tuesday 16:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Francesco Cutugno
| #1 | On the use of extended context for HMM-based spontaneous conversational speech synthesis
| | Tomoki Koriyama, Takashi Nose, Takao Kobayashi
|
| |
| #2 | Predicting Tongue Positions from Acoustics and Facial Features
| | Asterios Toutios, Slim Ouni
|
| |
| #3 | Assessing acoustic reduction: Exploiting local structure in speech
| | Louis ten Bosch, Annika Hämäläinen, Mirjam Ernestus
|
| |
| #4 | The “Fortis-Lenis” Distinction in Bulgarian and German
| | Bistra Andreeva, Magdalena Wolska
|
| |
| #5 | Acoustic Correlates of Glottal Gaps
| | Gang Chen, Jody Kreiman, Yen-Liang Shue, Abeer Alwan
|
| |
| #6 | Using a Genetic Algorithm to Estimate Parameters of a Coarticulation Model
| | Brian Bush, John-Paul Hosom, Alexander Kain, Akiko Amano-Kusumoto
|
| |
| #7 | Synthesis of breathy, normal, and pressed phonation using a two-mass model with a triangular glottis
| | Peter Birkholz, Bernd J. Kröger, Christiane Neuschaefer-Rube
|
| |
| #8 | Analysis of inter-articulator correlation in acoustic-to-articulatory inversion using generalized smoothness criterion
| | Prasanta Ghosh, Shrikanth Narayanan
|
| |
| #9 | Frequency-domain representation of source-filter coupling and its effect in the production of voice
| | Tokihiko Kaburagi
|
| |
| #10 | Method for speech inversion with large scale statistical evaluation
| | Heikki Rasilo, Unto K. Laine, Okko Räsänen, Toomas Altosaar
|
| |
| #11 | Italian in the no-man's land between stress-timing and syllable-timing? Speakers are more stress-timed than listeners
| | Bettina Braun, Sabine Geiselmann
|
| |
| #12 | The Lombard Effect in Spontaneous Dialog Speech
| | Laura Folk, Florian Schiel
|
| |
Tue-Ses3-P3: Speaker Recognition - Analysis and Statistics III
Time: Tuesday 16:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Pierre-Michel Bousquet
| #1 | Variational Bayesian Model Selection for GMM-Speaker Verification using Universal Background Model
| | Timur Pekhovsky, Alexandra Lokhanova
|
| |
| #2 | To Weight or not to Weight: Source-Normalised LDA for Speaker Recognition using i-vectors
| | Mitchell McLaren, David van Leeuwen
|
| |
| #3 | Maximum Entropy based Data Selection for Speaker Recognition
| | Chien-Lin Huang, Bin Ma
|
| |
| #4 | Addressing the Data-Imbalance Problem in Kernel-based Speaker Verification via Utterance Partitioning and Speaker Comparison
| | Wei Rao, Man-Wai Mak
|
| |
| #5 | Single-channel Head Orientation Estimation Based on Discrimination of Acoustic Transfer Function
| | Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
|
| |
| #6 | Maximum Likelihood i-vector Space Using PCA for Speaker Verification
| | Zhenchun Lei, Yingchun Yang
|
| |
| #7 | Speaker Verification using Sparse Representations on Total Variability I-Vectors
| | Ming Li, Xiang Zhang, Yonghong Yan, Shrikanth Narayanan
|
| |
| #8 | Robust Speaker Recognition in Non-Stationary Room Environments Based on Empirical Mode Decomposition
| | Taufiq Hasan, John Hansen
|
| |
| #9 | Range based multi microphone array fusion for speaker activity detection in small meetings
| | Jani Even, Panikos Heracleous, Carlos Ishi, Norihiro Hagita
|
| |
| #10 | Speaker verification robust to talking style variation using multiple kernel learning based on conditional entropy minimization
| | Tetsuji Ogawa, Hideitsu Hino, Noboru Murata, Tetsunori Kobayashi
|
| |
| #11 | Regularized Logistic Regression Fusion for Speaker Verification
| | Ville Marko Hautamaki, Kong Aik Lee, Tomi Kinnunen, Bin Ma, Haizhou Li
|
| |
| #12 | A Longest Matching Segment Approach with Bayesian Adaptation - Application to Noise-Robust Speaker Recognition
| | Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ming Ji
|
| |
| #13 | Data Selection with Kurtosis and Nasality features for Speaker Recognition
| | Howard Lei, Nikki Mirghafori
|
| |
| #14 | Use of The Harmonic Phase in Speaker Recognition
| | Inma Hernaez, Ibon Saratxaga, Jon Sanchez, Eva Navas, Iker Luengo
|
| |
Tue-Ses3-P4: Voice Conversion and Speech Synthesis
Time: Tuesday 16:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Alan Black
| #1 | Gaussian Process Experts for Voice Conversion
| | Nicholas Pilkington, Heiga Zen, Mark Gales
|
| |
| #2 | Intonation Conversion From Neutral to Expressive Speech
| | Christophe Veaux, Xavier Rodet
|
| |
| #3 | Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation
| | Nobuhiko Hattori, Hisashi Kawai, Hiroshi Saruwatari, Kiyohiro Shikano
|
| |
| #4 | Adding Glottal Source Information to Intra-lingual Voice Conversion
| | Javier Pérez, Antonio Bonafonte
|
| |
| #6 | Formant-controlled HMM-based Speech Synthesis
| | Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, Li-Rong Dai
|
| |
| #7 | Analysis of HMM-Based Lombard Speech Synthesis
| | Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku
|
| |
| #8 | Discrete/Continuous Modelling of Speaking Style in HMM-based Speech Synthesis: Design and Evaluation
| | Nicolas Obin, Pierre Lanchantin, Anne Lacheret, Xavier Rodet
|
| |
| #9 | Factored MLLR Adaptation For Singing Voice Generation
| | June Sig Sung, Doo Hwa Hong, Shin Jae Kang, Nam Soo Kim
|
| |
| #11 | Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency
| | Keikichi Hirose, Keiko Ochi, Ryusuke Mihara, Hiroya Hashimoto, Daisuke Saito, Nobuaki Minematsu
|
| |
| #12 | Prosody Conversion for Emotional Mandarin Speech Synthesis Using the Tone Nucleus Model
| | Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu
|
| |
| #13 | Rapid Adaptation of Foreign-accented HMM-based Speech Synthesis
| | Reima Karhila, Mirjam Wester
|
| |
| #14 | The Effects of Phoneme Errors in Speaker Adaptation for HMM Speech Synthesis
| | Bálint Tóth, Tibor Fegyó, Géza Németh
|
| |
Tue-Ses3-S1-P: Speech and Audio Processing for Human-Robot Interaction II
Time: Tuesday 17:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Poster Chair: Alex Rudnicky
| #1 | Blind Source Separation for Robot Audition using Fixed Beamforming with HRTFs
| | Mounira Maazaoui, Yves Grenier, Karim Abed-Meraim
|
| |
| #2 | Audio-Visual Voice Activity Detection in Dynamically Changing Environments
| | Takami Yoshida, Keisuke Nakamura, Kazuhiro Nakadai
|
| |
| #3 | Emotion detection from speech in human-robot interaction
| | Marie Tahon, Agnès Delaborde, Laurence Devillers
|
| |
| #4 | Weighted Ordered Classes - Nearest Neighbors: A New Framework for Automatic Emotion Recognition from Speech
| | Yazid Attabi, Pierre Dumouchel
|
| |
| #5 | Prosodic Analysis of a Corpus of Tales
| | David Doukhan, Albert Rilliard, Sophie Rosset, Martine Adda-Decker, Christophe d'Alessandro
|
| |
| #6 | Analysis of acoustic-prosodic features related to paralinguistic information carried by interjections in dialogue speech
| | Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
|
| |
| #7 | Robust intonation pattern classification in human robot interaction
| | Martin Heckmann, Kazuhiro Nakadai, Hirofumi Nakajima
|
| |
| #8 | ASR for human-symbiotic robot "EMIEW2" with Mechanical Noise and Floor-Level Noise Reduction
| | Takashi Sumiyoshi, Masahito Togami, Yasunari Obuchi
|
| |
Wed-Ses1-O1: Speaker Diarization I
Time: Wednesday 10:00 Place: Auditorium - Pala Congressi Type: Oral Chair: Janez Zibert
| 10:00 | Speaker Diarization Using A Priori Acoustic Information
| | Hagai Aronowitz
|
| |
| 10:20 | Improved Overlapped Speech Handling for Speaker Diarization
| | Kofi Boakye, Oriol Vinyals, Gerald Friedland
|
| |
| 10:40 | Exploiting Intra-Conversation Variability for Speaker Diarization
| | Stephen Shum, Najim Dehak, Ekapol Chuangsuwanich, Douglas Reynolds, Jim Glass
|
| |
| 11:00 | Speaker Clustering Based on Non-negative Matrix Factorization
| | Masafumi Nishida, Seiichi Yamamoto
|
| |
| 11:20 | Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings
| | Sree Harsha Yella, Fabio Valente
|
| |
| 11:40 | Cross Likelihood Ratio Based Speaker Clustering Using Eigenvoice Models
| | David Wang, Robert Vogt, Sridha Sridharan, David Dean
|
| |
Wed-Ses1-O3: ASR - New Paradigms
Time: Wednesday 10:00 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: John Hansen
| 10:00 | New Methods for Template Selection and Compression in Continuous Speech Recognition
| | Xie Sun, Yunxin Zhao
|
| |
| 10:20 | Structured Support Vector Machines for Noise Robust Continuous Speech Recognition
| | Shi-Xiong Zhang, M.J.F. Gales
|
| |
| 10:40 | Continuous Digits Recognition Leveraging Invariant Structure
| | Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu
|
| |
| 11:00 | Convergence of Line Search A-Function methods
| | Dimitri Kanevsky, David Nahamoo, Tara Sainath, Bhuvana Ramabhadran
|
| |
| 11:20 | Hidden Boosted MMI and Hierarchical State Posterior Feature for Automatic Speech Recognition based on Hidden Conditional Neural Fields
| | Yasuhisa Fujii, Kazumasa Yamamoto, Seiichi Nakagawa
|
| |
| 11:40 | Recognition and Real Time Performances of a Lightweight Ultrasound Based Silent Speech Interface Employing a Language Model
| | Jun Cai, Bruce Denby, Pierre Roussel, Gerard Dreyfus, Lise Crevier-Buchman
|
| |
Wed-Ses1-S3: Speech Processing Tools
Time: Wednesday 10:00 Place: Donatello (Room Onice) - Pala Congressi - Ground Floor Type: Poster Chair: Christoph Draxler
| #1 | Speech Processing Tools - An Introduction to Interoperability
| | Christoph Draxler, Toomas Altosaar, Sadaoki Furui, Mark Liberman, Peter Wittenburg
|
| |
| #2 | EasyAlign: an automatic phonetic alignment tool under Praat
| | Jean-Philippe Goldman
|
| |
| #3 | MTRANS: A multi-channel, multi-tier speech annotation tool
| | Julián Villegas, Martin Cooke, Vincent Aubanel, Marco A. Piccolino-Boniforti
|
| |
| #4 | The JSafran platform for semi-automatic speech processing
| | Christophe Cerisara, Claire Gardent
|
| |
| #5 | The Social Signal Interpretation Framework (SSI) for Real Time Signal Processing and Recognition
| | Johannes Wagner, Florian Lingenfelser, Elisabeth Andre
|
| |
| #6 | ELAN – aspects of interoperability and functionality
| | Han Sloetjes, Peter Wittenburg, Aarthy Somasundaram
|
| |
| #7 | Open source voice creation toolkit for the MARY TTS Platform
| | Marc Schröder, Marcela Charfuelan, Sathish Pammi, Ingmar Steiner
|
| |
| #8 | Java Visual Speech Components for Rapid Application Development of GUI based Speech Processing Applications
| | Stefan Steidl, Korbinian Riedhammer, Tobias Bocklet, Florian Hönig, Elmar Nöth
|
| |
| #9 | mTalk - A Multimodal Browser for Mobile Services
| | Michael Johnston, Giuseppe Di Fabbrizio, Simon Urbanek
|
| |
| #10 | Web-based automatic speech recognition service - webASR
| | Stuart Nicholas Wrigley, Thomas Hain
|
| |
| #11 | A Web based Speech Transcription Workplace
| | Markus Klehr, Andreas Ratzka, Thomas Ross
|
| |
| #12 | WinPitch, a multimodal tool for speech analysis of endangered languages
| | Philippe Martin
|
| |
| #13 | Recording caregiver interactions for machine acquisition of spoken language using the KLAIR virtual infant
| | Mark Huckvale
|
| |
Wed-Ses1-O2: Prosody I
Time: Wednesday 10:00 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Gérard Bailly
| 10:00 | A quantitative investigation of the prosody of Verum Focus in Italian
| | Giuseppina Turco, Michele Gubian, Jessamyn Schertz
|
| |
| 10:20 | Effects of focus on f0 and duration in Irish (Gaelic) declaratives
| | Amelie Dorn, Ailbhe Ní Chasaide
|
| |
| 10:40 | The phonology and phonetics of perceived prosody: What do listeners imitate?
| | Jennifer Cole, Stefanie Shattuck-Hufnagel
|
| |
| 11:00 | Uncovering the effect of imitation on tonal patterns of French Accentual Phrases
| | Amandine Michelas, Noël Nguyen
|
| |
| 11:20 | Crossmodal prosodic and gestural contribution to the perception of contrastive focus
| | Pilar Prieto, Cecilia Pugliesi, Joan Borràs-Comes, Ernesto Arroyo, Josep Blat
|
| |
| 11:40 | Temporal relationship between auditory and visual prosodic cues
| | Erin Cvejic, Jeesun Kim, Chris Davis
|
| |
Wed-Ses1-O4: Spoken Dialogue Systems II
Time: Wednesday 10:00 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Steve Young
| 10:00 | Optimizing Situated Dialogue Management in Unknown Environments
| | Heriberto Cuayahuitl, Nina Dethlefs
|
| |
| 10:20 | Acoustic-similarity based technique to improve concept recognition
| | Om D Deshmukh, Shajith Ikbal, Ashish Verma, Etienne Marcheret
|
| |
| 10:40 | Dialog Methods for Improved Alphanumeric String Capture
| | Doug Peters, Peter Stubley
|
| |
| 11:00 | Detecting the Status of a Predictive Incremental Speech Understanding Model for Real-Time Decision-Making in a Spoken Dialogue System
| | David DeVault, Kenji Sagae, David Traum
|
| |
| 11:20 | User Simulation in Dialogue Systems using Inverse Reinforcement Learning
| | Senthilkumar Chandramohan, Matthieu Geist, Fabrice Lefevre, Olivier Pietquin
|
| |
| 11:40 | Lossless Value Directed Compression of Complex User Goal States for Statistical Spoken Dialogue Systems
| | Paul A. Crook, Oliver Lemon
|
| |
Wed-Ses1-S1: Speaker State Challenge - Intoxication and Sleepiness I
Time: Wednesday 10:00 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Bjoern Schuller
| 10:00 | The INTERSPEECH 2011 Speaker State Challenge
| | Björn Schuller, Stefan Steidl, Anton Batliner, Florian Schiel, Jarek Krajewski
|
| |
| 10:20 | Combining Multiple Phoneme-based Classifiers with Audio Feature-based Classifier for the Detection of Alcohol Intoxication
| | Claude Montacié, Marie-José Caraty
|
| |
| 10:40 | Intoxication Detection using Phonetic, Phonotactic and Prosodic Cues
| | Fadi Biadsy, William Yang Wang, Andrew Rosenberg, Julia Hirschberg
|
| |
| 11:00 | Drink and Speak: On the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features
| | Tobias Bocklet, Korbinian Riedhammer, Elmar Nöth
|
| |
| 11:20 | Intoxicated Speech Detection Using Hierarchical Features and Iterative Speaker Normalization
| | Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee, Shrikanth S. Narayanan
|
| |
| 11:40 | Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk... and can Automatic Classifiers Compete?
| | Stefan Ultes, Alexander Schmitt, Wolfgang Minker
|
| |
| 12:00 | Does it Groove or Does it Stumble - Automatic Classification of Alcoholic Intoxication Using Prosodic Features
| | Florian Hönig, Anton Batliner
|
| |
Wed-Ses1-S2-O: Speech Technology for Under-Resourced Languages I
Time: Wednesday 10:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Oral Chair: Alexey Karpov
Chairs: Alexey Karpov, Laurent Besacier
| 10:00 | Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training
| | Ngoc Thang Vu, Franziska Kraus, Tanja Schultz
|
| |
| 10:20 | Places and Manner of Articulation of Bangla Consonants: An EPG based study
| | Shyamal Kr Das Mandal, Somnath Chandra, Swaran Lata, Ashoke Kumar Datta
|
| |
| 10:40 | Efficient harvesting of Internet audio for resource-scarce ASR
| | Marelie Hattingh Davel, Charl van Heerden, Neil Kleynhans
|
| |
Wed-Ses1-P1: Human Speech Production II
Time: Wednesday 10:00 Place: Valfonda 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Francis Grenez
| #2 | Articulatory Reduction in Mandarin Chinese Words
| | Jeffrey Berry, Sunjing Ji, Ian Fasel, Diana Archangeli
|
| |
| #3 | Morphological Variation in the Adult Vocal Tract: A Modeling Study of its Potential Acoustic Impact
| | Adam Lammert, Michael Proctor, Athanasios Katsamanis, Shrikanth Narayanan
|
| |
| #4 | Analysis and automatic estimation of children's subglottal resonances
| | Steven M. Lulich, Harish Arsikere, John R. Morton, Gary K. F. Leung, Abeer Alwan, Mitchell S. Sommers
|
| |
| #5 | Acceleration Sensor Based Estimates of Subglottal Resonances: Short vs. Long Vowels
| | Wolfgang Wokurek, Andreas Madsack
|
| |
| #6 | Comparison of nasalance measurements from accelerometers and microphones and preliminary development of novel features
| | Nicolas Audibert, Angélique Amelot
|
| |
| #7 | The effect of seeing the interlocutor on speech production in different noise types
| | Michael Fitzpatrick, Jeesun Kim, Chris Davis
|
| |
| #8 | Conversing in the presence of a competing conversation: effects on speech production
| | Vincent Aubanel, Martin Cooke, Julian Villegas, Maria Luisa Garcia Lecumberri
|
| |
| #9 | Very short utterances and timing in turn-taking
| | Mattias Heldner, Jens Edlund, Anna Hjalmarsson, Kornel Laskowski
|
| |
| #10 | Validating rt-MRI based articulatory representations via articulatory recognition
| | Athanasios Katsamanis, Erik Bresch, Vikram Ramanarayanan, Shrikanth Narayanan
|
| |
| #11 | An Electropalatographic and Acoustic Study on Anticipatory Coarticulation in V1#C2V2 Sequences in Standard Chinese
| | Yinghao Li, Jiangping Kong
|
| |
| #12 | Final /t/ reduction in Dutch past-participles: the role of word predictability and morphological decomposability
| | Iris Hanique, Mirjam Ernestus
|
| |
| #13 | Parametrising Degree of Articulator Movement from Dynamic MRI Data
| | Zeynab Raeesy, Ladan Baghai-Ravary, John Coleman
|
| |
Wed-Ses1-P2: Systems for LVCSR and rich transcription
Time: Wednesday 10:00 Place: Valfonda 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Diego Giuliani
| #1 | Improving LVCSR System Combination Using Neural Network Language Model Cross Adaptation
| | Xunying Liu, Mark Gales, Phil Woodland
|
| |
| #2 | Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones
| | Jian Xue, Xiaodong Cui, Gregg Daggett, Etienne Marcheret, Bowen Zhou
|
| |
| #3 | Deploying Google Search by Voice in Cantonese
| | Yun-Hsuan Sung, Martin Jansche, Pedro Moreno
|
| |
| #4 | An Investigation on Speech Recognition for Colloquial Arabic
| | Sarah Al-Shareef, Thomas Hain
|
| |
| #5 | A multithreaded implementation of Viterbi decoding on Recursive Transition Networks
| | Fabio Brugnara
|
| |
| #6 | Recurrent Neural Network based Language Modeling in Meeting Recognition
| | Stefan Kombrink, Tomas Mikolov, Martin Karafiat, Lukas Burget
|
| |
| #7 | Ad-Hoc Meeting Transcription on Clusters of Mobile Devices
| | Michele Cossalter, Priya Sundararajan, Ian Lane
|
| |
| #8 | ROVER Enhancement with Automatic Error Detection
| | Kacem Abida, Fakhri Karray
|
| |
| #9 | Automatic Comma Insertion of Lecture Transcripts Based on Multiple Annotations
| | Yuya Akita, Tatsuya Kawahara
|
| |
Wed-Ses1-P3: Language, Dialect Identification and Speaker Diarization
Time: Wednesday 10:00 Place: Faenza 1 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Nancy Chen
| #1 | Study on the Relevance Factor of Maximum a Posteriori with GMM for Language Recognition
| | Chang Huai You, Haizhou Li, Kong Aik Lee
|
| |
| #2 | Improving Multiband Position Pitch Algorithm for Localization and Tracking of Multiple Concurrent Speakers by using a Frequency Selective Criterion
| | Tania Habib, Harald Romsdorfer
|
| |
| #3 | On the Use of Lattices of Time-Synchronous Cross-Decoder Phone Co-occurrences in a SVM-Phonotactic Language Recognition System
| | Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel
|
| |
| #4 | Speaker Clustering Based on Utterance-oriented Dirichlet Process Mixture Model
| | Naohiro Tawara, Shinji Watanabe, Tetsuji Ogawa, Tetsunori Kobayashi
|
| |
| #5 | PLDA-based Clustering for Speaker Diarization of Broadcast Streams
| | Jan Silovsky, Jan Prazak, Petr Cerva, Jindrich Zdansky, Jan Nouza
|
| |
| #6 | iVector Approach to Phonotactic Language Recognition
| | Mehdi Soufifar, Marcel Kockmann, Lukas Burget, Olda Plchot, Ondrej Glembek, Torbjørn Svendsen
|
| |
| #7 | Discriminative Features For Language Identification
| | Christopher Alberti, Michiel Bacchiani
|
| |
| #8 | Perceptual sensitivity to dialectal and generational variations in vowels
| | Robert Allen Fox, Ewa Jacewicz
|
| |
| #9 | Investigation of Cross-show Speaker Diarization
| | Qian Yang, Tanja Schultz, Qin Jin
|
| |
| #10 | Language Identification for Text Chats
| | Vesa Siivola, Bryan Pellom, Meagan Sills
|
| |
| #11 | Spoken Language Recognition in the Latent Topic Simplex
| | Kong Aik Lee, Chang Huai You, Ville Hautamäki, Anthony Larcher, Haizhou Li
|
| |
Wed-Ses1-P4: Paralinguistic Information - Analysis and Tools
Time: Wednesday 10:00 Place: Faenza 2 - Pala Congressi (Passi Perduti-Gallery) Type: Poster Chair: Shri Narayanan
| #1 | Investigating Robustness of Spectral Moments on Normal- and High-Effort Speech
| | Frederike Gottsmann, Corinna Harwardt
|
| |
| #2 | Comparing the Impact of Raised Vocal Effort on Various Spectral Parameters
| | Corinna Harwardt
|
| |
| #4 | Vowel Context and Speaker Interactions Influencing Glottal Open Quotient and Formant Frequency Shifts in Physical Task Stress
| | Keith W. Godin, John H. L. Hansen
|
| |
| #5 | Prosodic Correlates of Individual Physiological Response to Stress
| | Serguei Pakhomov, Michael Kotlyar
|
| |
| #6 | The vocal effort of dominance in scenario meetings
| | Marcela Charfuelan, Marc Schröder
|
| |
| #7 | A Preliminary Model of Emotional Prosody using Multidimensional Scaling
| | Sona Patel, Rahul Shrivastav
|
| |
| #8 | An Exploratory Study of the Relations between Perceived Emotion Strength and Articulatory Kinematics
| | Jangwon Kim, Sungbok Lee, Shrikanth Narayanan
|
| |
| #9 | Improved Acoustic Characterization of Breathy and Whispery Voices
| | Carlos T. Ishi, Hiroshi Ishiguro, Norihiro Hagita
|
| |
| #10 | Neutral to Target Emotion Conversion Using Source and Suprasegmental Information
| | D. Govind, S. R. Mahadeva Prasanna, B. Yegnanarayana
|
| |
| #11 | A multimodal analysis of vocal and visual backchannels in spontaneous dialogs
| | Khiet P. Truong, Ronald Poppe, Iwan de Kok, Dirk Heylen
|
| |
| #12 | Kernel models for affective lexicon creation
| | Nikos Malandrakis, Alexandros Potamianos, Elias Iosif, Shrikanth Narayanan
|
| |
Wed-Ses1-S2-P: Speech Technology for Under-Resourced Languages II
Time: Wednesday 11:00 Place: Caravaggio (Adua 1) - Pala Affari - 1st Floor Type: Poster Chair: Laurent Besacier
Chairs: Laurent Besacier, Alexey Karpov
| #1 | Automatic Prosody Generation for Serbo-Croatian Speech Synthesis Based on Regression Trees
| | Milan Sečujski, Darko Pekar, Nikša Jakovljević
|
| |
| #2 | Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis
| | Alexey Karpov, Irina Kipyatkova, Andrey Ronzhin
|
| |
| #3 | Cross-language phone recognition when the target language phoneme inventory is not known
| | Timothy Kempton, Roger Moore, Thomas Hain
|
| |
| #4 | A Paradigm for Small Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets
| | Sourish Chaudhuri, Bhiksha Raj
|
| |
| #5 | GorUp: an ontology-driven Audio Information Retrieval system that suits the requirements of under-resourced languages
| | Nora Barroso, Karmele López de Ipiña, Aitzol Ezeiza, Carmen Hernández, Nerea Ezeiza, Odei Barroso, Unai Susperregi, Barroso Simeon
|
| |
| #6 | Woefzela - An open-source platform for ASR data collection in the developing world
| | Nic De Vries, Jaco Badenhorst, Marelie Davel, Etienne Barnard, Alta De Waal
|
| |
| #7 | A Study on the Perception of Tone and Intonation in Sesotho
| | Hansjörg Mixdorff, Lehlohonolo Mohasi, 'Malillo Machobane, Thomas Niesler
|
| |
| #8 | Developing a broadband automatic speech recognition system for Afrikaans
| | Febe de Wet, Alta de Waal, Gerhard van Huyssteen
|
| |
| #9 | Multi-accent speech recognition of Afrikaans, Black and White varieties of South African English
| | Herman Kamper, Thomas Niesler
|
| |
| #10 | Perceptual Representation of Consonant Sounds in Thai
| | Charturong Tantibundhit, Chutamanee Onsuwan, Tanawan Saimai, Nantaporn Saimai, Sumonmas Thatphithakkul, P. Chootrakool, Krit Kosawat, Nattanun Thatphithakkul
|
| |
| #11 | A cross-lingual approach to the development of an HMM-based speech synthesis system for Malay
| | Mumtaz Begum Mustafa, Ainon Raja Noor, Roziati Zainuddin, Zuraidah M. Don, Gerry Knowles
|
| |
Wed-Ses2-O1: Speaker Diarization II
Time: Wednesday 13:30 Place: Auditorium - Pala Congressi Type: Oral Chair: Hagai Aronowitz
| 13:30 | Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems
| | Janez Zibert, France Mihelic
|
| |
| 13:50 | Diarization-based Speaker Retrieval for Broadcast Television Archives
| | Marijn Huijbregts, David van Leeuwen
|
| |
| 14:10 | The detection of overlapping speech with prosodic features for speaker diarization
| | Martin Zelenák, Javier Hernando
|
| |
| 14:30 | LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization
| | Sree Hari Krishnan Parthasarathi, Herve Bourlard, Daniel Gatica-Perez
|
| |
| 14:50 | Extending the Task of Diarization to Speaker Attribution
| | Houman Ghaemmaghami, David Dean, Robbie Vogt, Sridha Sridharan
|
| |
| 15:10 | Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
| | Viet-Anh Tran, Viet Bac Le, Claude Barras, Lori Lamel
|
| |
Wed-Ses2-O3: Adaptation for ASR
Time: Wednesday 13:30 Place: Brunelleschi (Green Room) - Pala Congressi - 2nd Floor Type: Oral Chair: Phil Woodland
| 13:30 | Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution
| | Shinji Watanabe, Atsushi Nakamura, Biing-Hwang Juang
|
| |
| 13:50 | Integrated Online Speaker Clustering and Adaptation
| | Catherine Breslin, KK Chin, Mark Gales, Kate Knill
|
| |
| 14:10 | A study on speaker normalized MLP features in LVCSR
| | Zoltán Tüske, Christian Plahl, Ralf Schlüter
|
| |
| 14:30 | Matrix-Variate Distribution of Training Models for Robust Speaker Adaptation
| | Yongwon Jeong, Young Kuk Kim
|
| |
| 14:50 | Separating Speaker and Environmental Variability Using Factored Transforms
| | Michael Seltzer, Alex Acero
|
| |
| 15:10 | Your Mobile Virtual Assistant Just Got Smarter!
| | Mazin Gilbert, Iker Arizmendi, Enrico Bocchieri, Diamantino Caseiro, Vincent Goffin, Andrej Ljolje, Mike Philips, Chao Wang, Jay Wilpon
|
| |
Wed-Ses2-O2: Prosody II
Time: Wednesday 13:30 Place: Leonardo - Pala Affari - Ground Floor Type: Oral Chair: Pilar Prieto
| 13:30 | Analysing the correspondence between automatic prosodic segmentation and syntactic structure
| | Gyorgy Szaszak, Katalin Nagy, Andras Beke
|
| |
| 13:50 | Long-distance rhythmic dependencies and their application to automatic language identification
| | Joseph Tepperman, Emily Nava
|
| |
| 14:10 | Symbolic and Direct Sequential Modeling of Prosody for Classification of Speaking-Style and Nativeness
| | Andrew Rosenberg
|
| |
| 14:30 | Prosodic Analysis and Perception of Mandarin Utterances Conveying Attitudes
| | Wentao Gu, Ting Zhang, Hiroya Fujisaki
|
| |
| 14:50 | Predicting Taiwan Mandarin tone shapes from their duration
| | Chierh Cheng, Michele Gubian
|
| |
| 15:10 | Variation of Accent Type and of Context – Influences on Pragmatic Focus Interpretation
| | Charlotte Wollermann, Ulrich Schade, Bernhard Schröder
|
| |
Wed-Ses2-O4: SLP for Information Extraction and Retrieval II
Time: Wednesday 13:30 Place: Michelangelo - Pala Affari - 2nd Floor Type: Oral Chair: Pascale Fung
| 13:30 | Topic Segmentation of TV-streams by mathematical morphology and vectorization
| | Vincent Claveau, Sébastien Lefèvre
|
| |
| 13:50 | Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation
| | Mimi Lu, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
|
| |
| 14:10 | Hybrid Speech Recognition for Voice Search: a Comparative Study
| | Evandro Gouvea
|
| |
| 14:30 | A New Phonetic Candidate Generator for Improving Search Query Efficiency
| | Bo Peng, Yao Qian, Frank Soong, Bo Zhang
|
| |
| 14:50 | Towards Voice-Input Symbolic Pattern Retrieval using Parameter-Based Search
| | Yukiko Suzuki, Kiyoaki Aikawa
|
| |
| 15:10 | A Language Independent Approach to Audio Search
| | Vikram Gupta, Jitendra Ajmera, Arun Kumar, Ashish Verma
|
| |
Wed-Ses2-S1: Speaker State Challenge - Intoxication and Sleepiness II
Time: Wednesday 13:30 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral Chair: Anton Batliner
| 13:30 | Perception of Alcoholic Intoxication in Speech
| | Florian Schiel
|
| |
| 13:50 | Detecting sleepiness by fusing classifiers trained with novel acoustic features
| | Tauhidur Rahman, Soroosh Mariooryad, Shalini Keshavamurthy, Gang Liu, John H.L. Hansen, Carlos Busso
|
| |
| 14:10 | An HMM-Based Approach to the INTERSPEECH 2011 Speaker State Challenge
| | Albino Nogueiras
|
| |
| 14:30 | RANSAC-based Training Data Selection for Speaker State Recognition
| | Elif Bozkurt, Engin Erzin, Cigdem Eroglu Erdem, Arif Tanju Erdem
|
| |
| 14:50 | University of Ljubljana System for Interspeech 2011 Speaker State Challenge
| | Rok Gajšek, Simon Dobrišek, France Mihelič
|
| |
| 15:10 | Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines
| | Dong-Yan Huang, Shuzhi Sam Ge, Zhengchen Zhang
|
| |
| |