Dante - Di Michelino 150° sponsors







Corporate & Society Sponsors
Loquendo diamond package
Nuance gold package
AT&T bronze package
Google silver package
Appen bronze package
Interactive Media bronze package
Microsoft bronze package
SpeechOcean bronze package
Avios logo package
NDI logo package

CNR-ISTC
Université d'Avignon
Speech Cycle
AT&T
Università di Firenze
FUB
FBK
Univ. Trento
Univ. Napoli
Univ. Tuscia
Univ. Calabria
Univ. Venezia

AISV
Comune di Firenze
Firenze Fiera
Florence Convention Bureau

ISCA

12th Annual Conference of the
International Speech Communication Association

Sponsors

Interspeech 2011 Florence

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Wed-Ses2-S1:
Speaker State Challenge - Intoxication and Sleepiness II

Time: Wednesday 13:30 Place: Raffaello - Pala Affari - 3rd Floor Type: Oral
Chair: Anton Batliner

13:30 Perception of Alcoholic Intoxication in Speech

Florian Schiel (Bavarian Archive for Speech Signals, Ludwig-Maximilians-Universität)

The ALC sub-challenge of the Interspeech Speaker State Challenge (ISSC) aims at the automatic classification of speech signals into intoxicated and sober speech. In this context we conducted a perception experiment on data derived from the same corpus to analyse how well humans perform on the same task. The results show that humans still outperform comparable ISSC baseline results. Female and male listeners perform at the same level, but there is strong evidence that intoxication is easier to recognize in female voices than in male voices. Prosodic features contribute to the decisions of human listeners but do not seem to be dominant. By analogy with Doddington's zoo in speaker verification, we find some evidence for the existence of lambs and goats but no wolves.

13:50 Detecting sleepiness by fusing classifiers trained with novel acoustic features

Tauhidur Rahman (University of Texas at Dallas)
Soroosh Mariooryad (University of Texas at Dallas)
Shalini Keshavamurthy (University of Texas at Dallas)
Gang Liu (University of Texas at Dallas)
John H.L. Hansen (University of Texas at Dallas)
Carlos Busso (University of Texas at Dallas)

Automatic sleepiness detection is a challenging task that can lead to advances in various domains, including traffic safety, medicine and human-machine interaction. This paper analyzes the discriminative power of different acoustic features for detecting sleepiness. The study uses the Sleepy Language Corpus (SLC). Along with standard acoustic features, novel features are proposed, including functionals across voiced-segment statistics of the F0 contour, likelihoods of reference models used to contrast non-neutral speech, and a set of noise-robust spectral features. These feature sets, which have performed well in other paralinguistic tasks such as emotion recognition, are used to train classifiers that are combined at the feature and decision levels. The best unweighted accuracy (UA) is obtained by combining the classifiers at the decision level under a maximum likelihood framework (UA = 70.97%). This performance is higher than the best results reported on this corpus.
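The UA figures in these abstracts average recall over the classes, so a classifier cannot score well simply by favouring the majority class. A minimal Python sketch, assuming UA is computed as the unweighted mean of per-class recalls (the usual definition in paralinguistic challenges); the label strings and the toy predictions are hypothetical:

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 8 non-sleepy and 2 sleepy utterances.
y_true = ["non-sleepy"] * 8 + ["sleepy"] * 2
y_pred = ["non-sleepy"] * 8 + ["non-sleepy", "sleepy"]
print(unweighted_average_recall(y_true, y_pred))  # 0.75
```

On this toy data plain accuracy would be 0.9 (9 of 10 correct), while UA is 0.75 because half of the minority-class utterances are missed.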

14:10 An HMM-Based Approach to the INTERSPEECH 2011 Speaker State Challenge

Albino Nogueiras (Universitat Politècnica de Catalunya, Barcelona, Spain)

The current main trend in paralinguistic information recognition is so-called static classification. In this kind of classification, the low-level descriptors are pooled by means of statistical functionals, and all, or almost all, information about the temporal structure and evolution of speech is lost. Although this approach represents the state of the art, we believe that dynamic classification, where temporal information is kept, still deserves some attention because it can handle aspects that static classification cannot. In this paper the INTERSPEECH 2011 Speaker State Challenge is addressed using the automatic speech recognition system developed at UPC, which has already been used in a similar task: emotion recognition. Although the results fall below the baseline, we believe they are close enough to merit consideration.
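To make the static/dynamic contrast concrete, the following Python sketch pools a frame-level contour into a handful of statistical functionals, the static approach described above. The five functionals and the F0 values are illustrative assumptions, not the challenge baseline feature set; the point is that a rising and a falling contour map to the same vector, so the temporal order that a dynamic (e.g. HMM-based) classifier could exploit is gone.

```python
import statistics


def functionals(lld):
    """Pool a frame-level low-level descriptor (LLD) contour into a fixed set of statistics."""
    return {
        "mean": statistics.mean(lld),
        "stdev": statistics.pstdev(lld),
        "min": min(lld),
        "max": max(lld),
        "range": max(lld) - min(lld),
    }


# Hypothetical F0 contours in Hz: a rise, and the same frames in reverse order (a fall).
rising = [100.0, 110.0, 120.0, 130.0, 140.0]
falling = list(reversed(rising))
print(functionals(rising) == functionals(falling))  # True: temporal order is lost
```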

14:30 RANSAC-based Training Data Selection for Speaker State Recognition

Elif Bozkurt (Koc University, Istanbul, Turkey)
Engin Erzin (Koc University, Istanbul, Turkey)
Cigdem Eroglu Erdem (Bahcesehir University, Istanbul, Turkey)
Arif Tanju Erdem (Ozyegin University, Istanbul, Turkey)

We present a Random Sample Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested on the INTERSPEECH 2011 Speaker State Challenge corpora, which cover the Intoxication and Sleepiness Sub-Challenges; each sub-challenge defines a two-class classification task. We perform RANSAC-based training data selection, coupled with Support Vector Machine (SVM) classification, to prune possible outliers in the training data. Our experimental evaluations indicate that RANSAC-based training data selection yields unweighted average (UA) recall rates of 66.32% and 65.38% on the development and test sets of the Sleepiness Sub-Challenge, respectively, and a slight improvement on the Intoxication Sub-Challenge.
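The general RANSAC idea of outlier pruning can be sketched for a two-class training set as follows. This is a hedged illustration, not the authors' method: to stay dependency-free, a nearest-centroid classifier stands in for the SVM, and the sample fraction, iteration count and consensus criterion (training samples that the subset model classifies correctly) are assumptions. All function names are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Stand-in for the SVM: one mean feature vector (centroid) per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def ransac_select(X, y, n_iter=50, sample_frac=0.3, seed=0):
    """Fit on random subsets and keep the largest consensus set, i.e. the
    training samples the subset model classifies correctly. Samples outside
    the returned index list are treated as outliers."""
    rng = random.Random(seed)
    n = len(X)
    k = max(2, int(sample_frac * n))
    best, best_size = list(range(n)), -1
    for _ in range(n_iter):
        idx = rng.sample(range(n), k)
        if len({y[i] for i in idx}) < 2:
            continue  # a two-class model needs both classes in the sample
        model = fit_centroids([X[i] for i in idx], [y[i] for i in idx])
        consensus = [i for i in range(n) if predict(model, X[i]) == y[i]]
        if len(consensus) > best_size:
            best, best_size = consensus, len(consensus)
    return best


# Hypothetical toy data: two clusters plus one sample (index 8) carrying the wrong label.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2],
     [0.1, 0.1]]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

kept = ransac_select(X, y)
print(kept)  # the mislabeled sample (index 8) is pruned
final_model = fit_centroids([X[i] for i in kept], [y[i] for i in kept])
```

In the setting the abstract describes, the final classifier would be an SVM retrained on the kept samples; the consensus rule used here is only one plausible choice.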

14:50 University of Ljubljana System for Interspeech 2011 Speaker State Challenge

Rok Gajšek (University of Ljubljana)
Simon Dobrišek (University of Ljubljana)
France Mihelič (University of Ljubljana)

The paper presents our efforts in the Interspeech 2011 Speaker State Challenge. Both systems, for the Intoxication and the Sleepiness Sub-Challenges, are based on a Universal Background Model (UBM) in the form of a Hidden Markov Model (HMM) and Maximum A Posteriori (MAP) adaptation. By combining our HMM-UBM-MAP derived supervectors with selected statistical functionals from the baseline feature set, we were able to surpass the baseline system in both sub-challenges. Majority-voting fusion of the best systems further improved performance. Our best test-set results are 67.46% in the Intoxication Sub-Challenge and 71.28% in the Sleepiness Sub-Challenge.
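Majority-voting fusion of several systems' decisions can be sketched in a few lines of Python. This is a generic illustration, assuming each system outputs one hard label per utterance; the labels and system outputs below are hypothetical.

```python
from collections import Counter


def majority_vote(decisions_per_system):
    """Fuse per-utterance labels from several systems by majority vote.

    decisions_per_system: one list of labels per system, all aligned to the
    same utterances. With an odd number of systems on a two-class task,
    ties cannot occur.
    """
    fused = []
    for votes in zip(*decisions_per_system):
        label, _ = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Hypothetical outputs of three systems on four utterances.
sys_a = ["intoxicated", "sober", "intoxicated", "sober"]
sys_b = ["intoxicated", "intoxicated", "sober", "sober"]
sys_c = ["sober", "intoxicated", "intoxicated", "sober"]
print(majority_vote([sys_a, sys_b, sys_c]))
# ['intoxicated', 'intoxicated', 'intoxicated', 'sober']
```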

15:10 Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines

Dong-Yan Huang (Institute for Infocomm Research)
Shuzhi Sam Ge (Social Robotics Lab, Interactive Digital Media Institute)
Zhengchen Zhang (Social Robotics Lab, Interactive Digital Media Institute)

This paper describes a Speaker State Classification System (SSCS) for the INTERSPEECH 2011 Speaker State Challenge. Our system for the Intoxication and Sleepiness Sub-Challenges fuses several individual sub-systems. We use the three standard feature sets per corpus provided by the organizers, as well as MFCCs. Modeling is based on our own classification method, asymmetric simple partial least squares (ASIMPLS), and on Support Vector Machines (SVMs), followed by calibration and multiple fusion methods. The advantage of asymmetric SIMPLS is that it tends to protect the minority class from being misclassified while boosting performance on the majority class. Our experimental results show that our system performs better than the baseline system. On the development sets, our final fusion yields absolute improvements in unweighted accuracy over the baseline of 1.8% for the Alcohol Language Corpus (ALC) and about 0.7% for the Sleepy Language Corpus (SLC). On the test sets, we obtain absolute improvements of 1.1% and 1.4%, respectively.