12th Annual Conference of the
International Speech Communication Association
Interspeech 2011 Florence
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Wed-Ses2-S1: Speaker State Challenge - Intoxication and Sleepiness II
Time: Wednesday 13:30
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Anton Batliner
| 13:30 | Perception of Alcoholic Intoxication in Speech
Florian Schiel (Bavarian Archive for Speech Signals, Ludwig-Maximilians-Universität)
The ALC sub-challenge of the Interspeech Speaker State Challenge (ISSC) aims
at the automatic classification of speech signals into intoxicated and sober
speech. In this context we conducted a perception experiment on data derived
from the same corpus to analyse the performance of humans on the same task. The
results show that humans still outperform comparable baseline results of ISSC.
Female and male listeners perform on the same level, but there is strong
evidence that intoxication in female voices is easier to recognize than
in male voices. Prosodic features contribute to the decision of human listeners
but seem not to be dominant. In analogy to Doddington's zoo of speaker verification
we find some evidence for the existence of lambs and goats but no wolves.
|
| 13:50 | Detecting sleepiness by fusing classifiers trained with novel acoustic features
Tauhidur Rahman (University of Texas at Dallas), Soroosh Mariooryad (University of Texas at Dallas), Shalini Keshavamurthy (University of Texas at Dallas), Gang Liu (University of Texas at Dallas), John H.L. Hansen (University of Texas at Dallas), Carlos Busso (University of Texas at Dallas)
Automatic sleepiness detection is a challenging task that can lead to advances in various domains including traffic safety, medicine and human-machine interaction. This paper analyzes the discriminative power of different acoustic features to detect sleepiness. The study uses the Sleepy Language Corpus (SLC). Along with standard acoustic features, novel features are proposed, including functionals across voiced segment statistics in the F0 contour, likelihoods of reference models used to contrast non-neutral speech, and a set of noise-robust spectral features. These feature sets, which have performed well in other paralinguistic tasks such as emotion recognition, are used to train classifiers that are combined at the feature and decision levels. The best unweighted accuracy (UA) is obtained by combining the classifiers at the decision level under a maximum likelihood framework (UA = 70.97%). This performance is higher than the best results reported on the corpus.
|
| 14:10 | An HMM-Based Approach to the INTERSPEECH 2011 Speaker State Challenge
Albino Nogueiras (Universitat Politecnica de Catalunya, Barcelona, Spain)
The current main trend in paralinguistic information recognition is so-called static classification. In this kind of classification the low-level descriptors are pooled together by means of statistical functionals, and all, or almost all, information about the temporal structure and evolution of speech is lost. Although this approach represents the state of the art, we believe that dynamic classification, where temporal information is kept, still deserves some attention because it can handle aspects that the static approach cannot. In this paper the INTERSPEECH 2011 Speaker State Challenge is addressed using the Automatic Speech Recognition system developed at UPC, which has already been used in a similar task: emotion recognition. Although results fall below the baseline, we believe that they are close enough to be taken into account.
|
| 14:30 | RANSAC-based Training Data Selection for Speaker State Recognition
Elif Bozkurt (Koc University, Istanbul, Turkey), Engin Erzin (Koc University, Istanbul, Turkey), Cigdem Eroglu Erdem (Bahcesehir University, Istanbul, Turkey), Arif Tanju Erdem (Ozyegin University, Istanbul, Turkey)
We present a Random Sampling Consensus
(RANSAC) based training approach for the problem
of speaker state recognition from spontaneous
speech. Our system is trained and tested with the
INTERSPEECH 2011 Speaker State Challenge corpora,
which include the Intoxication and the Sleepiness
Sub-challenges, where each sub-challenge defines a
two-class classification task. We aim to perform
RANSAC-based training data selection coupled with
Support Vector Machine (SVM) based classification to
prune possible outliers that exist in the training data.
Our experimental evaluations indicate that the use
of RANSAC-based training data selection provides
66.32% and 65.38% unweighted average (UA) recall
on the development and test sets, respectively, for the
Sleepiness Sub-challenge, and a slight improvement in
performance on the Intoxication Sub-challenge.
|
| 14:50 | University of Ljubljana System for Interspeech 2011 Speaker State Challenge
Rok Gajšek (University of Ljubljana), Simon Dobrišek (University of Ljubljana), France Mihelič (University of Ljubljana)
The paper presents our efforts in the Interspeech 2011 Speaker State Challenge. Both systems, for the Intoxication and the Sleepiness Sub-Challenge, are based on a Universal Background Model (UBM) in the form of a Hidden Markov Model (HMM) and Maximum A Posteriori (MAP) adaptation. With the combination of our HMM-UBM-MAP derived super-vectors and selected statistical functionals from the baseline feature set, we were able to surpass the baseline system in both sub-challenges. By employing majority voting fusion of the best systems we were able to further improve the performance. In the Intoxication Sub-Challenge our best result on the test set is 67.46%, and in the Sleepiness Sub-Challenge 71.28%.
|
| 15:10 | Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines
Dong-Yan Huang (Institute for Infocomm Research), Shuzhi Sam Ge (Social Robotics Lab, Interactive Digital Media Institute), Zhengchen Zhang (Social Robotics Lab, Interactive Digital Media Institute)
This paper describes a Speaker State Classification System (SSCS) for the INTERSPEECH 2011 Speaker State Challenge. Our SSC system for the Intoxication and Sleepiness Sub-Challenges uses fusion of several individual sub-systems. We make use of the three standard feature sets per corpus provided by the organizers, together with MFCCs. Modeling is based on our own classification method, Asymmetric simple partial least squares (ASIMPLS), and on Support Vector Machines (SVMs), followed by calibration and multiple fusion methods. The advantage of asymmetric SIMPLS is that it tends to protect the minority class from being misclassified and boosts the performance on the majority class. Our experimental results show that our SSC system performs better than the baseline system. Our final fusion results in a 1.8% absolute improvement in unweighted accuracy for the Alcohol Language Corpus (ALC) and about 0.7% for the Sleepy Language Corpus (SLC) on the development set over the baseline. On the test set, we obtain 1.1% and 1.4% absolute improvement, respectively.
|