12th Annual Conference of the
International Speech Communication Association


Interspeech 2011 Florence

Technical Programme

This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.

Wed-Ses1-S1:
Speaker State Challenge - Intoxication and Sleepiness I

Time: Wednesday 10:00
Place: Raffaello - Pala Affari - 3rd Floor
Type: Oral
Chair: Björn Schuller

10:00 The INTERSPEECH 2011 Speaker State Challenge

Björn Schuller (Technische Universität München)
Stefan Steidl (ICSI)
Anton Batliner (FAU Erlangen-Nuremberg)
Florian Schiel (BAS, Ludwig-Maximilians-Universität München)
Jarek Krajewski (University of Wuppertal)

While the first open comparative challenges in the field of paralinguistics targeted more 'conventional' phenomena such as emotion, age, and gender, there still exists a multiplicity of not yet covered, but highly relevant speaker states and traits. The INTERSPEECH 2011 Speaker State Challenge thus addresses two new sub-challenges to overcome the usually low compatibility of results: In the Intoxication Sub-Challenge, alcoholisation of speakers has to be determined in two classes; in the Sleepiness Sub-Challenge, another two-class classification task has to be solved. This paper introduces the conditions, the Challenge corpora "Alcohol Language Corpus" and "Sleepy Language Corpus", and a standard feature set that may be used. Further, baseline results are given.

10:20 Combining Multiple Phoneme-based Classifiers with Audio Feature-based Classifier for the Detection of Alcohol Intoxication

Claude Montacié (STIH Laboratory, Paris Sorbonne University, France)
Marie-José Caraty (LIPADE laboratory, Paris Descartes University, France)

This article describes the two systems which we submitted for the Intoxication Sub-Challenge of the INTERSPEECH 2011 Speaker State Challenge. At first, we developed an extended Baseline System with a significant improvement of the unweighted accuracy compared to the Official Baseline System (OBS) on the development set. Then, we investigated the phonetic variations of speech under alcoholisation and developed gender-dependent Phoneme-based SVM classifiers. For this purpose, we selected the most relevant phonemes and investigated a system combining six Phoneme-based SVM classifiers. Its accuracy is slightly below that of the OBS. Finally, the combination of the two systems is presented.

10:40 Intoxication Detection using Phonetic, Phonotactic and Prosodic Cues

Fadi Biadsy (Computer Science Department, Columbia University, New York, USA)
William Yang Wang (Computer Science Department, Columbia University, New York, USA)
Andrew Rosenberg (Computer Science Department, Queens College (CUNY), New York, USA)
Julia Hirschberg (Computer Science Department, Columbia University, New York, USA)

In this paper, we investigate multiple approaches for automatically detecting intoxicated speakers given samples of their speech. Intoxicated speech in a given language can be viewed simply as a different accent of this language; therefore we adopt our recent approach to dialect and accent recognition to detect intoxication. The system models phonetic structural differences across sober and intoxicated speakers. This approach employs SVM with a kernel function that computes similarities between adapted phone GMMs which summarize speakers' phonetic characteristics in their utterances. We also investigate additional cues, such as prosodic events, phonotactics and phonetic durations under intoxicated and sober conditions. We find that our phonetic-based system when combined with phonotactic features provides us with our best result on the official development set, an accuracy of 73% and an equal error rate of 26.3%, significantly higher than the official baseline.
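As a rough illustration of the equal error rate quoted in the abstract above, the sketch below estimates it by sweeping a decision threshold over classifier scores. It is not the authors' implementation: the function name `equal_error_rate`, the score convention (a higher score means "more likely intoxicated") and the toy data are assumptions made for this example.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by trying every observed score as a threshold.

    scores: higher means "more likely intoxicated" (assumed convention).
    labels: 1 for intoxicated, 0 for sober.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # False rejection: an intoxicated sample scored below the threshold.
        frr = sum(s < threshold for s in positives) / len(positives)
        # False acceptance: a sober sample scored at or above the threshold.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    toy_scores = [0.1, 0.4, 0.35, 0.8]   # hypothetical classifier outputs
    toy_labels = [0, 0, 1, 1]            # 0 = sober, 1 = intoxicated
    print(equal_error_rate(toy_scores, toy_labels))
```

With only a handful of scores the estimate is coarse; on a full development set the sweep gives a finer approximation of the point where false acceptances and false rejections balance.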

11:00 Drink and Speak: On the automatic classification of alcohol intoxication by acoustic, prosodic and text-based features

Tobias Bocklet (University of Erlangen-Nuremberg)
Korbinian Riedhammer (University of Erlangen-Nuremberg)
Elmar Nöth (University of Erlangen-Nuremberg)

This paper focuses on the automatic detection of a person’s blood alcohol based on automatic speech processing approaches. We compare different feature sets on the ALC dataset of the Interspeech 2011 Speaker State Challenge. Three feature sets are based on spectral observations: TRAPS, MFCC, and PLP. These are modeled by GMMs. Classification is done either by a Gaussian classifier or by SVMs. A prosodic system extracts a 292-dimensional feature vector. Transcription-based systems employ features from the available transcription. We compare the stand-alone performances of these systems and combine them on score level. Combination on score level achieved a significant improvement of 15% on the development set. On the test set we achieved a UA of 68.63%, which is a significant relative improvement of more than 5% compared to the baseline system.
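The abstract does not say how the scores are combined. One common form of score-level combination, shown here only as a minimal sketch, is to z-normalize each system's scores and take a weighted average. The helper names `z_normalize` and `fuse_scores`, the equal default weights and the toy numbers are all assumptions for illustration.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores, weights=None):
    """Weighted average of z-normalized scores, one fused score per utterance.

    system_scores: list of score lists, one list per system, all of equal
    length and aligned by utterance.
    weights: optional per-system weights; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0 / len(system_scores)] * len(system_scores)
    normalized = [z_normalize(scores) for scores in system_scores]
    n_utterances = len(normalized[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, normalized))
        for i in range(n_utterances)
    ]


if __name__ == "__main__":
    spectral = [1.0, 2.0, 3.0]      # hypothetical scores from one system
    prosodic = [10.0, 20.0, 30.0]   # hypothetical scores from another system
    print(fuse_scores([spectral, prosodic]))
```

Normalizing first keeps a system with a larger numeric score range from dominating the average; in practice the weights would be tuned on the development set.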

11:20 Intoxicated Speech Detection Using Hierarchical Features and Iterative Speaker Normalization

Daniel Bone (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)
Matthew P. Black (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)
Ming Li (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)
Angeliki Metallinou (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)
Sungbok Lee (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)
Shrikanth S. Narayanan (Signal Analysis and Interpretation Laboratory (SAIL), Los Angeles, CA, USA)

Speaker state recognition is a challenging problem due to speaker and context variability. Intoxication detection is an important area of paralinguistic speech research with potential real-world applications. In this work, we build upon a base set of various static acoustic features by proposing the combination of several different methods for this learning task. The methods include extracting hierarchical acoustic features, performing iterative speaker normalization, and using a set of GMM supervectors. We obtain an optimal unweighted recall for intoxication recognition using score-level fusion of these subsystems. Unweighted average recall performance is 70.54% on the test set, an improvement of 4.64% absolute (7.04% relative) over the baseline model accuracy of 65.9%.
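The figures in the abstract above can be checked with a few lines of arithmetic. The sketch below shows how unweighted average recall is commonly computed (the mean of the per-class recalls, so both classes count equally regardless of size) and how the absolute and relative improvements follow from the two reported percentages. The function names and the toy labels are assumptions for illustration.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def improvement(new, baseline):
    """Return (absolute, relative-in-percent) improvement over a baseline."""
    absolute = new - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    # Toy case: always predicting the majority class gives 80% plain
    # accuracy but only 50% unweighted average recall.
    y_true = [1, 1, 1, 1, 0]
    y_pred = [1, 1, 1, 1, 1]
    print(unweighted_average_recall(y_true, y_pred))  # 0.5

    # Figures reported in the abstract: 70.54% against a 65.9% baseline.
    absolute, relative = improvement(70.54, 65.9)
    print(f"{absolute:.2f} absolute, {relative:.2f}% relative")
    # 4.64 absolute, 7.04% relative
```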

11:40 Attention, Sobriety Checkpoint! Can Humans Determine by Means of Voice, if Someone is Drunk... and can Automatic Classifiers Compete?

Stefan Ultes (Institute of Information Technology, University of Ulm, Germany)
Alexander Schmitt (Institute of Information Technology, University of Ulm, Germany)
Wolfgang Minker (Institute of Information Technology, University of Ulm, Germany)

This paper analyzes the human performance of recognizing drunk speakers merely by voice and compares the results with the performance of an automatic statistical classifier. The study is carried out within the Interspeech 2011 Speaker State Challenge employing the Alcohol Language Corpus (ALC). The 79 subjects yielded an average performance of 55.8% unweighted accuracy on a balanced intoxicated/non-intoxicated sample set. The statistical classifier developed in this study reaches a performance of 66.6% unweighted accuracy on the test set. In comparison, the subject with the highest performance yielded 70.0%. Our classifier is based on 4368 acoustic and prosodic features. Incorporating linguistic features along with feature selection using Information Gain Ratio (IGR) ranking added a 0.7% absolute improvement while reducing the feature space size by 29%.

12:00 Does it Groove or Does it Stumble - Automatic Classification of Alcoholic Intoxication Using Prosodic Features

Florian Hönig (Pattern Recognition Lab, University of Erlangen-Nuremberg)
Anton Batliner (Pattern Recognition Lab, University of Erlangen-Nuremberg)

This paper studies how prosodic features can help in the automatic detection of alcoholic intoxication. We compute features that have recently been proposed to model speech rhythm such as the pair-wise variability index for consonantal and vocalic segments (PVI) and study their aptness for the task. Further, we use a large prosodic feature vector modelling the usual candidates - pitch, intensity, and duration - and apply it to different units such as words, syllables and stressed syllables to create generalizations of the rhythm features mentioned. The results show that the prosodic features computed are suitable for detecting alcoholic intoxication and add complementary information to state-of-the-art features. The database is the intoxication database provided by the organizers of the 2011 Interspeech Speaker State Challenge.
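For readers unfamiliar with the pair-wise variability index, the sketch below computes its two usual forms from a sequence of segment durations: the raw PVI (mean absolute difference between successive durations) and the rate-normalized PVI (each difference divided by the mean of the pair, scaled by 100). The abstract does not state which variant the authors use, so the function names and the toy durations here are assumptions for illustration only.

```python
def raw_pvi(durations):
    """Mean absolute difference between successive segment durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def normalized_pvi(durations):
    """Rate-normalized PVI: each difference is divided by the pair's mean."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    vocalic_ms = [80.0, 120.0, 60.0, 150.0]  # hypothetical vowel durations
    print(raw_pvi(vocalic_ms))
    print(normalized_pvi(vocalic_ms))
```

Perfectly even durations give a PVI of zero, and larger values indicate more alternation between long and short segments. That alternation is the rhythmic property such features are meant to capture.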