|
12th Annual Conference of the
International Speech Communication Association
Interspeech 2011 Florence
Technical Programme
This is the final programme for this session. For oral sessions, the timing on the left is the current presentation order, but this may still change, so please check at the conference itself.
Tue-Ses2-S1-P: Spoken Language Processing of Human-Human Conversations II
| Time: | Tuesday 14:30 |
| Place: | Caravaggio (Adua 1) - Pala Affari - 1st Floor |
| Type: | Poster |
| Chair: | Dilek Hakkani-Tur |
| #1 | Learning Influences from Word Use in Polylogue
Tomoharu Iwata (NTT), Shinji Watanabe (NTT)
We propose a probabilistic model for estimating influences among speakers from conversation data with multiple people. In conversations, people tend to mimic their companions' behavior depending on their level of trust. With the proposed model, we assume that the word use of a speaker depends on the word use of previous speakers as well as their own earlier word use and the general word distribution. The influences can be efficiently estimated by using the expectation maximization (EM) algorithm. Experiments on two meeting data sets in Japanese and in English demonstrate the effectiveness of the proposed method.
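The kind of estimation described in this abstract can be illustrated with a much simplified sketch. It assumes that each word in an utterance is drawn from a mixture of three fixed distributions (the previous speaker's word use, the speaker's own earlier word use, and a general distribution) and that only the mixture weights are estimated with EM. The function names, the additive smoothing, and the toy vocabulary are hypothetical choices made for illustration; the authors' actual model may well differ.

```python
import numpy as np

def smoothed_dist(words, vocab, alpha=0.1):
    """Additively smoothed word distribution over vocab (alpha is an assumed constant)."""
    counts = np.array([words.count(w) for w in vocab], dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(vocab))

def em_mixture_weights(token_probs, n_iter=200, tol=1e-10):
    """Estimate mixture weights by EM when the component distributions are fixed.

    token_probs has shape (N, K): row n holds the probability of token n
    under each of the K component distributions.
    """
    token_probs = np.asarray(token_probs, dtype=float)
    n_comp = token_probs.shape[1]
    weights = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each token.
        joint = token_probs * weights
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: new weights are the average responsibilities.
        new_weights = resp.mean(axis=0)
        converged = np.max(np.abs(new_weights - weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights

# Toy data, invented for illustration only.
vocab = ["budget", "deadline", "design", "okay", "test"]
previous_speaker = ["budget", "budget", "deadline"]
own_earlier = ["design", "test"]
current_utterance = ["budget", "deadline", "budget", "okay"]

components = np.stack([
    smoothed_dist(previous_speaker, vocab),   # influence of the previous speaker
    smoothed_dist(own_earlier, vocab),        # speaker's own earlier word use
    np.full(len(vocab), 1.0 / len(vocab)),    # general (here uniform) distribution
])
token_ids = [vocab.index(w) for w in current_utterance]
token_probs = components[:, token_ids].T      # shape (N tokens, 3 components)

weights = em_mixture_weights(token_probs)
print(dict(zip(["previous", "own", "general"], weights.round(3))))
```

In this toy case the current utterance mostly reuses the previous speaker's words, so the weight on the first component ends up largest; that weight plays the role of an influence estimate in this simplified setting.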
|
| #2 | Identifying Agreement/Disagreement in Conversational Speech: A Cross-lingual Study
Wen Wang (SRI International), Kristin Precoda (SRI International), Colleen Richey (SRI International), Geoffrey Raymond (University of California, Santa Barbara)
This paper presents models for detecting agreement/disagreement between speakers in English and Arabic broadcast conversation shows. We explore a variety of features, including lexical, structural, durational, and prosodic features. We experiment with these features using Conditional Random Field models and conduct systematic investigations of the efficacy of various feature groups across languages. Sampling approaches are examined for handling highly imbalanced data. Overall, on English broadcast conversation data, we achieved 79.2% precision, 50.5% recall, and 61.7% F1 for agreement detection and 69.2% precision, 46.9% recall, and 55.9% F1 for disagreement detection; on Arabic broadcast conversation data, we achieved 89.2% precision, 30.1% recall, and 45.1% F1 for agreement detection and 75.9% precision, 28.4% recall, and 41.3% F1 for disagreement detection.
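As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes F1 from the precision and recall values quoted in the abstract; the function and variable names are illustrative, and no numbers beyond those in the abstract are used.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall, reported F1), all in percent, copied from the abstract.
reported = {
    "English agreement": (79.2, 50.5, 61.7),
    "English disagreement": (69.2, 46.9, 55.9),
    "Arabic agreement": (89.2, 30.1, 45.1),
    "Arabic disagreement": (75.9, 28.4, 41.3),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1) in reported.items():
    print(f"{name}: reported {f1:.1f}, recomputed {recomputed[name]:.2f}")
```

The recomputed values land within about 0.1 of the reported ones; small gaps of that size are what one would expect if the published precision and recall were themselves rounded.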
|
| #3 | A Dual Channel Coupled Decoder for Fillers and Feedback
Daniel Neiberg (CTT, TMH, CSC, KTH), Joakim Gustafson (CTT, TMH, CSC, KTH)
This study presents a dual channel decoder capable of modeling cross-speaker dependencies for segmentation and classification of fillers and feedbacks in conversational speech found in the DEAL corpus. For the same number of Gaussians per state, we show improvement in average F-score from the successive addition of 1) increased frame rate from 10 ms to 50 ms; 2) Joint Maximum Cross-Correlation (JMXC) features in a single channel decoder; 3) a joint transition matrix that captures dependencies symmetrically across the two channels; and 4) coupled acoustic model retraining symmetrically across the two channels. The final step gives a relative improvement of over 100% for fillers and feedbacks compared to our previously published results. The F-scores are in a range that makes it possible to use the decoder as both a voice activity detector and an illocutionary act decoder for semi-automatic annotation.
|
| #4 | An Analysis of PCA-based Vocal Entrainment Measures in Married Couples' Affective Spoken Interactions
Chi-Chun Lee (Signal Analysis and Interpretation Laboratory, University of Southern California), Athanasios Katsamanis (Signal Analysis and Interpretation Laboratory, University of Southern California), Matthew P. Black (Signal Analysis and Interpretation Laboratory, University of Southern California), Brian R. Baucom (Department of Psychology, University of Southern California), Panayiotis G. Georgiou (Signal Analysis and Interpretation Laboratory, University of Southern California), Shrikanth S. Narayanan (Signal Analysis and Interpretation Laboratory, University of Southern California)
Entrainment has played a crucial role in analyzing married couples' interactions. In this work, we introduce a novel technique for quantifying vocal entrainment based on Principal Component Analysis (PCA). The entrainment measure, as we define it in this work, is the amount of preserved variability of one interlocutor's speaking characteristics when projected onto the representing space of the other's speaking characteristics. Our analysis of real couples' interactions shows that when a spouse is rated as having positive emotion, he/she has a higher value of vocal entrainment than when rated as having negative emotion. We further performed various statistical analyses on the strength and the directionality of vocal entrainment under different affective interaction conditions to bring quantitative insights into the entrainment phenomenon. These analyses, along with a baseline prediction model, demonstrate the validity and utility of the proposed PCA-based vocal entrainment measure.
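One possible reading of the measure described in this abstract is sketched below: fit PCA on one speaker's feature vectors, project the other speaker's centered features onto the retained components, and report the fraction of the second speaker's variance that survives the projection. The function names, the 95% variance threshold, and the synthetic three-dimensional features are all assumptions made for illustration; the abstract does not specify the features, the number of components, or any normalization.

```python
import numpy as np

def pca_basis(features, var_kept=0.95):
    """Principal directions (rows) that explain at least var_kept of the variance."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing**2 / np.sum(sing**2)
    k = int(np.searchsorted(np.cumsum(explained), var_kept)) + 1
    return vt[:min(k, len(sing))]

def preserved_variability(source, target, var_kept=0.95):
    """Fraction of target's variance kept after projecting onto source's PCA space."""
    basis = pca_basis(source, var_kept)
    centered = target - target.mean(axis=0)
    projected = centered @ basis.T
    return float(np.sum(projected**2) / np.sum(centered**2))

# Synthetic features, invented for illustration only.
rng = np.random.default_rng(0)
shared_dir = np.array([1.0, 1.0, 0.0])
other_dir = np.array([0.0, 0.0, 1.0])

def make_speaker(direction, n=200, noise=0.05):
    """Feature vectors that vary mainly along one direction, plus small noise."""
    return np.outer(rng.normal(size=n), direction) + noise * rng.normal(size=(n, 3))

speaker_a = make_speaker(shared_dir)
speaker_b_similar = make_speaker(shared_dir)   # varies like speaker A
speaker_b_different = make_speaker(other_dir)  # varies along another axis

score_similar = preserved_variability(speaker_a, speaker_b_similar)
score_different = preserved_variability(speaker_a, speaker_b_different)
print(f"similar: {score_similar:.3f}, different: {score_different:.3f}")
```

Because the retained basis is orthonormal, the score lies between 0 and 1. It is also directional: swapping `source` and `target` generally gives a different value, which fits the abstract's mention of analysing the directionality of entrainment.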
|