Files in this item



application/pdfPIETROWICZ-DISSERTATION-2017.pdf (10MB)
(no description provided)PDF


Title:Exposing the hidden vocal channel: Analysis of vocal expression
Author(s):Pietrowicz, Mary B.
Director of Research:Karahalios, Karrie; Hasegawa-Johnson, Mark
Doctoral Committee Chair(s):Karahalios, Karrie; Hasegawa-Johnson, Mark
Doctoral Committee Member(s):Hockenmaier, Julia; McDonough, Jerome; Cole, Jennifer S.; Levow, Gina-Anne
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Signal processing
Vocal expression
Voice quality
Dimensional analysis
Voice analytics
Human-computer interaction (HCI)
Oral history
Speech processing
Abstract:This dissertation explored perception and modeling of human vocal expression, and began by asking what people heard in expressive speech. To address this fundamental question, clips from Shakespearian soliloquy and from the Library of Congress Veterans Oral History Collection were presented to Mechanical Turk workers (10 per clip); and the workers were asked to provide 1-3 keywords describing the vocal expression in the voice. The resulting keywords described prosody, voice quality, nonverbal quality, and emotion in the voice, along with the conversational style, and personal qualities attributed to the speaker. More than half of the keywords described emotion, and were wide-ranging and nuanced. In contrast, keywords describing prosody and voice quality reduced to a short list of frequently-repeating vocal elements. Given this description of perceived vocal expression, a 3-step process was used to model vocal qualities which listeners most frequently perceived. This process included 1) an interactive analysis across each condition to discover its distinguishing characteristics, 2) feature selection and evaluation via unequal variance sensitivity measurements and examination of means and 2-sigma variances across conditions, and 3) iterative, incremental classifier training and validation. The resulting models performed at 2-3.5 times chance. More importantly, the analysis revealed a continuum relationship across whispering, breathiness, modal speech, and resonance, and revealed multiple spectral sub-types of breathiness, modal speech, resonance, and creaky voice. Finally, latent semantic analysis (LSA) applied to the crowdsourced keyword descriptors enabled organic discovery of expressive dimensions present in each corpus, and revealed relationships among perceived voice qualities and emotions within each dimension and across the corpora. The resulting dimensional classifiers performed at up to 3 times chance, and a second study presented a dimensional analysis of laughter. This research produced a new way of exploring emotion in the voice, and of examining relationships among emotion, prosody, voice quality, conversation quality, personal quality, and other expressive vocal elements. For future work, this perception-grounded fusion of crowdsourcing and LSA technique can be applied to anything humans can describe, in any research domain.
Issue Date:2017-12-04
Rights Information:Copyright 2017 Mary Pietrowicz
Date Available in IDEALS:2018-03-13
Date Deposited:2017-12

This item appears in the following Collection(s)

Item Statistics