Files in this item

Files: BHARADWAJ-DISSERTATION-2015.pdf (1MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: A theory of (almost) zero resource speech recognition
Author(s): Bharadwaj, Sujeeth Subramanya
Director of Research: Hasegawa-Johnson, Mark A.
Doctoral Committee Chair(s): Hasegawa-Johnson, Mark A.
Doctoral Committee Member(s): Levinson, Stephen E.; Liang, Feng; Smaragdis, Paris
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Speech recognition; Unsupervised learning; PAC-Bayesian theory; Language Modeling; Acoustic Event Detection; anomaly detection
Abstract: Automatic speech recognition has matured into a commercially successful technology, enabling voice-based interfaces for smartphones, smart TVs, and many other consumer devices. This success, however, is still limited to languages such as English, Japanese, and German, for which vast amounts of labeled training data are available. For most other languages, it is prohibitively expensive to 1) collect and transcribe the speech data required to learn good acoustic models; and 2) acquire adequate text to estimate meaningful language models. A theory of unsupervised and semi-supervised techniques for speech recognition is therefore essential. This thesis focuses on HMM-based sequence clustering and examines acoustic modeling, language modeling, and applications beyond the components of an ASR system, such as anomaly detection, from the vantage point of PAC-Bayesian theory. The first part of this thesis extends standard PAC-Bayesian bounds to address the sequential nature of speech and language signals. A novel algorithm, based on sparsifying the cluster assignment probabilities with a Rényi entropy prior, is shown to provably minimize the generalization error of any probabilistic model (e.g., HMMs). The second part examines application-specific loss functions such as cluster purity and perplexity. Empirical results on a variety of tasks -- acoustic event detection, class-based language modeling, and unsupervised sequence anomaly detection -- confirm the practicality of the theory and algorithms developed in this thesis.
Issue Date: 2015-03-31
Type: Thesis
URI: http://hdl.handle.net/2142/78343
Rights Information: Copyright 2015 Sujeeth Bharadwaj
Date Available in IDEALS: 2015-07-22
Date Deposited: May 2015

