Dealing with linguistic mismatches for automatic speech recognition
Yang, Xuesong
Permalink
https://hdl.handle.net/2142/105187
Description
- Title
- Dealing with linguistic mismatches for automatic speech recognition
- Author(s)
- Yang, Xuesong
- Issue Date
- 2019-04-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark
- Doctoral Committee Chair(s)
- Hasegawa-Johnson, Mark
- Committee Member(s)
- Huang, Thomas S.
- Smaragdis, Paris
- Shih, Chilin
- Department of Study
- Graduate College Programs
- Discipline
- Informatics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Automatic Speech Recognition
- Acoustic Modeling
- Multi-Accents
- Multi-Lingual
- Acoustic Phonetics
- Distinctive Features
- Acoustic Landmarks
- End-to-End
- Multi-Task Learning
- Model Compression
- Deep Learning
- Pronunciation Error Detection
- Connectionist Temporal Classification
- Abstract
- Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) on par with human transcribers on the English Switchboard benchmark. However, dealing with linguistic mismatches between the training and testing data remains a significant unsolved challenge. In the monolingual setting, it is well known that the performance of ASR systems degrades significantly when presented with speech from speakers whose accents, dialects, and speaking styles differ from those encountered during system training. In the multi-lingual setting, ASR systems trained on a source language perform even worse when tested on another target language because of mismatches in the number of phonemes, lexical ambiguity, and the strength of the phonotactic constraints provided by phone-level n-grams. To address these linguistic mismatches in current ASR systems, my dissertation investigates both knowledge-gnostic and knowledge-agnostic solutions. In the first part, classic theories from acoustic and articulatory phonetics are revisited that have the potential to transfer across a dialect continuum, from local dialects to a standardized language. Experiments demonstrate that acoustic correlates in the vicinity of landmarks can help bridge mismatches across different local or global varieties in a dialect continuum. In the second part, we design an end-to-end acoustic modeling approach based on the connectionist temporal classification (CTC) loss and propose to train acoustics and accent jointly, in a manner similar to the learning process in human speech perception. This joint model not only performed well on multi-accent ASR but also boosted the accuracy of the accent identification task in comparison to separately trained models. (A minimal illustrative sketch of such a joint objective appears after the record metadata below.)
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/105187
- Copyright and License Information
- Copyright 2019 Xuesong Yang
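
The joint acoustic-accent modeling described in the abstract can be illustrated with a minimal sketch: a shared encoder feeds both a CTC head for frame-level token posteriors and an utterance-level accent-identification head, and the two losses are interpolated during training. This is not the dissertation's implementation; the BiLSTM encoder, layer sizes, and the weight `lambda_accent` are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the author's code): multi-task
# training that combines a CTC objective for token recognition with an
# auxiliary cross-entropy objective for accent identification.
import torch
import torch.nn as nn

class JointCTCAccentModel(nn.Module):
    def __init__(self, n_feats=40, n_tokens=50, n_accents=8, hidden=256):
        super().__init__()
        # Shared encoder over acoustic features (e.g., log-mel filterbanks).
        self.encoder = nn.LSTM(n_feats, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        # CTC head: per-frame token posteriors (index 0 reserved for blank).
        self.ctc_head = nn.Linear(2 * hidden, n_tokens)
        # Accent head: utterance-level posteriors from mean-pooled encoder states.
        self.accent_head = nn.Linear(2 * hidden, n_accents)

    def forward(self, feats):
        enc, _ = self.encoder(feats)                   # (B, T, 2H)
        ctc_logits = self.ctc_head(enc)                # (B, T, n_tokens)
        accent_logits = self.accent_head(enc.mean(1))  # (B, n_accents)
        return ctc_logits, accent_logits

def joint_loss(ctc_logits, accent_logits, targets, target_lens, input_lens,
               accent_labels, lambda_accent=0.3):
    """Interpolate the CTC loss with the accent-ID cross-entropy."""
    log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)  # (T, B, n_tokens)
    ctc = nn.functional.ctc_loss(log_probs, targets, input_lens, target_lens,
                                 blank=0, zero_infinity=True)
    acc = nn.functional.cross_entropy(accent_logits, accent_labels)
    return ctc + lambda_accent * acc

# Toy usage with random tensors, only to show the expected shapes.
model = JointCTCAccentModel()
feats = torch.randn(4, 120, 40)                        # batch of 4 utterances
ctc_logits, accent_logits = model(feats)
targets = torch.randint(1, 50, (4, 20))                # token targets (no blanks)
loss = joint_loss(ctc_logits, accent_logits, targets,
                  target_lens=torch.full((4,), 20, dtype=torch.long),
                  input_lens=torch.full((4,), 120, dtype=torch.long),
                  accent_labels=torch.randint(0, 8, (4,)))
loss.backward()
```

The interpolation weight controls how strongly the shared encoder is pushed toward accent-discriminative representations; in a separately-trained baseline the accent head would be fit on a frozen or independent encoder instead.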
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY