End-to-end modeling for code-switching automatic speech recognition
Zhang, Feiyu
Permalink
https://hdl.handle.net/2142/124616
Description
Title
End-to-end modeling for code-switching automatic speech recognition
Author(s)
Zhang, Feiyu
Issue Date
2024-05-01
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Speech Recognition
Code-switching
End-to-end
Embeddings
Language
eng
Abstract
End-to-end deep neural networks have become the state-of-the-art architecture for many tasks in Automatic Speech Recognition (ASR). For code-switched speech, however, dataset scarcity remains a persistent challenge. Given the difficulty of collecting code-switched corpora, deep-neural-network-based systems typically struggle to match the accuracy of monolingual ASR systems. In this study, we present a simple yet efficient end-to-end ASR system built on an attention-based encoder-decoder framework, specifically engineered to address the complexities of code-switched speech on an English-Mandarin code-switched dataset. To overcome the dataset constraints, our approach leverages attention mechanisms, enhancing the model's ability to focus on relevant linguistic features across languages. We integrate BERT-multilingual and wav2vec 2.0 models to enrich the system's language understanding and acoustic processing capabilities. These integrations allow the model to capture the nuanced language variations and phonetic subtleties inherent in code-switched speech. The results show a relatively low Mixed Error Rate (MER), demonstrating the model's effectiveness in decoding complex code-switched speech. Our findings show that combining neural network architectures with sophisticated language models improves the adaptability of ASR systems in multilingual settings. We also discuss the potential of incorporating syntactic knowledge into language models to better leverage linguistic information.
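The abstract reports performance as Mixed Error Rate (MER), the usual metric for Mandarin-English code-switching ASR: English is scored at the word level, Mandarin at the character level, and edit distance is computed over the mixed token sequence. The sketch below illustrates this scoring scheme; the thesis's exact tokenization and normalization rules are not given here, so the splitting logic is an assumption.

```python
def tokenize_mixed(text):
    """Split a mixed string into Mandarin characters and English words.

    Assumption: any codepoint in the CJK Unified Ideographs block
    (U+4E00-U+9FFF) is treated as its own token; other runs of
    non-space characters are treated as English words.
    """
    tokens, word = [], ""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        elif ch.isspace():
            if word:
                tokens.append(word)
                word = ""
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def mixed_error_rate(ref, hyp):
    """Levenshtein distance over mixed tokens, normalized by reference length."""
    r, h = tokenize_mixed(ref), tokenize_mixed(hyp)
    # Standard dynamic-programming edit distance.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, comparing the reference "我 like 苹果" against the hypothesis "我 like 香蕉" yields four reference tokens with two character substitutions, i.e. an MER of 0.5.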