End-to-end modeling for code-switching automatic speech recognition
Zhang, Feiyu
Permalink
https://hdl.handle.net/2142/124616
Description
Title
End-to-end modeling for code-switching automatic speech recognition
Author(s)
Zhang, Feiyu
Issue Date
2024-05-01
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Speech Recognition
Code-switching
End-to-end
Embeddings
Language
eng
Abstract
End-to-end deep neural networks have become the state-of-the-art architecture for many tasks in Automatic Speech Recognition (ASR). For code-switched speech, however, dataset scarcity remains a persistent challenge. Given the difficulty of collecting code-switched corpora, deep-neural-network-based systems typically struggle to match the accuracy of monolingual ASR systems. In this study, we present a simple yet efficient end-to-end ASR system built on an attention-based encoder-decoder framework, specifically engineered to address the complexities of code-switched speech on an English-Mandarin code-switched dataset. To overcome the dataset constraints, our approach leverages attention mechanisms, enhancing the model's ability to focus on relevant linguistic features across languages. We integrate BERT-multilingual and wav2vec 2.0 models to enrich the system's language understanding and acoustic processing capabilities. These integrations allow the model to capture the nuanced language variations and phonetic subtleties inherent in code-switched speech. The results show a relatively low Mixed Error Rate (MER), demonstrating the model's effectiveness in decoding complex code-switched speech. Our findings show that combining neural network architectures with sophisticated language models improves the adaptability of ASR systems in multilingual settings. We also discuss the potential of incorporating syntactic knowledge into language models to better leverage linguistic information.
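The abstract reports performance as Mixed Error Rate (MER), the usual metric for Mandarin-English code-switching ASR: English is scored at the word level, Mandarin at the character level, and edit distance is computed over the mixed token sequence. The sketch below illustrates this scoring scheme; the thesis's exact tokenization and normalization rules are not given here, so the splitting logic is an assumption.

```python
def tokenize_mixed(text):
    """Split a mixed string into Mandarin characters and English words.

    Assumption: any codepoint in the CJK Unified Ideographs block
    (U+4E00-U+9FFF) is treated as its own token; other runs of
    non-space characters are treated as English words.
    """
    tokens, word = [], ""
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        elif ch.isspace():
            if word:
                tokens.append(word)
                word = ""
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def mixed_error_rate(ref, hyp):
    """Levenshtein distance over mixed tokens, normalized by reference length."""
    r, h = tokenize_mixed(ref), tokenize_mixed(hyp)
    # Standard dynamic-programming edit distance.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)
```

For example, comparing the reference "我 like 苹果" against the hypothesis "我 like 香蕉" yields four reference tokens with two character substitutions, i.e. an MER of 0.5.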