Explainable artificial intelligence for inclusive automatic speech recognition
Lee, Seunghyun
Permalink
https://hdl.handle.net/2142/121386
Description
- Title
- Explainable artificial intelligence for inclusive automatic speech recognition
- Author(s)
- Lee, Seunghyun
- Issue Date
- 2023-07-21
- Director of Research (if dissertation) or Advisor (if thesis)
- Hasegawa-Johnson, Mark A
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Keyword(s)
- Inclusive ASR
- Explainable AI
- ASR Visualization
- Abstract
- While the widespread adoption of automatic speech recognition (ASR) technology has brought significant benefits to society, it has also highlighted persistent inequality in access to and use of the technology. At the same time, the increasing prevalence of artificial intelligence applications has driven a growing demand for explainable artificial intelligence (XAI). To address the need for interpretability and explainability in ASR, particularly in the context of inclusiveness, this work visualizes the inner workings of the convolutional neural network (CNN) layer and Transformer block in Wav2Vec2.0. This is achieved by calculating the weighted relevance of the connectionist temporal classification (CTC) with respect to the attention and convolutional layers. Using a Wav2Vec2.0 model pre-trained and fine-tuned on LibriSpeech and tested on the Speech Accent Archive, we found that the Transformer attends to other vowel transcriptions when it encounters vowels within a word, whereas its attention is more localized when it transcribes consonants or vowels in non-words absent from its learned vocabulary. Analysis of the weighted convolutional relevance in the first layer of the CNN revealed that different channels concentrate on distinct frequency and time sequences to capture the overall input characteristics. A comprehensive understanding of the causes and dynamics behind performance disparities can help mitigate those disparities and promote more inclusive ASR technology.
- Graduation Semester
- 2023-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/121386
- Copyright and License Information
- Copyright 2023 Seunghyun Lee
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois