Learning shared semantic space for speech-to-text translation
Han, Chi
Permalink
https://hdl.handle.net/2142/120363
Description
- Title: Learning shared semantic space for speech-to-text translation
- Author(s): Han, Chi
- Issue Date: 2023-04-13
- Director of Research (if dissertation) or Advisor (if thesis): Ji, Heng
- Department of Study: Computer Science
- Discipline: Computer Science
- Degree Granting Institution: University of Illinois at Urbana-Champaign
- Degree Name: M.S.
- Degree Level: Thesis
- Keyword(s): Speech-to-Text Translation; Natural Language Processing; Representation Learning
- Abstract: End-to-end speech translation (ST) has far-reaching implications and numerous potential applications, making it an area of significant interest and impact. Despite its importance, ST has traditionally been treated as a separate task, failing to fully leverage the rapid advancements in its closely related sibling, text machine translation (MT). This separation stems from the modality gap: text and audio inputs are represented differently, rendering MT data and end-to-end models incompatible with their ST counterparts. To address this challenge, we present Chimera, a novel approach designed to bridge the representation gap between the two modalities. Chimera projects audio and text features onto a common semantic representation, effectively unifying the MT and ST tasks (a minimal illustrative sketch of this shared-projection idea follows the record below). Consequently, Chimera improves performance on ST benchmarks such as MuST-C and Augmented Librispeech, setting new state-of-the-art results. More specifically, Chimera attains a 27.1 BLEU score on the MuST-C EN-DE benchmark, improving on the previous state of the art by a substantial margin of +1.9 BLEU. Further experimental analyses substantiate that the shared semantic space indeed facilitates the exchange of common knowledge between the MT and ST tasks. We discovered identifiable semantic regions within the shared joint speech-text encoding space, highlighting the effective integration of both modalities. By plotting neural activation maps between parallel speech and text, we visualized the convergence of semantic information, further demonstrating that our approach bridges the modality gap and fosters a more robust understanding of the underlying linguistic structures. This finding paves the way for augmenting training resources across modalities and opens up new avenues for exploration in the field of speech translation.
- Graduation Semester: 2023-05
- Type of Resource: Thesis
- Handle URL: https://hdl.handle.net/2142/120363
- Copyright and License Information: Copyright 2023 Chi Han
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)
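The abstract describes Chimera as projecting audio and text features onto a common semantic representation. The snippet below is a minimal sketch of that shared-projection idea, not the thesis implementation: the module names, feature dimensions, and the contrastive alignment objective are assumptions chosen only to illustrate how two modality-specific feature streams can be mapped into one space and pulled together on parallel speech-text pairs.

```python
# Minimal sketch (not the Chimera code): project speech and text features
# into one shared semantic space and align parallel pairs.
# Dimensions, module names, and the contrastive loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSemanticProjector(nn.Module):
    def __init__(self, speech_dim=512, text_dim=512, shared_dim=256):
        super().__init__()
        # Stand-ins for real modality encoders: a speech feature extractor
        # and a text embedding stack would feed these projections.
        self.speech_proj = nn.Linear(speech_dim, shared_dim)
        self.text_proj = nn.Linear(text_dim, shared_dim)

    def forward(self, speech_feats, text_feats):
        # Mean-pool over frames/tokens, then project both modalities
        # into the same normalized shared semantic space.
        s = F.normalize(self.speech_proj(speech_feats.mean(dim=1)), dim=-1)
        t = F.normalize(self.text_proj(text_feats.mean(dim=1)), dim=-1)
        return s, t

def alignment_loss(s, t, temperature=0.07):
    # Symmetric contrastive objective: parallel speech/text pairs should land
    # close together in the shared space, non-parallel pairs farther apart.
    logits = s @ t.T / temperature
    labels = torch.arange(s.size(0), device=s.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

# Toy usage: a batch of 4 parallel utterance/transcript feature sequences.
speech = torch.randn(4, 100, 512)  # (batch, frames, speech_dim)
text = torch.randn(4, 20, 512)     # (batch, tokens, text_dim)
model = SharedSemanticProjector()
s, t = model(speech, text)
print(alignment_loss(s, t).item())
```

Once speech and text land in the same space, a single downstream translation decoder can, in principle, consume either modality, which is the property the thesis exploits to share MT resources with ST.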