Direct exposure of linguistic principles in model architectures for downstream processing
Morshed, Mahir Abrar
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/130021
Description
Issue Date
2025-07-17
Director of Research
Hasegawa-Johnson, Mark A
Doctoral Committee Chair(s)
Hasegawa-Johnson, Mark A
Committee Member(s)
Smaragdis, Paris
Tang, Yan
Singer, Andrew C
Varshney, Lav R
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
low-resource speech recognition
articulatory feature detection
over-regularized speech
structured text generation
Abstract
Extensive work on models that process text and speech has produced massively multilingual tools, often targeting lower-resourced languages. Examining those tools’ architectures, however, has often revealed little direct, explainable use of specific linguistic principles, whether in model setup, training, or evaluation. This work seeks to further the inclusion of linguistic information within the structure of systems that process language, whether for text generation or speech recognition. The systems presented restructure models to differing degrees to expose such information. At one end of this spectrum is a novel text generation system that uses discrete semantic and syntactic units and is built on an open knowledge base and on community contributions of code and data. At the other end is a foundation speech recognizer fine-tuned with morphemic units to handle over-regularized children’s speech. These are followed by improvements to multilingual phone recognition systems through the use and transfer of articulatory information, as a means of imparting some explainability to those systems. Reductions in phone error rates stem from diversified training corpora that improve language and phone coverage, from the introduction of self-supervised waveform input processing, and from adjustments to universal phone and feature sets for consistency and brevity. Such efforts highlight the importance of introducing explicit linguistic decompositions (phonological, morphological, or syntactic) into models that process language, and the need for continual improvements to the sources providing those decompositions.