Files in this item



Song_Xie.pdf (application/pdf, 11MB)


Title: Removing redundancy in speech by modeling forward masking
Author(s): Xie, Song
Advisor(s): Allen, Jont B.
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Speech Recognition; Forward Masking; Perceptual Feature
Abstract: Researchers have been working on fundamental phoneme decoding since the 1920s. There are two different approaches to this problem. The first is automatic speech recognition (ASR). Even though many state-of-the-art analysis methods, such as LPC, STFT, and MFCC, have been used in speech recognition, the performance of ASR has reached a plateau and the speech decoding problem remains unresolved. The second approach is to study human speech recognition (HSR). Recently, the Human Speech Recognition group of the University of Illinois conducted research aimed at improving our understanding of HSR. Based on the Articulation Index theory, they first developed a tool named the AI-gram, which displays the audible components of speech sounds. Using this tool, they discovered that a small set of speech features in the AI-gram, which they named primary cues, can account for speech sound identification. They also proposed a method, called the three-dimensional deep search (3DDS), to extract those primary cues. However, speech masking, especially forward masking, should be considered and modeled when the primary cues are extracted. In this thesis, we propose a forward masking model and integrate it into the AI-gram. The forward masking model consists of an RC feedback loop, a comparison operator, and a delay. For every speech input, the model multiplies the input by a frequency-dependent gain map, which represents the current status of the cochlear outer hair cells (OHC), to obtain the output. This gain map modifies the AI-gram according to the forward masking model. We conduct two simulations to verify the model. In the first simulation, we modify speech sounds according to the forward masking model at SNR = 12 dB. In the second simulation, we modify the f103 /tA/ at SNR = 15, 6, 0, and -3 dB. In both simulations we observe that, while onsets are preserved, a large amount of energy in the AI-gram is removed. We then listen to and compare the original and modified speech sounds. The results show only subtle differences in the quality of the modified sounds. This suggests that the forward masking model effectively removes the masked speech features. One might logically conclude from these simulations that the forward masking (FM) model is removing redundancy in the AI-gram that is naturally masked by the cochlea.
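The abstract names the three parts of the model (an RC feedback loop, a comparison operator, and a delay) but does not give their equations. The sketch below is one possible reading of that description, not the thesis implementation: the function name `forward_masking_gain`, the parameters `frame_dt`, `tau`, `delay`, and `floor`, and the choice of a binary gain are all assumptions made for illustration. Each frequency channel keeps a leaky "masker trace" (the RC loop); the current input is compared against the trace from `delay` frames earlier, and the resulting gain map multiplies the input.

```python
import math


def forward_masking_gain(frames, frame_dt=0.01, tau=0.1, delay=1, floor=0.0):
    """Hypothetical forward-masking sketch (not the thesis model).

    frames: list of time frames, each a list of non-negative per-channel levels.
    Returns (gains, outputs), both with the same shape as `frames`.
    """
    if delay < 1:
        raise ValueError("delay must be at least one frame")
    if not frames:
        return [], []

    n_channels = len(frames[0])
    decay = math.exp(-frame_dt / tau)  # RC leak applied once per frame
    state = [0.0] * n_channels         # masker trace per frequency channel
    history = []                       # trace after each frame, used by the delay
    gains, outputs = [], []

    for t, frame in enumerate(frames):
        # Delay: use the trace from `delay` frames ago, leaked up to now.
        idx = t - delay
        if idx >= 0:
            threshold = [s * decay ** delay for s in history[idx]]
        else:
            threshold = [0.0] * n_channels

        # Comparison operator: pass inputs at or above the trace, attenuate the rest.
        gain = [1.0 if x >= th else floor for x, th in zip(frame, threshold)]
        gains.append(gain)

        # Gain map multiplies the input to give the output.
        outputs.append([x * g for x, g in zip(frame, gain)])

        # RC feedback: leak the old trace, then let a louder input recharge it.
        state = [max(x, s * decay) for x, s in zip(frame, state)]
        history.append(state)

    return gains, outputs


if __name__ == "__main__":
    # A loud onset followed by a quieter tail in channel 0; a late onset in channel 1.
    demo = [[1.0, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.0], [0.2, 0.5]]
    demo_gains, demo_outputs = forward_masking_gain(demo, frame_dt=0.01, tau=0.02)
    for g, y in zip(demo_gains, demo_outputs):
        print(g, y)
```

In this toy input the onsets pass unchanged while the quieter tail that follows the loud frame is attenuated until the trace has decayed below it, which is qualitatively the behavior the abstract reports. The time constant and the hard threshold here are placeholders and should not be read as values from the thesis.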
Issue Date: 2014-01-16
Rights Information: Copyright 2013 Song Xie
Date Available in IDEALS: 2014-01-16
Date Deposited: 2013-12
