Files in this item

File: QIAN-THESIS-2018.pdf (1MB)
Description: (no description provided)
Format: application/pdf

Description

Title: Speech enhancement using deep dilated CNN
Author(s): Qian, Kaizhi
Advisor(s): Hasegawa-Johnson, Mark
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): speech enhancement; convolutional neural network; beamforming
Abstract: In recent years, deep learning has achieved great success in speech enhancement. However, existing work has two major limitations. First, many deep-learning-based algorithms do not adopt the Bayesian framework, even though the prior distribution of speech in that framework has been shown to regularize the output toward the speech space and thus improve performance. Second, most existing methods operate on frequency-domain representations of the noisy speech, such as the spectrogram and its variations. We propose a Bayesian speech enhancement framework, called BaWN (Bayesian WaveNet), which operates directly on raw audio samples. It adopts the recently proposed WaveNet, which has been shown to be effective in modeling conditional distributions of speech samples while generating natural speech. Experiments show that BaWN is able to recover clean and natural speech. Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech-model-guided beamforming algorithms can recover natural-sounding speech, but their speech models tend to be oversimplified to keep inference tractable. On the other hand, deep-learning-based enhancement approaches can learn complicated speech distributions and perform efficient inference, but they cannot handle a variable number of input channels, and they introduce considerable errors, particularly in the presence of unseen noise types and settings. We therefore propose an enhancement framework, called DeepBeam, that combines these two complementary classes of algorithms. DeepBeam applies a beamforming filter to produce natural-sounding speech, but the filter coefficients are determined with the help of a monaural speech enhancement neural network. Experiments on synthetic and real-world data show that DeepBeam produces clean, dry, and natural-sounding speech, and that it is robust against unseen noise.
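To make the DeepBeam idea in the abstract concrete, the following is a minimal toy sketch, not the thesis's actual algorithm: channels are assumed time-aligned, the monaural network's output is replaced by a hypothetical stand-in reference signal, and the beamforming weights are obtained as the least-squares filter that maps the multi-channel input onto that reference.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_channels = 16000, 4

# Stand-in for clean speech (white noise here, purely for illustration).
clean = rng.standard_normal(n_samples)

# Four time-aligned channels, each with independent additive noise.
X = np.stack([clean + 0.3 * rng.standard_normal(n_samples)
              for _ in range(n_channels)])

# Hypothetical stand-in for the monaural enhancement network's output:
# an imperfect estimate of the clean signal from one channel.
ref = clean + 0.05 * rng.standard_normal(n_samples)

# Single-tap least-squares beamformer: w = argmin_w ||w^T X - ref||^2,
# solved via the normal equations (X X^T) w = X ref.
w = np.linalg.solve(X @ X.T, X @ ref)
enhanced = w @ X

def snr_db(sig, target):
    """SNR of `sig` against `target`, in dB."""
    noise = sig - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))

print(f"best single-channel SNR: {max(snr_db(x, clean) for x in X):.1f} dB")
print(f"beamformed SNR:          {snr_db(enhanced, clean):.1f} dB")
```

The beamformed output combines the channels so as to match the (already partially denoised) reference, which is the role the monaural network plays in DeepBeam; the actual method additionally handles unaligned ad-hoc sensors and iterates between enhancement and filter estimation.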
Issue Date: 2018-05-22
Type: Text
URI: http://hdl.handle.net/2142/101644
Rights Information: Copyright 2018 Kaizhi Qian
Date Available in IDEALS: 2018-09-27; 2020-09-28
Date Deposited: 2018-08

