Files in this item



SP21-ECE499-Thesis-Chan, Chak Ho.pdf (application/pdf, 532kB), restricted to U of Illinois


Title: SpeechSplit2: Disentangling Speech Information Streams without Exhaustive Bottleneck Fine-tuning
Author(s): Chan, Chak Ho
Contributor(s): Hasegawa-Johnson, Mark
Degree: B.S. (bachelor's)
Subject(s): Voice Conversion; Speech Disentanglement; Signal Processing
Abstract: SpeechSplit is among the first algorithms that successfully disentangle speech into four components: rhythm, content, pitch, and timbre. However, the model requires exhaustive fine-tuning of the encoders' bottleneck dimensions, which can be a daunting task and limits its ability to generalize. In this work, we propose SpeechSplit2, an improved version of SpeechSplit that uses simple signal processing methods to alleviate the laborious bottleneck fine-tuning problem. We show that by feeding a different input to each encoder, we can control the input space of the neural networks so that each component contains only the information we want to extract, provided that the bottleneck is large enough to encode that information. With the same neural network architecture as SpeechSplit, SpeechSplit2 achieves comparable performance in disentangling speech components when the bottlenecks are carefully fine-tuned, and it outperforms the baseline when the bottleneck size varies.
Issue Date: 2021-05
Genre: Dissertation / Thesis
Date Available in IDEALS: 2021-08-12
