SpeechSplit2: An efficient unsupervised speech disentanglement model for multi-aspect voice conversion
Chan, Chak Ho
Permalink
https://hdl.handle.net/2142/117680
Description
Title
SpeechSplit2: An efficient unsupervised speech disentanglement model for multi-aspect voice conversion
Author(s)
Chan, Chak Ho
Issue Date
2022-12-07
Director of Research (if dissertation) or Advisor (if thesis)
Hasegawa-Johnson, Mark Allan
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Unsupervised Learning
Voice Conversion
Speech Disentanglement
Signal Processing
Language
eng
Abstract
SpeechSplit is among the first algorithms to successfully disentangle speech into four components: rhythm, content, pitch, and timbre. However, the model requires exhaustive tuning of the encoder bottlenecks, which can be a daunting task and limits its ability to generalize. In this work, we present SpeechSplit2, an improved version of SpeechSplit in which simple signal processing methods are used to alleviate the laborious bottleneck tuning problem. We show that by feeding a different input to each encoder, we can guide each encoder to extract only one particular aspect of speech and discard the rest, provided that the bottleneck is large enough to encode the corresponding information. With the same neural network architecture as SpeechSplit, SpeechSplit2 achieves comparable performance in disentangling speech components when the bottlenecks are carefully tuned, and shows a clear advantage over the baseline when the bottleneck size varies.
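As a concrete illustration of the "simple signal processing" idea, the sketch below shows one common way a pitch-encoder input is prepared in SpeechSplit-style models: the F0 contour is z-normalized per utterance over voiced frames (removing speaker-dependent pitch range, i.e. timbre-related information) and then quantized into discrete bins. The function names, the bin count, and the exact normalization scheme here are assumptions for illustration, not taken from the thesis.

```python
def normalize_f0(f0, eps=1e-8):
    """Z-normalize voiced F0 values; keep unvoiced frames (0.0) at zero.

    Normalizing per utterance strips the speaker's absolute pitch range,
    so the pitch encoder sees only the relative contour shape.
    """
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return [0.0] * len(f0)
    mean = sum(voiced) / len(voiced)
    var = sum((v - mean) ** 2 for v in voiced) / len(voiced)
    std = var ** 0.5 + eps
    return [(v - mean) / std if v > 0 else 0.0 for v in f0]


def quantize_f0(f0_norm, n_bins=256):
    """Map normalized F0, clipped to [-3, 3], onto integer bins.

    Bin 0 is reserved for unvoiced frames; voiced frames land in 1..n_bins.
    The clip range and bin count are illustrative choices.
    """
    bins = []
    for v in f0_norm:
        if v == 0.0:
            bins.append(0)  # unvoiced frame
        else:
            c = max(-3.0, min(3.0, v))
            bins.append(1 + int((c + 3.0) / 6.0 * (n_bins - 1)))
    return bins
```

The resulting bin indices can be one-hot encoded and fed to the pitch encoder, so that pitch information reaches the model only through this channel and the other encoders have no incentive to represent it.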