Files in this item

ECE499-Sp2018-wang-Liming.pdf (1 MB, PDF; restricted to U of Illinois)
Title: Multimodal semantic learning with context-correlated speeches and images
Author(s): Wang, Liming
Contributor(s): Hasegawa-Johnson, Mark
Subject(s): speech-to-image retrieval; multimodal learning; language acquisition; under-resourced automatic speech recognition
Abstract: Automatic speech recognition (ASR) technologies have been successfully applied to most of the world's major languages. However, ASR performs poorly on under-resourced languages such as Mboshi, because those languages lack the standardized orthographies and/or manually transcribed labels needed to train an ASR system. This work presents an unsupervised machine learning approach to help develop speech technology for under-resourced languages. Our algorithm imitates the human early language acquisition (LA) process using speech and context-correlated images. We compared different speech features and found features that outperform vanilla mel-frequency cepstral coefficients (MFCCs) on our multimodal speech-to-image (Sp2Im) retrieval task.
Issue Date: 2018-05
Date Available in IDEALS: 2018-05-24
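The speech-to-image (Sp2Im) retrieval task named in the abstract can be illustrated with a minimal sketch: given a speech query and candidate images embedded in a shared vector space, rank the images by cosine similarity to the query. All function and variable names here are hypothetical and the embeddings are toy vectors; the thesis itself concerns which speech features (e.g., MFCCs or alternatives) produce the speech-side representation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between each row of a and each row of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def retrieve_images(speech_emb, image_embs, top_k=2):
    """Rank candidate images for one speech query (hypothetical sketch).

    speech_emb: (d,) embedding of the spoken caption
    image_embs: (n, d) embeddings of the candidate images
    Returns the indices of the top_k best-matching images.
    """
    sims = cosine_sim(speech_emb[None, :], image_embs)[0]
    return np.argsort(-sims)[:top_k]

# Toy example: three images in a 4-dim shared space; the query vector
# is constructed to lie closest to image 1.
images = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.1, 0.0],
                   [0.0, 0.0, 0.0, 1.0]])
query = np.array([0.1, 0.9, 0.2, 0.0])
print(retrieve_images(query, images, top_k=1))  # → [1]
```

In an unsupervised setting like the one described, the shared space would be learned from co-occurring speech and images rather than given, but the retrieval step reduces to this kind of nearest-neighbor ranking.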