Files in this item



ZHU-THESIS-2017.pdf (application/pdf, 1MB)
(no description provided)


Title: Lipreading with convolutional and recurrent neural network models
Author(s): Zhu, Tianyilin
Advisor(s): Hasegawa-Johnson, Mark
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Convolutional neural network
Abstract: Lip reading is the process of recognizing speech from visual information alone. The goal of this thesis is to perform silence vs. speech classification and to recognize the triphone spoken by a talking head, given only the video, using neural network classification models. Two neural network architectures are developed and tested on the AVICAR dataset: a convolutional neural network (CNN) model with a fully connected classification layer, and a recurrent neural network (RNN) model with a convolutional layer and one long short-term memory (LSTM) layer that performs classification on a sequence of inputs. In both models, the convolutional layers serve as feature extractors. The performance of each model is evaluated experimentally, and the detailed network structure and preprocessing pipeline are described.
Issue Date: 2017-04-24
Rights Information: Copyright 2017 Tianyilin Zhu
Date Available in IDEALS: 2017-08-10
Date Deposited: 2017-05
