Files in this item

Files: KHORRAMI-DISSERTATION-2017.pdf (7MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: How deep learning can help emotion recognition
Author(s): Khorrami, Pooya Rezvani
Director of Research: Huang, Thomas S
Doctoral Committee Chair(s): Huang, Thomas S
Doctoral Committee Member(s): Hasegawa-Johnson, Mark; Hoiem, Derek W; Liang, Zhi-Pei
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Emotion recognition; Deep learning; Machine learning; Computer vision; Facial expression recognition; Affective computing; Deep neural networks
Abstract: As technological systems become increasingly advanced, the need to include the human in the interaction process has become more apparent. One simple way to do so is to have the computer system understand and respond to the human's emotions. Previous work in emotion recognition focused on improving performance by incorporating domain knowledge into the underlying system, either through pre-specified rules or hand-crafted features. In the last few years, however, learned feature representations have experienced a resurgence, mainly due to the success of deep neural networks. In this dissertation, we highlight how deep neural networks, when applied to emotion recognition, can learn representations that not only achieve higher accuracy than hand-crafted techniques but also align with prior domain knowledge. Moreover, we show how these learned representations can generalize to different definitions of emotion and to different input modalities. The first part of this dissertation considers the task of categorical emotion recognition on images. We show how a convolutional neural network (CNN) that achieves state-of-the-art performance can also learn features that strongly correspond to Facial Action Units (FAUs). In the second part, we focus on emotion recognition in video. We combine the image-based CNN model with a recurrent neural network (RNN) to perform dimensional emotion recognition, and we visualize the portions of the face that most strongly affect the output prediction by using the gradient as a saliency map (a minimal code sketch of this pipeline follows the record below). Lastly, we explore the merit of multimodal emotion recognition by combining our model with models trained on audio and physiological data.
Issue Date: 2017-03-29
Type: Thesis
URI: http://hdl.handle.net/2142/97284
Rights Information: Copyright 2017 Pooya Rezvani Khorrami
Date Available in IDEALS: 2017-08-10
Date Deposited: 2017-05
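
The abstract's core pipeline, a frame-level CNN feeding an RNN that predicts dimensional emotion (valence/arousal) over video, with the input gradient reused as a saliency map, can be sketched compactly. The code below is a minimal illustration under assumed choices, not the dissertation's implementation: PyTorch is assumed, and the `CNNRNNEmotionModel` class, its layer sizes, and the `saliency` helper are hypothetical stand-ins for the architecture the abstract describes.

```python
import torch
import torch.nn as nn

class CNNRNNEmotionModel(nn.Module):
    """Hypothetical sketch: per-frame CNN features fed to an RNN for
    dimensional (valence/arousal) emotion prediction on video."""
    def __init__(self, feat_dim=256, hidden_dim=128):
        super().__init__()
        # Small stand-in CNN; the dissertation's actual architecture differs.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # valence, arousal

    def forward(self, frames):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)  # per-frame (valence, arousal)

def saliency(model, frames, target=0):
    """Gradient of one output dimension w.r.t. the input pixels,
    used as a saliency map over the face (gradient-as-saliency)."""
    frames = frames.clone().requires_grad_(True)
    model(frames)[..., target].sum().backward()
    # Collapse the channel dimension to one map per frame.
    return frames.grad.abs().amax(dim=2)  # (batch, time, H, W)

if __name__ == "__main__":
    model = CNNRNNEmotionModel()
    clip = torch.randn(1, 8, 3, 96, 96)  # one 8-frame face clip
    print(model(clip).shape)             # torch.Size([1, 8, 2])
    print(saliency(model, clip).shape)   # torch.Size([1, 8, 96, 96])
```

Because the saliency map is just the gradient of one output dimension with respect to the input pixels, no architectural change is needed; any differentiable model supports this kind of visualization.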

