Files in this item

File:LIU-DISSERTATION-2016.pdf (19MB)
Format:PDF (application/pdf)
Access:Restricted to U of Illinois
Description:(no description provided)

Description

Title:Feedback convolutional neural network in applications of computer vision
Author(s):Liu, Xianming
Director of Research:Huang, Thomas S
Doctoral Committee Chair(s):Huang, Thomas S
Doctoral Committee Member(s):Liang, Zhi-Pei; Hasegawa-Johnson, Mark; Fu, Wai-Tat
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Convolutional Neural Network
Feedback
Computer Vision
Abstract:With the development of deep neural networks, especially convolutional neural networks, computer vision tasks rely on training data to an unprecedented extent. As networks grow deeper and wider, the demand for high-quality supervised training data increases exponentially with model complexity. Given the difficulty of acquiring high-quality, complete labels, weakly-supervised learning has recently attracted much attention in both the machine learning and computer vision communities. Evidence from cognitive research indicates that visual attention plays a critical role in the human visual system. While bottom-up selectivity bias has been well studied from both mathematical and computer vision perspectives, the top-down selectivity that carries semantic information has seldom been analyzed in this field. In this thesis, a more computationally efficient model of the bottom-up selectivity of images is built using scale-space theory. The method takes a statistical approach to building hierarchical representations of image content at different scales, and an unsupervised approach to deriving a measurement of “objectness,” which is further used for region proposal in object detection and semantic segmentation. Motivated by the “biased competition theory,” which states that 1) visual processing is strongly driven by the goal or task, and 2) unrelated neurons are suppressed during a feedback loop in the human visual cortex, a computational model of the feedback mechanism, named the “Feedback Neural Network,” is proposed to implement top-down selectivity. The proposed feedback network optimizes the high-level task's target function by performing a feedback optimization that closes irrelevant neurons in a convolutional neural network. Experiments show that, given high-level semantic labels, it finds salient regions with higher accuracy and efficiency.
This technique is further applied to weakly-supervised learning in computer vision, where only partial supervision is given during training: for example, object localization and semantic segmentation using only categorical image-level labels. We report experiments on the ImageNet object localization dataset as well as a satellite image analysis dataset to demonstrate the effectiveness of the algorithm. The feedback network can also be used to improve image classification performance by “looking and thinking twice.” Cognitive studies suggest that when human subjects are given more time to observe a visual signal, there is a point in time at which recognition accuracy increases dramatically. By cropping the salient regions that the network has learned from large-scale imagery data, the network reduces the chance of misclassification by eliminating background and context. We show that this strategy improves ImageNet classification by more than 1% in the top-1 result, and the improvement is even larger for small objects. A binarized feedback optimization algorithm is also proposed in this thesis to improve the efficiency of the feedback operation. Rather than performing iterative stochastic gradient descent on the feedback layer, it uses a fast approximate optimization. This facilitates training convolutional neural networks with feedback. The strategy makes training put more effort into fitting active neurons and leads to better convergence. Experiments also show that it handles noise in image data well by ignoring the noise component of the input signal.
Issue Date:2016-11-16
Type:Thesis
URI:http://hdl.handle.net/2142/95471
Rights Information:Copyright 2016 Xianming Liu
Date Available in IDEALS:2017-03-01
Date Deposited:2016-12

