Files in this item

File: HAN-DISSERTATION-2019.pdf (2MB), Restricted to U of Illinois
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Learning compact neural network representations with structural priors
Author(s): Han, Wei
Director of Research: Huang, Thomas S.
Doctoral Committee Chair(s): Huang, Thomas S.
Doctoral Committee Member(s): Hasegawa-Johnson, Mark; Liang, Zhi-Pei; Hwu, Wen-Mei
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): neural networks; recurrent neural networks; convolutional neural networks
Abstract: The development of deep neural networks has taken two directions. On one hand, networks have become deeper and wider, employing drastically more model parameters and consuming more training data. On the other hand, the simplification of the internal structure of neural networks has also contributed to their success. Notably, two important families of neural networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs), both introduce certain structural priors and share model parameters internally to simplify the network. In this dissertation, we investigate several alternative neural network structural priors to learn more compact CNN and RNN models and achieve better parameter-to-performance ratios. We develop these network structures at three different abstraction levels, each with its own target applications.

First, motivated by the filter redundancy in convolutional neural networks, we study parameter sharing across filters within a convolutional layer. Instead of the conventional approach of treating CNN filters as a set of independent model parameters, we exploit the 2D spatial correlation among filters and propose sharing filter parameters as if they were overlapping slices of a shared 3D tensor. Experiments show that the proposed approach effectively reduces the number of parameters in several state-of-the-art CNN architectures while maintaining competitive performance.

The second problem studied in this dissertation concerns the inter-layer connectivity pattern in RNNs, the family of neural network models designed specifically for time-series data. One longstanding challenge in RNNs is the vanishing gradient problem, which hinders the model's ability to learn long-term dependency, i.e., temporal dependency across many time steps. Motivated by recent developments in two largely unrelated areas, namely the general application of skip connections to neural networks and dilated convolution in image- and audio-related problems, we propose DilatedRNN, a simple but principled way to construct multi-layer RNNs using multi-resolution recurrent skip connections. The proposed method is conceptually similar to dilated convolution but takes full advantage of the modeling power of RNNs. We show that DilatedRNN is particularly effective in problems where long-term dependencies are crucial.

Finally, inspired by the structural similarity between CNNs and unrolled single-layer RNNs, we study parameter sharing across sequentially connected network layers. Specifically, we focus on the family of feedforward CNNs that have an equivalent RNN form and tie the layer parameters according to the standard RNN unrolling rule. Empirically, we find that this family of models not only provides a desirable balance between model complexity and performance, but also leads to novel architectures that can easily be combined with domain knowledge.
Issue Date: 2019-04-18
Type: Text
URI: http://hdl.handle.net/2142/105057
Rights Information: Copyright 2019 Wei Han
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05
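
The first contribution in the abstract shares filter parameters within a convolutional layer by treating the filters as overlapping slices of one shared 3D tensor. The sketch below illustrates that idea in PyTorch under stated assumptions: the class name, the slicing step, and the initialization are hypothetical and not the dissertation's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSliceConv2d(nn.Module):
    """Convolution whose filters are overlapping slices of one shared 3D tensor.

    Hypothetical sketch of the filter-sharing idea; the slicing step and the
    initialization are assumptions, not the dissertation's exact scheme.
    """
    def __init__(self, in_channels, out_channels, kernel_size, step=1):
        super().__init__()
        # The shared tensor only needs enough depth to cut out `out_channels`
        # overlapping slices, each `in_channels` deep.
        depth = in_channels + step * (out_channels - 1)
        self.shared = nn.Parameter(torch.randn(depth, kernel_size, kernel_size) * 0.01)
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.step = step
        self.padding = kernel_size // 2

    def forward(self, x):
        # Re-assemble the filter bank from overlapping slices of the shared tensor.
        filters = torch.stack(
            [self.shared[i * self.step : i * self.step + self.in_channels]
             for i in range(self.out_channels)],
            dim=0)  # shape: (out_channels, in_channels, k, k)
        return F.conv2d(x, filters, padding=self.padding)

# A standard 3x3 layer mapping 64 -> 256 channels stores 64*256*9 weights;
# this shared version stores only (64 + 255)*9 entries, reused by all filters.
layer = SharedSliceConv2d(64, 256, kernel_size=3)
out = layer(torch.randn(1, 64, 32, 32))
```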
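
The DilatedRNN contribution builds multi-layer RNNs whose recurrent connections skip several time steps, with the skip length (dilation) typically growing across layers. The sketch below shows that recurrence only as an illustration; the use of GRUCell, the exponentially increasing dilations, and the helper name build_dilated_rnn are assumptions for the example, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class DilatedRNNLayer(nn.Module):
    """One recurrent layer whose hidden state is read `dilation` steps back.

    Minimal sketch of a dilated recurrent skip connection:
    h_t depends on h_{t - dilation} instead of h_{t-1}.
    """
    def __init__(self, input_size, hidden_size, dilation):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)  # cell choice is an assumption
        self.dilation = dilation
        self.hidden_size = hidden_size

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        zeros = x.new_zeros(batch, self.hidden_size)
        outputs = []
        for t in range(seq_len):
            # Recurrent skip connection: use the state from `dilation` steps ago.
            h_prev = outputs[t - self.dilation] if t >= self.dilation else zeros
            outputs.append(self.cell(x[t], h_prev))
        return torch.stack(outputs)            # (seq_len, batch, hidden_size)

def build_dilated_rnn(input_size, hidden_size, num_layers):
    # Exponentially increasing dilations give multi-resolution temporal context.
    layers, size = [], input_size
    for l in range(num_layers):
        layers.append(DilatedRNNLayer(size, hidden_size, dilation=2 ** l))
        size = hidden_size
    return nn.Sequential(*layers)

model = build_dilated_rnn(input_size=10, hidden_size=32, num_layers=4)
y = model(torch.randn(100, 8, 10))             # dilations 1, 2, 4, 8 across layers
```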
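
The final contribution ties parameters across sequentially connected layers, treating a feedforward CNN as a single recurrent cell unrolled in depth. A minimal sketch follows, assuming a conv + batch-norm + ReLU cell and a fixed unrolling depth; both choices are illustrative rather than the dissertation's specific architecture.

```python
import torch
import torch.nn as nn

class RecurrentConvNet(nn.Module):
    """Feedforward CNN whose layers reuse one set of weights, i.e. the stack
    is a single convolutional "cell" unrolled `depth` times, RNN-style.
    """
    def __init__(self, channels, depth):
        super().__init__()
        # One shared cell applied repeatedly; cell design is an assumption.
        self.cell = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.cell(x)   # the same parameters serve every "layer"
        return x

net = RecurrentConvNet(channels=64, depth=8)   # parameter count of 1 layer, depth of 8
out = net(torch.randn(1, 64, 32, 32))
```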

