Files in this item



application/pdf: HUANG-THESIS-2021.pdf (10MB)


Title: Improving utilization, granularity, and interpretability in visual representation learning
Author(s): Huang, Edward Z
Advisor(s): Wang, Yuxiong
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): computer vision; style transfer; deep learning; representation learning
Abstract: This thesis presents three works on improving how deep model features are learned and used in computer vision. The first work improves style transfer, a generative artistic method that leverages pretrained deep model features. Style transfer boils down to a distribution matching problem: the generated image must match the feature distribution of the style image within the same hidden layers of the pretrained model. To that end, we propose using statistical moments as metrics for assessing distribution matching. Current style transfer methods match the feature distributions using second-order statistics, which has two major limitations: (1) they cannot match third- or higher-order moments, and (2) they cannot match non-linear relationships between the dimensions. We propose two new style transfer methods, one addressing each limitation, that significantly increase the quality of the mid-level and high-level textures in the stylized image. The second work is a semi-supervised contrastive learning method we call hierarchical contrastive learning. The essence of contrastive learning is to distinguish pairs of images deemed similar from pairs deemed dissimilar. A large body of literature shows that contrastive learning helps deep models learn a rich set of features that are useful for downstream tasks. Our method extends this technique to a finer granularity. Rather than learning a binary categorization of similar or dissimilar pairs, our method trains the model to understand a hierarchy of similarities between pairs of images. We hypothesize that such a learning scheme improves the representational quality of the features. Our analysis shows that our method outperforms current self-supervised and semi-supervised methods on transfer learning from ImageNet to other image datasets. The third work improves the interpretability of deep model features on sparse image data.
We integrate a decomposition method known as shift-invariant probabilistic latent component analysis (PLCA) into deep convolutional neural networks (CNNs), and we call the resulting method Deep PLCA. Intuitively, PLCA decomposes image data into local structures (kernels) and their spatial locations (latent components). Compared to PLCA, Deep PLCA achieves the same reconstruction performance and has two key advantages: (1) it generalizes to unseen data, and (2) it converges faster. All three works are open-sourced on GitHub.
Issue Date: 2021-04-26
Rights Information: Copyright 2021 Edward Z. Huang
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05
