Sparse representation in deep vision models
Department of Study
Electrical & Computer Eng
Degree Granting Institution
University of Illinois at Urbana-Champaign
Sparse representation plays a critical role in vision problems, including image generation and understanding. Image generation tasks are inherently ill-posed: the input signal usually carries insufficient information, while the output admits infinitely many solutions for the same input. It is therefore commonly believed that sparse representation handles this considerable diversity of solutions more robustly. Image understanding likewise depends on sparse representations that are invariant and robust to various transformations, e.g., color, lighting, and viewpoint.
Deep neural networks extend sparse coding-based methods from a linear structure to cascaded linear and non-linear structures. However, sparsity of the hidden representation in deep neural networks cannot be obtained by iterative optimization as in sparse coding, since deep networks are feed-forward during inference. I invented a method that structurally enforces sparsity constraints on hidden neurons in deep networks while keeping the representation high-dimensional. Given high-dimensional neurons, I divide them into groups along the channel dimension and allow only one group of neurons to be non-zero at a time. The adaptive selection of the non-zero group is modeled by tiny side networks operating on context features. Computation is also saved, since it is performed only on the non-zero group.
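The group selection described above can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the single linear layer used as the side network, the hard argmax selection, and all names (`group_sparse_layer`, `gate_w`, `group_weights`) are assumptions made for illustration only.

```python
import numpy as np

def group_sparse_layer(x, context, gate_w, group_weights):
    """Compute only the one channel group chosen by a tiny side network.

    x:             (c_in,) input features
    context:       (c_ctx,) context features fed to the side network
    gate_w:        (num_groups, c_ctx) weights of a hypothetical linear side network
    group_weights: (num_groups, group_size, c_in) one weight block per group
    """
    num_groups, group_size, _ = group_weights.shape
    # Side network (assumed to be a single linear layer): one score per group.
    scores = gate_w @ context
    k = int(np.argmax(scores))  # index of the single non-zero group
    # The output keeps its full dimensionality; unselected groups stay exactly zero.
    out = np.zeros(num_groups * group_size)
    # Only the selected group's weight block is multiplied; the rest is skipped.
    out[k * group_size:(k + 1) * group_size] = group_weights[k] @ x
    return out, k

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
context = rng.standard_normal(4)
gate_w = rng.standard_normal((4, 4))
group_weights = rng.standard_normal((4, 16, 8))
out, k = group_sparse_layer(x, context, gate_w, group_weights)
```

In this sketch the layer performs one 16x8 matrix-vector product instead of four, while the output still has 64 channels. A hard argmax is not differentiable, so training such a gate would need some relaxation; the abstract does not say which one is used.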
I further extended the sparsity constraints to the attention mechanism. Attention is built upon pairwise correlation between any two pixels and incurs a computation cost quadratic in the input size. This mutual correlation is inherently sparse, since a pixel in an image is not necessarily highly correlated with most other pixels. I proposed a method that computes attention more efficiently by exploiting this sparsity prior on the correlation matrix.
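To make the cost argument concrete, the following NumPy sketch restricts each query to a small set of candidate keys, so the work scales with n * m rather than n^2. How the candidates are chosen is the substance of the proposed method and is not given in the abstract; here `neighbors` is simply a hypothetical precomputed index array, and `sparse_attention` is an illustrative name.

```python
import numpy as np

def sparse_attention(q, k, v, neighbors):
    """Attention where query i attends only to the keys listed in neighbors[i].

    q, k: (n, d); v: (n, d_v); neighbors: (n, m) integer key indices.
    Cost is O(n * m * d) rather than O(n^2 * d) for dense attention.
    """
    d = q.shape[1]
    k_sel = k[neighbors]  # (n, m, d): gathered keys per query
    v_sel = v[neighbors]  # (n, m, d_v): gathered values per query
    scores = np.einsum("nd,nmd->nm", q, k_sel) / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w = w / w.sum(axis=1, keepdims=True)  # softmax over the m candidates only
    return np.einsum("nm,nmd->nd", w, v_sel)

rng = np.random.default_rng(0)
n, d, m = 16, 8, 4
q, k, v = rng.standard_normal((3, n, d))
# Hypothetical candidate sets: random here, purely to exercise the function.
neighbors = rng.integers(0, n, size=(n, m))
out = sparse_attention(q, k, v, neighbors)
```

When every query is given all n keys as candidates, the function reduces to ordinary dense softmax attention, which is a convenient sanity check.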
I also investigated sparse scene representation modeled with deep neural networks. Given sparsely rendered views of a 3D scene, the proposed deep neural network approach efficiently performs spatiotemporal reconstruction of high-definition images from a novel viewpoint.