Group theory and information theory algorithms in deep learning
Basu, Sourya
Permalink
https://hdl.handle.net/2142/125608
Description
- Title
- Group theory and information theory algorithms in deep learning
- Author(s)
- Basu, Sourya
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Varshney, Lav R
- Doctoral Committee Chair(s)
- Varshney, Lav R
- Committee Member(s)
- Milenkovic, Olgica
- Hasegawa-Johnson, Mark
- Cohen, Taco
- Lohit, Suhas
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Deep learning
- Group equivariance
- Geometric deep learning
- Foundation models
- Large language models
- Text decoding
- Abstract
- We are witnessing enormous growth in deep learning that impacts our day-to-day lives, with applications ranging from drug discovery to coding assistants. Accelerating this growth requires optimizing several stages of building deep learning models, such as designing efficient architectures, fine-tuning pretrained models, and sampling data from these models. We use group theory for designing efficient architectures and for fine-tuning pretrained models; for sampling from pretrained models, we deploy information-theoretic techniques to obtain high-quality samples.

For designing efficient architectures, we provide two novel group-equivariant architectures: Equivariant Mesh Attention Networks (EMAN) and Group Representation Networks (G-RepsNet). EMAN is an attention-based architecture for mesh data that is equivariant to various natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. Our pipeline relies on relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on standard mesh datasets confirm that the proposed architecture achieves improved performance on these benchmarks and is robust to a wide variety of local and global transformations. G-RepsNet is a scalable, lightweight neural network equivariant to arbitrary matrix groups, with features represented using tensor polynomials. Existing architectures equivariant to arbitrary matrix groups, in contrast, do not scale well beyond toy datasets. Further, G-RepsNet is shown to be a universal approximator of functions equivariant to orthogonal groups. Experiments on synthetic datasets, image datasets, and fluid dynamics datasets demonstrate performance competitive with state-of-the-art equivariant models at a much lower computational cost.

For fine-tuning pretrained models, we present equituning, λ-equituning, and multi-equituning.
Equituning is a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group-equivariant models while incurring minimal L2 loss between the feature representations of the pretrained and equivariant models. Large pretrained models can be equituned for different groups to satisfy the needs of various downstream tasks. λ-equituning further optimizes the performance of equituning, and multi-equituning improves its computational efficiency for large product groups. We test these methods on image classification, compositional generalization in language, and fairness in natural language generation.

For sampling from pretrained large language models (LLMs), we present mirostat, which generates high-quality text from LLMs by controlling the entropy of the generated text. Previous sampling methods such as top-k, top-p, and temperature-based sampling often yield text with objectionable repetition or incoherence. We find that repetition is correlated with low entropy in the generated text, whereas incoherence is correlated with high entropy. Hence, we provide a sampling algorithm that dynamically controls the entropy of the generated text; mirostat thus helps generate coherent text while avoiding repetition.
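At the heart of the equituning idea described in the abstract is a group-averaging construction that turns any pretrained model into a group-equivariant one. A minimal NumPy sketch for the cyclic rotation group C4, assuming a toy stand-in "model" (the choice of group, the bias-map model, and all variable names are illustrative assumptions, not the thesis code):

```python
import numpy as np

def rot90(x, k=1):
    # Group action: rotate a 2-D "image" by k * 90 degrees.
    return np.rot90(x, k)

def equitune(model, x, group_order=4):
    # Group-average the model over the C4 rotation group:
    #   M_G(x) = (1/|G|) * sum_g  g^{-1} M(g x)
    outs = [rot90(model(rot90(x, k)), -k) for k in range(group_order)]
    return sum(outs) / group_order

# A deliberately non-equivariant "pretrained model" (a hypothetical
# stand-in for a real network): adding a fixed bias image breaks
# rotation equivariance.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
model = lambda x: 2.0 * x + W

x = rng.normal(size=(8, 8))
# The raw model is not equivariant: W differs from rot90(W).
print(np.allclose(model(rot90(x)), rot90(model(x))))   # → False

lhs = equitune(model, rot90(x))      # M_G(g x)
rhs = rot90(equitune(model, x))      # g M_G(x)
print(np.allclose(lhs, rhs))         # → True: the averaged model is equivariant
```

Averaging works for any finite group action because the sum over g^{-1} M(g x) is unchanged when x is replaced by h x and the sum is reindexed; equituning in the thesis additionally characterizes this average as the equivariant model closest to the pretrained one in L2 distance.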
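The entropy-control idea behind mirostat can likewise be sketched as a feedback loop: cap each step's candidate tokens by a surprise threshold and nudge that threshold toward a target surprise. This is a simplified variant for illustration, not the exact algorithm from the thesis; the Zipf-like toy distribution, the learning rate, and all names are assumptions:

```python
import numpy as np

def mirostat_step(probs, mu, tau, lr=0.1, rng=None):
    """One feedback step of a simplified mirostat-style sampler.

    Truncate the vocabulary to tokens whose surprise -log2(p) is below the
    running threshold mu, sample a token, then nudge mu so that the observed
    surprise tracks the target tau."""
    if rng is None:
        rng = np.random.default_rng()
    surprise = -np.log2(probs)
    allowed = surprise < mu
    if not allowed.any():                 # always keep at least the top token
        allowed[np.argmax(probs)] = True
    p = np.where(allowed, probs, 0.0)
    p = p / p.sum()
    tok = int(rng.choice(len(probs), p=p))
    err = -np.log2(probs[tok]) - tau      # observed minus target surprise
    mu = mu - lr * err                    # feedback control on the threshold
    return tok, mu

# Toy Zipf-like next-token distribution over a 500-token vocabulary
# (a hypothetical stand-in for LLM next-token probabilities).
rng = np.random.default_rng(0)
ranks = np.arange(1, 501)
probs = 1.0 / ranks
probs /= probs.sum()

tau, mu = 3.0, 6.0                        # target surprise (bits), initial threshold
observed = []
for _ in range(3000):
    tok, mu = mirostat_step(probs, mu, tau, rng=rng)
    observed.append(-np.log2(probs[tok]))
print(f"mean surprise after burn-in: {np.mean(observed[1000:]):.2f} bits (target {tau})")
```

Holding per-token surprise near tau is what keeps the entropy of the generated text in the band where the abstract reports neither repetition (too little entropy) nor incoherence (too much).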
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125608
- Copyright and License Information
- Copyright 2024 Sourya Basu
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)