Group theory and information theory algorithms in deep learning
Basu, Sourya
Permalink
https://hdl.handle.net/2142/125608
Description
- Title
- Group theory and information theory algorithms in deep learning
- Author(s)
- Basu, Sourya
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Varshney, Lav R
- Doctoral Committee Chair(s)
- Varshney, Lav R
- Committee Member(s)
- Milenkovic, Olgica
- Hasegawa-Johnson, Mark
- Cohen, Taco
- Lohit, Suhas
- Department of Study
- Electrical & Computer Eng
- Discipline
- Electrical & Computer Engr
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Deep learning
- Group equivariance
- Geometric deep learning
- Foundation models
- Large language models
- Text decoding
- Abstract
- We are witnessing enormous growth in deep learning that impacts our day-to-day lives, with applications ranging from drug discovery to coding assistants. Accelerating this growth requires optimizing several stages of building deep learning models, such as designing efficient architectures, fine-tuning pretrained models, and sampling data from these models. We use group theory for designing efficient architectures and for fine-tuning pretrained models; for sampling from pretrained models, we deploy information-theoretic techniques to obtain high-quality samples.

For designing efficient architectures, we provide two novel group-equivariant architectures: Equivariant Mesh Attention Networks (EMAN) and Group Representation Networks (G-RepsNet). EMAN is an attention-based architecture for mesh data that is equivariant to various natural symmetries, including translations, rotations, scaling, node permutations, and gauge transformations. Our pipeline relies on relative tangential features: a simple, effective, equivariance-friendly alternative to raw node positions as inputs. Experiments on standard mesh datasets confirm that the proposed architecture achieves improved performance on these benchmarks and is robust to a wide variety of local and global transformations. G-RepsNet is a scalable, lightweight neural network equivariant to arbitrary matrix groups, with features represented using tensor polynomials. Existing architectures equivariant to arbitrary matrix groups, in contrast, do not scale well beyond toy datasets. Further, G-RepsNet is shown to be a universal approximator of functions equivariant to orthogonal groups. Experiments on synthetic datasets, image datasets, and fluid dynamics datasets demonstrate performance competitive with state-of-the-art equivariant models at a much lower computational cost.

For fine-tuning pretrained models, we present equituning, λ-equituning, and multi-equituning.
Equituning is a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group-equivariant models while incurring minimal L2 loss between the feature representations of the pretrained and equivariant models. Large pretrained models can be equituned for different groups to satisfy the needs of various downstream tasks. λ-equituning further optimizes the performance of equituning, and multi-equituning improves its computational efficiency for large product groups. We test these methods on image classification, compositional generalization in language, and fairness in natural language generation.

For sampling from pretrained large language models (LLMs), we present mirostat, which generates high-quality text from LLMs by controlling the entropy of the generated text. Previous sampling methods such as top-k, top-p, and temperature-based sampling often yield text with objectionable repetition or incoherence. We find that repetition is correlated with low entropy in the generated text, whereas incoherence is correlated with high entropy. Hence, we provide a sampling algorithm that dynamically controls the entropy of the generated text; mirostat thus helps generate coherent text while avoiding repetition.
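At the heart of the equituning idea described in the abstract is a group-averaging construction that turns any pretrained model into a group-equivariant one. A minimal NumPy sketch for the cyclic rotation group C4, assuming a toy stand-in "model" (the choice of group, the bias-map model, and all variable names are illustrative assumptions, not the thesis code):

```python
import numpy as np

def rot90(x, k=1):
    # Group action: rotate a 2-D "image" by k * 90 degrees.
    return np.rot90(x, k)

def equitune(model, x, group_order=4):
    # Group-average the model over the C4 rotation group:
    #   M_G(x) = (1/|G|) * sum_g  g^{-1} M(g x)
    outs = [rot90(model(rot90(x, k)), -k) for k in range(group_order)]
    return sum(outs) / group_order

# A deliberately non-equivariant "pretrained model" (a hypothetical
# stand-in for a real network): adding a fixed bias image breaks
# rotation equivariance.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
model = lambda x: 2.0 * x + W

x = rng.normal(size=(8, 8))
# The raw model is not equivariant: W differs from rot90(W).
print(np.allclose(model(rot90(x)), rot90(model(x))))   # → False

lhs = equitune(model, rot90(x))      # M_G(g x)
rhs = rot90(equitune(model, x))      # g M_G(x)
print(np.allclose(lhs, rhs))         # → True: the averaged model is equivariant
```

Averaging works for any finite group action because the sum over g^{-1} M(g x) is unchanged when x is replaced by h x and the sum is reindexed; equituning in the thesis additionally characterizes this average as the equivariant model closest to the pretrained one in L2 distance.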
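The entropy-control idea behind mirostat can likewise be sketched as a feedback loop: cap each step's candidate tokens by a surprise threshold and nudge that threshold toward a target surprise. This is a simplified variant for illustration, not the exact algorithm from the thesis; the Zipf-like toy distribution, the learning rate, and all names are assumptions:

```python
import numpy as np

def mirostat_step(probs, mu, tau, lr=0.1, rng=None):
    """One feedback step of a simplified mirostat-style sampler.

    Truncate the vocabulary to tokens whose surprise -log2(p) is below the
    running threshold mu, sample a token, then nudge mu so that the observed
    surprise tracks the target tau."""
    if rng is None:
        rng = np.random.default_rng()
    surprise = -np.log2(probs)
    allowed = surprise < mu
    if not allowed.any():                 # always keep at least the top token
        allowed[np.argmax(probs)] = True
    p = np.where(allowed, probs, 0.0)
    p = p / p.sum()
    tok = int(rng.choice(len(probs), p=p))
    err = -np.log2(probs[tok]) - tau      # observed minus target surprise
    mu = mu - lr * err                    # feedback control on the threshold
    return tok, mu

# Toy Zipf-like next-token distribution over a 500-token vocabulary
# (a hypothetical stand-in for LLM next-token probabilities).
rng = np.random.default_rng(0)
ranks = np.arange(1, 501)
probs = 1.0 / ranks
probs /= probs.sum()

tau, mu = 3.0, 6.0                        # target surprise (bits), initial threshold
observed = []
for _ in range(3000):
    tok, mu = mirostat_step(probs, mu, tau, rng=rng)
    observed.append(-np.log2(probs[tok]))
print(f"mean surprise after burn-in: {np.mean(observed[1000:]):.2f} bits (target {tau})")
```

Holding per-token surprise near tau is what keeps the entropy of the generated text in the band where the abstract reports neither repetition (too little entropy) nor incoherence (too much).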
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125608
- Copyright and License Information
- Copyright 2024 Sourya Basu
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)