Optimization and automation for efficient neural architecture design
Li, Yuhong
Permalink
https://hdl.handle.net/2142/125607
Description
- Title
- Optimization and automation for efficient neural architecture design
- Author(s)
- Li, Yuhong
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Deming
- Doctoral Committee Chair(s)
- Chen, Deming
- Committee Member(s)
- Kim, Nam Sung
- Wang, Yuxiong
- Xiong, Jinjun
- Dey, Debadeepta
- Hao, Cong
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Deep Learning
- Neural Networks
- Neural Architecture Search
- Computer Vision
- Natural Language Processing
- Abstract
- As AI progresses, deep neural networks (DNNs) have become increasingly complex and resource-intensive, challenging their deployment on both cloud and edge systems. This dissertation addresses critical challenges in neural architecture design by exploring efficient search and optimization methods. It aims to deepen our understanding of the factors that influence DNN quality and to develop automated processes for creating optimized architectures across applications, especially in resource-constrained settings. Structured into five chapters, the dissertation addresses three central questions:
• Efficient exploration within a defined design space: How do we navigate a predefined design space to not only explore efficiently but also identify and forge superior architectural designs?
• Innovation beyond conventional boundaries: Machine-led design, guided by predefined algorithms and data, often lacks the creative novelty of researchers who draw on abstract thinking and diverse experience. How can automation not only break traditional confines but also refine and elevate innovative designs?
• Harmonizing hardware efficiency with quality: How do we balance the efficiency demands of the target hardware against the quality requirements of the model?
Our journey starts with EDD, which introduces a hardware-aware differentiable neural architecture search (NAS). This method employs a differentiable formulation that integrates both model and hardware variables into a single search space, so that prediction quality and hardware efficiency can be improved jointly. Its key contributions are improved predictions on massive classification datasets and better energy efficiency and computation speed on the target hardware, significantly advancing the quality of the searched architectures.
However, we observe that for well-defined search spaces, the latency of a neural network is easier to predict than its quality. Moreover, search spaces are not always representable in a continuous form; for example, varying downsampling rates across the convolutional layers of a CNN introduces discrete decision points. To address this challenge, we introduce GenNAS, a search algorithm that uses synthetic regression-based few-shot learning tasks to score networks in a way that correlates strongly with their quality on large datasets.
Furthermore, although GenNAS navigates discrete search spaces effectively and predicts accurately, evaluating each network individually remains time-consuming. We therefore propose Eproxy, a proxy-based approach that leverages self-supervised learning and few-shot techniques. By creating a more challenging synthetic proxy task, Eproxy enables faster evaluation and drastically reduces computational expense. We also introduce a task search space that lets Eproxy adapt to unconventional search spaces across various tasks. Illustrative sketches of the differentiable search and the proxy-scoring idea follow.
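To make the differentiable-search idea concrete, here is a minimal, hypothetical sketch of hardware-aware differentiable NAS in PyTorch. The candidate operators, the per-op latency table, and the penalty weight are illustrative assumptions, not EDD's actual search space or formulation: architecture weights select among operators via a softmax, and the expected latency enters the loss as a differentiable term.

```python
# Minimal sketch of differentiable hardware-aware NAS (illustrative, not EDD's exact formulation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture of candidate ops, so the architecture choice is differentiable."""
    def __init__(self, channels, latency_table):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        # Assumed per-op latency estimates (ms) for the target device.
        self.register_buffer("latency", torch.tensor(latency_table))
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        out = sum(wi * op(x) for wi, op in zip(w, self.ops))
        # Expected latency is a differentiable function of the architecture weights.
        exp_latency = (w * self.latency).sum()
        return out, exp_latency

# Single search objective: task loss plus a hardware penalty (weight 0.1 is arbitrary).
op = MixedOp(channels=16, latency_table=[1.8, 3.1, 0.1])
out, lat = op(torch.randn(2, 16, 32, 32))
loss = out.pow(2).mean() + 0.1 * lat  # stand-in task loss + latency term
loss.backward()                        # gradients flow to weights *and* alpha
```

Because the latency term is differentiable, the same gradient step that improves the task loss also steers the architecture toward faster operators.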
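And here is a toy sketch of proxy-based scoring in the spirit of GenNAS and Eproxy: a candidate is trained for a few steps on a small synthetic regression task, and its fitting error serves as a quality score. The actual methods use carefully designed synthetic targets rather than the random noise used here; all names and sizes are illustrative.

```python
# Toy proxy scoring: rank candidate networks by how well they fit a
# shared synthetic regression task after a handful of training steps.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(16, 3, 32, 32)       # shared synthetic input batch
target = torch.randn(16, 8, 32, 32)  # shared synthetic regression target

def proxy_score(net, steps=20, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):            # few-shot training on the proxy task
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), target)
        loss.backward()
        opt.step()
    return -loss.item()               # lower fitting error -> higher score

candidates = [nn.Sequential(nn.Conv2d(3, 8, k, padding=k // 2)) for k in (1, 3, 5)]
ranked = sorted(candidates, key=proxy_score, reverse=True)
```

The point of the design is that a few dozen steps on a tiny synthetic task cost seconds per candidate, versus hours for full training on a large dataset.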
Although these NAS algorithms deliver accurate predictions within enormous search spaces, such spaces usually consist of conventional components, such as standard convolutional layers, which limits the potential for innovative breakthroughs. Motivated by our observation that current models handle long-range dependencies inefficiently, we design a novel architectural component called Structured Global Convolution (SGConv). SGConv employs a multi-scale sub-kernel strategy that significantly enhances the capture of long-range dependencies. We further explore the sparsity of SGConv as a search space, as well as hybrid architectures that combine SGConv with attention.
Recognizing the benefit of collaboration between human innovation and machine automation, we apply these paradigms to the automated design of large language models (LLMs). LLMs are primarily memory-bound due to the sequential nature of autoregressive decoding: each generated token requires transferring large model parameters from memory, leaving computational resources underused. We therefore introduce Medusa, which improves decoding efficiency with an adapter that adds extra decoding heads capable of predicting multiple subsequent tokens simultaneously, significantly enhancing throughput. We develop an automated framework that takes the adapter from training to deployment, and we further explore design choices such as the sparsity of the candidate tree and the number of decoding heads. We also propose a simple hardware-aware profiling strategy that predicts Medusa's performance on target devices, accounting for batch size, sequence length, and LLM size. Illustrative sketches of the SGConv kernel construction, the Medusa heads, and the memory-bound latency model follow.
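The following sketch conveys the flavor of a multi-scale sub-kernel construction for global convolution; it is a deliberately simplified assumption, not the dissertation's exact SGConv parameterization. Fixed-size sub-kernels are interpolated to dyadically larger spans with decaying weights, concatenated into one long kernel, and applied with an FFT-based convolution.

```python
# Illustrative multi-scale global-convolution kernel, loosely in the spirit of SGConv.
import torch
import torch.nn.functional as F

def build_kernel(subkernels, decay=0.5):
    """subkernels: list of 1-D tensors, one per scale."""
    parts = []
    for i, k in enumerate(subkernels):
        span = k.numel() * (2 ** i)   # dyadically growing span per scale
        up = F.interpolate(k.view(1, 1, -1), size=span, mode="linear",
                           align_corners=False).view(-1)
        parts.append((decay ** i) * up)  # farther scales get smaller weight
    return torch.cat(parts)              # one long 1-D kernel

def global_conv(x, kernel):
    """Circular convolution of sequence x with the kernel via FFT, O(L log L)."""
    L = x.numel()
    k = F.pad(kernel, (0, L - kernel.numel()))
    return torch.fft.irfft(torch.fft.rfft(x) * torch.fft.rfft(k), n=L)

subs = [torch.randn(16) for _ in range(4)]  # 4 scales -> kernel length 16+32+64+128
y = global_conv(torch.randn(256), build_kernel(subs))
```

The parameter count stays fixed at the sub-kernel size per scale, yet the effective receptive field covers the whole sequence, which is the mechanism behind the long-range capability.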
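A bare-bones sketch of Medusa-style decoding heads follows; the dimensions and head design are hypothetical. Each extra head maps the backbone's last hidden state to logits for a token several positions ahead, so one forward pass proposes several tokens, which the full system then verifies against the base model.

```python
# Hypothetical sketch of extra decoding heads that propose multiple future tokens at once.
import torch
import torch.nn as nn

class MedusaHeads(nn.Module):
    def __init__(self, hidden, vocab, num_heads=4):
        super().__init__()
        # One small head per lookahead position (t+1 ... t+num_heads).
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, vocab))
            for _ in range(num_heads)
        ])

    def forward(self, last_hidden):  # (batch, hidden) from the frozen backbone
        # Returns (batch, num_heads, vocab): one logit vector per lookahead step.
        return torch.stack([h(last_hidden) for h in self.heads], dim=1)

heads = MedusaHeads(hidden=512, vocab=32000)
proposals = heads(torch.randn(2, 512)).argmax(-1)  # (2, 4) candidate next tokens
```

Since the backbone stays frozen and only these small heads are trained, the scheme fits the adapter framing: proposals cost one extra matrix multiply each, and every accepted proposal saves a full sequential decoding step.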
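Finally, a back-of-envelope latency model shows why memory-bound decoding rewards this approach; it is a deliberately simplified illustration with made-up numbers (bandwidth, acceptance rate, overhead), not the dissertation's profiler. At small batch sizes each decoding step must stream the full weights once, so step time is roughly parameter bytes divided by memory bandwidth.

```python
# Simplified memory-bound decoding model: step time ~= param bytes / bandwidth,
# so throughput scales with tokens accepted per step. All constants are illustrative.
def tokens_per_second(params_b, bytes_per_param=2, bandwidth_gbs=2000,
                      accepted_per_step=1.0, step_overhead=1.0):
    step_time = params_b * 1e9 * bytes_per_param / (bandwidth_gbs * 1e9)
    return accepted_per_step / (step_time * step_overhead)

base = tokens_per_second(7)  # plain autoregressive decoding, 7B model, fp16
speculative = tokens_per_second(7, accepted_per_step=2.6, step_overhead=1.15)
print(f"{base:.0f} tok/s -> {speculative:.0f} tok/s ({speculative / base:.2f}x)")
```

Sweeping such a model over batch sizes, sequence lengths, and model sizes gives a cheap first-order prediction of where multi-token decoding pays off on a given device.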
In conclusion, this dissertation addresses neural architecture design challenges through optimization and automation, advancing efficient AI. By exploring differentiable search, synthetic proxy evaluation, and innovations such as SGConv and the Medusa framework, it deepens our understanding of neural network efficiency and sets new standards for AI applications. Each chapter contributes to deep learning design, enhancing applications from mobile to cloud and promoting sustainable AI development.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125607
- Copyright and License Information
- Copyright 2024 Yuhong Li
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)