HASICs: Investigating hyperspecialized ASICs for neural network inference
Chakraborty, Srijan
Permalink
https://hdl.handle.net/2142/125737
Description
Title
HASICs: Investigating hyperspecialized ASICs for neural network inference
Author(s)
Chakraborty, Srijan
Issue Date
2024-07-18
Advisor
Kumar, Rakesh
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
ASICs
Architecture
RTL
Accelerator
Neural Network
Inference
Digital Circuit
Abstract
Current state-of-the-art accelerators for neural network inference tend to be model-agnostic and are thus over-provisioned for any specific network. A large range of emerging embedded applications only need to run one or a few networks, and changing the model after deployment (or, for especially deeply embedded applications, even changing the weights) is unnecessary. These applications often have very strict latency, area, power, and energy requirements, and the corresponding neural networks are very small; existing state-of-the-art accelerators are therefore often too expensive, waste energy and time on data movement, and squander area through over-provisioning. As a solution, we investigate hyperspecializing ASICs to specific models as a strategy for maximizing energy efficiency and minimizing latency while keeping area reasonable for such neural network accelerators. We also automate much of the methodology for designing such chips from a given model, cutting non-recurring engineering (NRE) costs. The goal is to enable cheap, energy-efficient neural network acceleration for these embedded use cases. In this work, we evaluate hyperspecialized ASICs for neural network inference, with model-specific designs that translate the structure of a known network directly into hardware, and data-specific designs that further optimize these designs by leveraging knowledge of the preset weights. We also evaluate additional design techniques on these ASICs: rolling to decrease area, pipelining to increase throughput, and merging to save area and energy through hardware reuse when building one ASIC for multiple networks. For a suite of small neural networks, we analyze the area, latency, power, and energy usage of each resulting ASIC. We also compare our designs against the extremely low-energy, low-power embedded Arm Cortex-M33 core running the same networks; we find that a combination of our design techniques offers, on average, more than a 99% reduction in latency and energy usage while requiring less than 4x the Cortex-M33 area for model-specific designs and less than 2x for data-specific designs. To the best of our knowledge, this is the first work to evaluate hyperspecialization to this degree in neural network accelerators; the aim is to enable embedded applications by offering cheaper and more efficient tailored designs than existing state-of-the-art accelerators.
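To make the data-specific idea concrete, here is a minimal software sketch (our own illustration, not code or the exact method from the thesis): it assumes integer weights and uses a canonical-signed-digit (CSD) decomposition, so that multiplying by a weight known at design time reduces to a fixed pattern of shifts and adds, the kind of structure a hyperspecialized ASIC can hard-wire in place of a general multiplier. All function names here are hypothetical.

# Minimal Python sketch of data-specific specialization: once a weight is
# fixed at design time, a general multiplier can be replaced by a hard-wired
# shift-add network. The CSD choice and all names are illustrative
# assumptions, not the thesis's actual toolflow.

def csd_digits(w: int) -> list[tuple[int, int]]:
    """Canonical-signed-digit form of a fixed integer weight: returns
    (digit, shift) pairs with digit in {+1, -1} such that
    w == sum(digit * 2**shift). Example: 7 -> [(-1, 0), (+1, 3)]."""
    digits, shift = [], 0
    while w != 0:
        if w & 1:
            # Pick -1 when the low two bits are 11 so a carry ripples up,
            # minimizing the number of nonzero digits (fewer adders).
            digit = -1 if (w & 3) == 3 else 1
            digits.append((digit, shift))
            w -= digit
        w >>= 1
        shift += 1
    return digits

def hardwired_multiply(x: int, w: int) -> int:
    """Compute x * w using only shifts and adds/subtracts, mirroring the
    fixed wiring a data-specific ASIC could use instead of a multiplier."""
    acc = 0
    for digit, shift in csd_digits(w):
        acc += (x << shift) if digit > 0 else -(x << shift)
    return acc

if __name__ == "__main__":
    # Sanity check: the shift-add network matches true multiplication.
    for w in (7, 25, -13):
        for x in (0, 1, 5, -9):
            assert hardwired_multiply(x, w) == x * w
    print(csd_digits(25))  # [(1, 0), (-1, 3), (1, 5)], i.e. 1 - 8 + 32 = 25

In silicon, each (digit, shift) pair is just wiring feeding a small adder tree, which is one way fixing the weights can shrink both area and energy relative to a general multiplier; a design-time generator can emit such a structure per weight.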