A tiered network buffer architecture for fast networking in chiplet-based CPUs
Wang, Jerry
Permalink
https://hdl.handle.net/2142/129792
Description
Issue Date
2025-05-09
Director of Research (if dissertation) or Advisor (if thesis)
Kim, Nam Sung
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Computer Architecture
Computer Network
Language
eng
Abstract
To manufacture a large CPU cost-effectively, the industry has begun exploiting emerging packaging technologies that integrate multiple chiplets—each comprising a subset of cores and/or memory and I/O subsystems—into a single package. However, such a CPU experiences longer memory access latency with more pronounced variance, especially when cores in one chiplet access LLC slices or DRAM controllers in other chiplets. This creates unique challenges for µs-scale networking, which is highly sensitive to memory access latency. In this work, we first propose exploiting a little-known mode, known as Sub-NUMA Clustering (SNC), in the latest chiplet-based CPUs. As it restricts receiving and processing packets to a particular chiplet unless explicitly specified otherwise, it offers shorter memory access latency and, consequently, lower networking latency than the default mode (non-SNC). Nonetheless, when receiving long bursts of packets, SNC incurs higher networking latency than non-SNC, as it provides less LLC capacity for the CPU cores processing the packets, making Direct Cache Access (DCA)—a commonly used CPU feature that reduces memory access latency for packet processing—ineffective. To address this drawback, we propose TiNA, a tiered network buffer architecture consisting of an enhanced NIC and networking stack, which opportunistically uses LLC slices in other chiplets for DCA only when receiving long bursts of packets. On average, TiNA reduces the mean (tail) latency by 25% (18%) and 28% (22%), compared to SNC and non-SNC, respectively, across diverse network applications and traces.
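The tiered placement idea in the abstract can be illustrated with a small policy sketch. This is not code from the thesis: the class name, tier names, and thresholds below are hypothetical, and it only models the steering decision—packets go to the local chiplet's LLC tier by default, spill to remote chiplets' LLC slices only during a long burst, and otherwise fall back to DRAM.

```python
# Illustrative sketch (not from the thesis): a tiered DCA steering policy in
# the spirit of TiNA. All names and parameters here are hypothetical.

class TieredBufferSteering:
    """Decides where an incoming packet buffer is DCA-placed: the local
    chiplet's LLC slices by default, remote chiplets' slices only when a
    long burst has filled the local tier, and DRAM otherwise."""

    def __init__(self, local_slots: int, burst_threshold: int):
        self.local_slots = local_slots          # DCA capacity of the local LLC tier
        self.burst_threshold = burst_threshold  # packets before a run counts as a long burst
        self.in_flight_local = 0                # buffers currently occupying the local tier
        self.burst_len = 0                      # length of the current packet burst

    def place(self) -> str:
        """Return the placement tier for the next received packet."""
        self.burst_len += 1
        # Default: keep packets in the local chiplet for short memory latency.
        if self.in_flight_local < self.local_slots:
            self.in_flight_local += 1
            return "local"
        # Local tier is full and the burst is long: opportunistically
        # borrow LLC slices in other chiplets.
        if self.burst_len > self.burst_threshold:
            return "remote"
        # Short burst with a full local tier: fall back to DRAM.
        return "dram"

    def complete_local(self) -> None:
        """A packet placed in the local tier has been processed."""
        self.in_flight_local = max(0, self.in_flight_local - 1)

    def end_of_burst(self) -> None:
        """The NIC observed a gap in the packet stream."""
        self.burst_len = 0
```

Under this (assumed) policy, remote LLC capacity is touched only when both conditions hold—local tier exhausted and burst long—which mirrors the abstract's claim that TiNA uses other chiplets' slices "only when receiving long bursts of packets."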