Cutting the datacenter tax in heterogeneous systems: a coherence-spectrum approach
Ji, Houxiang
Permalink
https://hdl.handle.net/2142/132656
Description
Title
Cutting the datacenter tax in heterogeneous systems: a coherence-spectrum approach
Author(s)
Ji, Houxiang
Issue Date
2025-11-24
Director of Research (if dissertation) or Advisor (if thesis)
Kim, Nam Sung
Doctoral Committee Chair(s)
Kim, Nam Sung
Committee Member(s)
Torrellas, Josep
Xu, Tianyin
Ghose, Saugata
Wang, Ren
Department of Study
Computer Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Datacenter tax
Coherence
Heterogeneous system
Abstract
Modern datacenters pay two persistent overheads: a memory tax from memory management and optimization features such as memory deduplication and compressed swapping, and a network tax from end-host processing that surrounds the NIC rather than the packet transmission itself. This thesis reduces these taxes by relocating data-plane work to the most suitable devices in heterogeneous systems, using coherence as the guiding design axis. Our approach advances from non-coherent to fully coherent designs. First, we introduce Styx, a SmartNIC-based framework that splits memory optimization features into a control plane and a data plane. By offloading repetitive, CPU-intensive data-plane functions via RDMA to a non-coherent SmartNIC, Styx reduces host CPU cycle consumption and alleviates cache pollution, thereby lowering the memory tax in practice. Second, we investigate a coherent on-chip accelerator, Intel’s Data Streaming Accelerator (DSA), in the context of memory deduplication. A direct function offload (DSA-ksm) trims host CPU cycles but yields limited end-to-end memory savings because of the overhead incurred on each host-to-DSA offload. To fully leverage DSA’s batching capability, we introduce Para-ksm, which restructures the deduplication workflow so that multiple data-plane functions are batched and executed on the DSA per offload, improving offload efficiency and restoring competitive deduplication rates. Finally, we examine Compute Express Link (CXL), an emerging coherent interconnect built on top of PCIe. We first provide a characterization of a CXL Type-2 device and derive practical guidelines for its use. Applying these insights to memory tax reduction, we reimplement two optimization features with coherent load/store semantics. Compared to Styx and Para-ksm, the design based on CXL devices requires fewer software changes while enabling more efficient offloading.
For the network tax, we re-architect end-host networking using the CXL.cache and CXL.mem protocols, removing expensive non-coherent PCIe operations in the host–NIC datapaths and exploiting on-NIC coherent memory for networking data placement. Taken together, these works advance along the coherence spectrum, from non-coherent SmartNICs to on-chip accelerators to CXL devices, delivering progressively larger reductions in the datacenter memory and network taxes and demonstrating the benefits of coherence at the system level.