Director of Research
Hwu, Wen-mei
Doctoral Committee Chair(s)
Hwu, Wen-mei
Committee Member(s)
Patel, Sanjay
Chen, Deming
Lumetta, Steven S.
Department of Study
Electrical and Computer Engineering
Discipline
Electrical and Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
GPU
parallel computing
GNN
storage
large-scale ML
distributed system
Abstract
Graph Neural Networks (GNNs) are widely used in applications such as recommendation systems, fraud detection, and node/link classification. As the real-world graphs and embeddings used for GNN training continue to grow, their memory footprint often exceeds the memory capacity of GPUs, making memory the bottleneck in efficient training. Traditional GNN training frameworks address this limit either by storing feature data in external storage and fetching it on demand, or by sharding the graph across multiple GPUs and transferring data as needed. However, the first approach suffers from high storage latency, while the second is burdened by the high computational cost of graph partitioning, excessive inter-GPU communication, and increased total cost of ownership.
To address these challenges, this dissertation introduces three storage-based GNN frameworks, GIDS, LSM-GNN, and SSD-GNN, which target single-GPU, multi-GPU, and multi-node environments, respectively. GIDS accelerates single-GPU GNN training by leveraging GPU thread parallelism to hide storage latency. LSM-GNN extends this approach to multi-GPU settings with a system-wide shared cache over NVLink, improving memory bandwidth utilization and cache hit rates without graph partitioning. SSD-GNN further scales training to multiple nodes by combining GPU-initiated direct storage access with a distributed caching protocol that reduces inter-node data movement.
These frameworks exploit heterogeneous hardware resources, including SSDs, CPU memory, GPU thread parallelism, and PCIe bandwidth, to streamline data transfers and hide storage latency. Prototypes of all three systems demonstrate significant improvements in GNN training performance and reductions in total cost of ownership over state-of-the-art systems, charting a path toward scalable, storage-efficient GNN training across a range of compute environments.
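The unifying mechanism behind all three frameworks is GPU-thread-parallel feature fetching: thousands of independent fetches are kept in flight at once so that no single storage access stalls training. The minimal CUDA sketch below illustrates only this latency-hiding principle; the kernel, buffer names, and sizes are hypothetical, and a mapped pinned host buffer stands in for SSD-backed feature storage, whereas the actual GIDS and SSD-GNN data path issues NVMe reads directly from GPU threads.

// Hypothetical sketch: GPU warps gather feature rows for sampled nodes from a
// pinned, GPU-mapped host buffer that stands in for SSD-backed feature storage.
// Only the access pattern matters here; buffer contents are left uninitialized.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define FEAT_DIM 128        // feature width per node (illustrative)
#define NUM_NODES (1 << 18) // nodes held in the stand-in "external" store
#define BATCH 4096          // sampled nodes per training mini-batch

// One warp per sampled node: 32 lanes cooperatively copy one feature row.
// Each pending load is an independent in-flight request, so thousands of
// warps overlap their transfer latencies instead of waiting one by one.
__global__ void gather_features(const float *__restrict__ store,
                                const int *__restrict__ node_ids,
                                float *__restrict__ batch_feats) {
    int warp = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int lane = threadIdx.x % 32;
    if (warp >= BATCH) return;
    const float *src = store + (size_t)node_ids[warp] * FEAT_DIM;
    float *dst = batch_feats + (size_t)warp * FEAT_DIM;
    for (int i = lane; i < FEAT_DIM; i += 32)
        dst[i] = src[i];
}

int main() {
    size_t store_bytes = (size_t)NUM_NODES * FEAT_DIM * sizeof(float);
    float *store_h, *store_d;
    // Pinned, GPU-mapped host memory stands in for slow external storage.
    cudaHostAlloc((void **)&store_h, store_bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&store_d, store_h, 0);

    int *ids;
    float *feats;
    cudaMallocManaged((void **)&ids, BATCH * sizeof(int));
    cudaMalloc((void **)&feats, (size_t)BATCH * FEAT_DIM * sizeof(float));
    for (int i = 0; i < BATCH; i++)
        ids[i] = rand() % NUM_NODES; // stand-in for a neighborhood sampler

    int threads = 256;
    int blocks = (BATCH * 32 + threads - 1) / threads; // one warp per node
    gather_features<<<blocks, threads>>>(store_d, ids, feats);
    cudaDeviceSynchronize();
    printf("gathered %d feature rows of width %d\n", BATCH, FEAT_DIM);

    cudaFree(feats);
    cudaFree(ids);
    cudaFreeHost(store_h);
    return 0;
}

With 4,096 warps each issuing independent loads, transfer latencies overlap rather than accumulate, which is the same overlap the frameworks exploit at NVMe scale to keep GPUs fed during training.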