Optimal graph learning

Xu, Zhe

Permalink

https://hdl.handle.net/2142/129837

Description

Title

Optimal graph learning

Author(s)

Xu, Zhe

Issue Date

2025-07-07

Director of Research (if dissertation) or Advisor (if thesis)

Tong, Hanghang

Doctoral Committee Chair(s)

Tong, Hanghang

Committee Member(s)

Banerjee, Arindam
Chen, Yuzhong
Han, Jiawei

Department of Study

Computer Science

Discipline

Computer Science

Degree Granting Institution

University of Illinois Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Graph Machine Learning
Graph Data Augmentation

Language

eng

Abstract

The past decades have seen significant advances in graph machine learning, with numerous sophisticated models and algorithms designed for a variety of learning tasks, including ranking, classification, regression, and anomaly detection. Most existing works address the question: given a graph, what is the best way to mine it? Despite their remarkable achievements, little attention has been paid to the graph data itself, which can be noisy, huge, and imbalanced at every stage of the data collection process. In this thesis, we focus on the relatively unexplored realm of graph data, with the goal of enhancing various downstream graph machine learning tasks. We term this line of research "optimal graph learning": identifying the most effective graph data to improve efficiency, effectiveness, and expressiveness. This goal raises several unique challenges. First (formulation), it is unclear how to formulate data optimization in a data-driven way, especially since downstream tasks can be highly diverse. Second (volume), the sheer size of graph datasets can make the underlying optimization expensive in both time and space. Third (pattern), it is challenging to capture essential graph patterns at different granularities. This thesis presents our progress on the optimal graph learning problem, organized into three directions: graph refinement, graph augmentation, and graph distillation. For graph refinement, we developed (1) GaSoliNe, a purely data-driven solution for noisy data, and (2) Stager, a solution tailored to imbalanced data. For graph augmentation, we developed three solutions: (1) ALT, which improves the performance of a broad range of models on graphs with arbitrary heterophily; (2) DisCo, which generates realistic graphs based on the training graphs; and (3) AuGLM, which incorporates the graph structure into the textual input so that language models can handle node classification. For graph distillation, we developed (1) KiDD, a bilevel optimization-based solution that shrinks the given graphs while preserving the utility of the training data, and (2) FIG, a graph rationale discovery framework that finds the critical subgraph in each given graph to improve performance on graph-level tasks. Collectively, these contributions establish foundational progress toward data-centric graph machine learning and demonstrate the value of optimizing graph data itself to improve downstream task performance.

Graduation Semester

2025-08

Type of Resource

Text

Handle URL

https://hdl.handle.net/2142/129837

Owning Collections

Graduate Dissertations and Theses at Illinois (primary collection)

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Computer Science

Dissertations and Theses from the Siebel School of Computer Science
