Heterogeneous machine learning with decentralized data
Bao, Wenxuan
Description
- Title
- Heterogeneous machine learning with decentralized data
- Author(s)
- Bao, Wenxuan
- Issue Date
- 2025-11-12
- Director of Research (if dissertation) or Advisor (if thesis)
- He, Jingrui
- Doctoral Committee Chair(s)
- He, Jingrui
- Committee Member(s)
- Zhang, Tong
- Zhao, Han
- Li, Pan
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Distribution shift
- Federated learning
- Test-time adaptation
- Abstract
- This thesis explores heterogeneous machine learning with decentralized data, where multiple clients with distinct data distributions jointly train or adapt machine learning models under the coordination of a central server. Throughout the process, clients' private data never leave their local devices. This paradigm underlies numerous real-world applications, such as collaborative training of financial fraud detection models among banks, or collective health monitoring across large fleets of wearable devices. We investigate three fundamental challenges in this setting. (P1) Effective model training: How can we train models from multiple source clients with heterogeneous labeled data so that the models perform well across all clients? (P2) Adaptive model deployment: How can we adapt the trained model to each target client without labels, allowing it to adjust to its own data distribution for improved performance? (P3) Robust system design: How can we ensure robustness during both training and adaptation, preventing performance degradation caused by random failures or malicious attacks? To address (P1), we develop client clustering algorithms that enable knowledge transfer among clients with similar data distributions, allowing those with limited data to benefit from collaboration. For (P2), we first design task-specific adaptation algorithms that identify and mitigate distribution shifts across different modalities under one-to-one adaptation. We then extend to multi-client collaboration, where the model learns patterns of distribution shifts across clients to enable collaborative test-time adaptation. Finally, for (P3), we propose robust training algorithms resilient to abnormal or adversarial clients, and robust adaptation algorithms that mitigate model prediction bias and the effects of out-of-distribution data. Together, these contributions form an effective and robust framework for heterogeneous machine learning in decentralized data environments.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132484
- Copyright and License Information
- Copyright 2025 Wenxuan Bao
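
The server-coordinated, data-stays-local training paradigm described in the abstract can be sketched as a minimal federated-averaging loop. This is a hedged illustration only: the scalar mean-estimation model, `local_update`, and all parameters here are assumptions chosen for brevity, not the thesis's actual algorithms (which involve client clustering, test-time adaptation, and robust aggregation).

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """One client's gradient steps on its private data (never shared).
    Toy model: scalar w, squared loss (w - x)^2 averaged over the data."""
    w = weights
    for _ in range(epochs):
        grad = sum(2 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return w

def server_round(global_w, client_datasets):
    """Server broadcasts the global model; clients return updated weights
    (not raw data); server averages them, weighted by dataset size."""
    updates = [local_update(global_w, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return sum(w * n for w, n in zip(updates, sizes)) / sum(sizes)

random.seed(0)
# Heterogeneous clients: each samples data around a different mean.
clients = [[m + random.gauss(0, 0.1) for _ in range(20)]
           for m in (1.0, 2.0, 3.0)]

w = 0.0
for _ in range(30):
    w = server_round(w, clients)
print(w)  # converges toward the pooled mean of the clients' data
```

Note that plain averaging is exactly what heterogeneity makes problematic: a single global model is pulled toward the pooled distribution rather than any one client's, which motivates the clustering, per-client adaptation, and robust aggregation directions the abstract outlines.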
Owning Collections
Graduate Dissertations and Theses at Illinois (primary)