Permalink
https://hdl.handle.net/2142/127396
Description
Title
Invariant learning for learning in the wild
Author(s)
Wang, Xiaoyang
Issue Date
2024-12-04
Director of Research (if dissertation) or Advisor (if thesis)
Koyejo, Oluwasanmi
Doctoral Committee Chair(s)
Nahrstedt, Klara
Committee Member(s)
Tong, Hanghang
Dimitriadis, Dimitrios
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
machine learning
invariance
robustness
Abstract
Machine learning models are increasingly deployed in production (i.e., in the wild) but may fail for various reasons. For example, fraud detection models protect numerous users against phishing emails but are subject to intentional poisoning and may fail to identify novel types of phishing. Similarly, a language model provides timely answers to user questions, yet the quality of its answers can degrade significantly, or even become harmful, when minor changes are made to the questions. Common failures of machine learning models in production environments fall into two categories: (1) data quality and (2) data shift. Data quality problems can be caused by malicious adversaries aiming to corrupt machine learning models, by uncurated crowdsourced data from the web, and so on. Data shift problems, meanwhile, often arise from the mismatch between offline training data and the continuously evolving data in online production environments. Tackling both problems requires methods that help machine learning models continuously learn generally useful patterns from the data without entangling the harmful ones. In this dissertation, we introduce invariant learning as a paradigm that meets this requirement and addresses the data quality and data shift problems in the wild. In particular, we first study a data quality problem involving multiple data sources of mixed quality. Our main contribution here is a novel algorithm that helps machine learning models learn invariant patterns across multiple data sources while selectively filtering out the contribution of low-quality data. We then study a setting that requires machine learning models to be fine-tuned (i.e., customized) to a particular data source for improved performance without sacrificing the benefits of invariance.
The last part of this dissertation applies invariant learning to an active fine-tuning problem, in which machine learning models must continuously learn from new data with improved data efficiency. Our invariance-aware approach selects subsets of data samples whose benefit transfers invariantly to the full dataset, minimizing the neglect of unselected samples and helping machine learning models adapt to shifting data more effectively.
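In the broader literature, the invariance idea summarized in the abstract is often instantiated as an Invariant Risk Minimization (IRMv1)-style objective: average the per-environment risk and penalize the squared gradient of each environment's risk with respect to a fixed dummy classifier scale. The sketch below is only an illustrative example of that generic technique for logistic regression; it is not the dissertation's algorithm, and all function names are hypothetical.

```python
import numpy as np

def irm_penalty(w, X, y):
    """IRMv1-style penalty for one environment: squared gradient of the
    mean logistic loss with respect to a dummy classifier scale s = 1."""
    z = X @ w                          # logits; the dummy loss uses s * z
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid probabilities at s = 1
    grad_s = np.mean((p - y) * z)      # d(mean loss)/ds evaluated at s = 1
    return grad_s ** 2

def invariant_objective(w, envs, lam=1.0):
    """Mean risk across environments plus the invariance penalty.

    envs: list of (X, y) pairs, one per data source / environment.
    lam:  trade-off between average risk and invariance.
    """
    risk, pen = 0.0, 0.0
    for X, y in envs:
        z = X @ w
        # numerically simple mean logistic loss for labels y in {0, 1}
        risk += np.mean(np.log1p(np.exp(-z)) + (1.0 - y) * z)
        pen += irm_penalty(w, X, y)
    n = len(envs)
    return risk / n + lam * pen / n

# Toy usage: two environments sharing the same invariant feature (column 0).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 3)); y1 = (X1[:, 0] > 0).astype(float)
X2 = rng.normal(size=(50, 3)); y2 = (X2[:, 0] > 0).astype(float)
w = np.array([1.0, 0.0, 0.0])
obj = invariant_objective(w, [(X1, y1), (X2, y2)], lam=1.0)
```

A predictor relying only on features whose relationship to the label is stable across environments keeps the penalty small in every environment; a predictor exploiting environment-specific (e.g., low-quality or spurious) signal pays a large penalty in at least one of them.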