Permalink
https://hdl.handle.net/2142/127396
Description
Title
Invariant learning for learning in the wild
Author(s)
Wang, Xiaoyang
Issue Date
2024-12-04
Director of Research (if dissertation) or Advisor (if thesis)
Koyejo, Oluwasanmi
Doctoral Committee Chair(s)
Nahrstedt, Klara
Committee Member(s)
Tong, Hanghang
Dimitriadis, Dimitrios
Department of Study
Siebel School Comp & Data Sci
Discipline
Computer Science
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
machine learning
invariance
robustness
Abstract
Machine learning models are increasingly deployed in production (i.e., in the wild) but may fail for various reasons. For example, fraud detection models protect numerous users against phishing emails but are subject to intentional poisoning and may fail to identify novel types of phishing. Similarly, a language model provides timely answers to user questions, yet the quality of its answers can degrade significantly, or even become harmful, when minor changes are made to the questions. Common failures of machine learning models in production environments fall into two categories: (1) data quality and (2) data shift. Data quality problems can be caused by malicious adversaries aiming to corrupt machine learning models, by uncurated crowdsourced data from the web, and so on. Data shift problems, meanwhile, often arise from the mismatch between offline training data and the continuously evolving data in online production environments. Tackling both problems requires methods that help machine learning models continuously learn generally useful patterns from the data without entangling the harmful ones. In this dissertation, we introduce invariant learning as a paradigm that meets this requirement and addresses the data quality and data shift problems in the wild. In particular, we first study a data quality problem involving multiple data sources of mixed quality. Our main contribution here is a novel algorithm that helps machine learning models learn invariant patterns across multiple data sources while selectively filtering out the contribution of low-quality data. We then study a setting that requires machine learning models to be fine-tuned (i.e., customized) to a particular data source for improved performance without sacrificing the benefits of invariance.
The last part of this dissertation applies invariant learning to an active fine-tuning problem, in which machine learning models must continuously learn from new data with improved data efficiency. Our invariance-aware approach selects subsets of data samples whose benefit transfers invariantly to the full dataset, minimizing the neglect of unselected samples and helping machine learning models adapt to shifting data more effectively.
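In the broader literature, the invariance idea summarized in the abstract is often instantiated as an Invariant Risk Minimization (IRMv1)-style objective: average the per-environment risk and penalize the squared gradient of each environment's risk with respect to a fixed dummy classifier scale. The sketch below is only an illustrative example of that generic technique for logistic regression; it is not the dissertation's algorithm, and all function names are hypothetical.

```python
import numpy as np

def irm_penalty(w, X, y):
    """IRMv1-style penalty for one environment: squared gradient of the
    mean logistic loss with respect to a dummy classifier scale s = 1."""
    z = X @ w                          # logits; the dummy loss uses s * z
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid probabilities at s = 1
    grad_s = np.mean((p - y) * z)      # d(mean loss)/ds evaluated at s = 1
    return grad_s ** 2

def invariant_objective(w, envs, lam=1.0):
    """Mean risk across environments plus the invariance penalty.

    envs: list of (X, y) pairs, one per data source / environment.
    lam:  trade-off between average risk and invariance.
    """
    risk, pen = 0.0, 0.0
    for X, y in envs:
        z = X @ w
        # numerically simple mean logistic loss for labels y in {0, 1}
        risk += np.mean(np.log1p(np.exp(-z)) + (1.0 - y) * z)
        pen += irm_penalty(w, X, y)
    n = len(envs)
    return risk / n + lam * pen / n

# Toy usage: two environments sharing the same invariant feature (column 0).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 3)); y1 = (X1[:, 0] > 0).astype(float)
X2 = rng.normal(size=(50, 3)); y2 = (X2[:, 0] > 0).astype(float)
w = np.array([1.0, 0.0, 0.0])
obj = invariant_objective(w, [(X1, y1), (X2, y2)], lam=1.0)
```

A predictor relying only on features whose relationship to the label is stable across environments keeps the penalty small in every environment; a predictor exploiting environment-specific (e.g., low-quality or spurious) signal pays a large penalty in at least one of them.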