Subpopulation selection and debiased estimation for causal inference and predictive model evaluation
Liu, Yuxuan
Permalink
https://hdl.handle.net/2142/127347
Description
Bias in statistical and machine learning models refers to systematic error that skews results, leading to inaccurate conclusions and predictions. It can arise from many sources, including data collection methods, model selection, and underlying assumptions. Debiasing techniques aim to mitigate these errors and improve the reliability and validity of models: by reducing systematic error, they yield more accurate and trustworthy results for decision-making and inference. Common debiasing strategies include cross-validation and the bootstrap; other methods correct for biases introduced by extreme values, imbalanced datasets, and confounding variables. In this thesis, two chapters advance debiased estimation methodology in causal inference and classification, specifically in average treatment effect estimation and classification model evaluation.
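As a concrete illustration of the bootstrap debiasing mentioned above (generic, not specific to the thesis), the following minimal Python sketch estimates and subtracts the bias of the plug-in variance estimator, whose downward bias by a factor of (n-1)/n is well known. The simulated data and all settings are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)

def plug_in_var(sample):
    # Plug-in (MLE) variance estimator; biased downward by a factor of (n-1)/n.
    return np.mean((sample - sample.mean()) ** 2)

theta_hat = plug_in_var(x)

# Bootstrap bias estimate: mean of bootstrap replicates minus the original estimate.
B = 2000
boot = np.array([plug_in_var(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])
bias_est = boot.mean() - theta_hat

# Bias-corrected estimator: subtract the estimated bias.
theta_debiased = theta_hat - bias_est
print(f"plug-in: {theta_hat:.3f}, bootstrap bias: {bias_est:.3f}, "
      f"debiased: {theta_debiased:.3f}")
```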
The first chapter introduces a method for interpretable weighted average treatment effect (WATE) estimation under possible violation of the overlap assumption. By selecting subpopulations through optimizing a loss function via integer linear programming, the approach improves the precision and reliability of treatment effect estimates, especially in the presence of extreme propensity scores. Simulation studies show that the proposed WATE estimators substantially outperform classic inverse probability weighting (IPW) estimators in terms of bias reduction.
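The integer-programming formulation is the thesis's own contribution and is not reproduced here. As a hedged illustration of the underlying problem, the sketch below compares a classic IPW estimator against a WATE computed on a propensity-trimmed subpopulation; the trimming thresholds (0.1 and 0.9) and the simulated data are arbitrary choices for the example, and trimming is a standard, simpler stand-in for the thesis's subpopulation selection.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
# True propensity approaches 0 or 1 for extreme x, so overlap is poor in the tails.
e = 1 / (1 + np.exp(-3 * x))
a = rng.binomial(1, e)
y = 1.0 * a + x + rng.normal(size=n)   # true treatment effect = 1.0

# Classic IPW (Horvitz-Thompson) estimator; unstable when e is near 0 or 1.
ipw = np.mean(a * y / e - (1 - a) * y / (1 - e))

# WATE on a trimmed subpopulation with adequate overlap (0.1 <= e <= 0.9).
keep = (e >= 0.1) & (e <= 0.9)
wate = np.mean(a[keep] * y[keep] / e[keep]
               - (1 - a[keep]) * y[keep] / (1 - e[keep]))

print(f"IPW: {ipw:.3f}  trimmed WATE: {wate:.3f}")
```

Note that trimming changes the estimand to the treatment effect on the retained subpopulation, which is precisely why an interpretable, principled subpopulation selection rule matters.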
The second chapter examines classification model evaluation, with a focus on cross-validation and bootstrap methods. In earlier work, our case study demonstrated improved prediction of preterm birth using two nested, cross-validated machine learning classification models. As a starting point, this chapter develops the theory of cross-validation for linear regression, confirming that the sample mean squared error is an unbiased and consistent estimator of the true MSE. It then explores the AUC metric for comparing classifier performance, addresses the limitations of the DeLong test, and proposes alternatives such as bootstrap resampling. Finally, the chapter highlights the bias that arises when estimating the AUC difference between nested models and demonstrates that incorporating sample splitting yields unbiased, asymptotically normal estimators for cross-validated generalized linear models (GLM) and linear discriminant analysis (LDA).
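As a rough illustration of the bootstrap alternative to the DeLong test mentioned above, the sketch below compares the held-out AUCs of a logistic regression (GLM) and an LDA classifier under sample splitting, with a paired bootstrap percentile interval for the AUC difference. The synthetic dataset, the 50/50 split, and the bootstrap settings are all assumptions for the example, not the thesis's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
# Sample splitting: fit on one half, evaluate AUC on the held-out half.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

s_glm = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
s_lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
delta = roc_auc_score(y_te, s_glm) - roc_auc_score(y_te, s_lda)

# Paired bootstrap over test cases as an alternative to the DeLong test.
rng = np.random.default_rng(0)
B = 1000
n = len(y_te)
reps = []
for _ in range(B):
    idx = rng.integers(0, n, n)
    if y_te[idx].min() == y_te[idx].max():   # need both classes to compute AUC
        continue
    reps.append(roc_auc_score(y_te[idx], s_glm[idx])
                - roc_auc_score(y_te[idx], s_lda[idx]))
lo, hi = np.percentile(reps, [2.5, 97.5])
print(f"AUC difference: {delta:.4f}, 95% bootstrap CI: [{lo:.4f}, {hi:.4f}]")
```

Resampling test cases in pairs preserves the correlation between the two classifiers' scores on the same observations, which is the dependence the DeLong test must also account for.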