Files in this item

SAUPPE-DISSERTATION-2015.pdf (application/pdf, 3MB)

Title: Balance Optimization Subset Selection: a framework for causal inference with observational data
Author(s): Sauppe, Jason James
Director of Research: Jacobson, Sheldon H.
Doctoral Committee Chair(s): Jacobson, Sheldon H.
Doctoral Committee Member(s): Holder, Allen; Chekuri, Chandra S.; Godfrey, P. Brighten
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): causal inference; operations research
Abstract: Observational data are prevalent in many fields of research, and it is desirable to use these data to explore potential causal relationships. Additional assumptions and methods for post-processing the data are needed to construct unbiased estimators of causal effects because such data are non-random. This dissertation describes the Balance Optimization Subset Selection (BOSS) framework to apply causal inference to observational data. BOSS is designed to identify the subset of observational data that is most appropriate for computing causal estimates. To do this, it compares the available treatment units to potential sets of control units on a set of confounding factors, called covariates, with the goal of identifying a control group that minimizes a measure of covariate imbalance. Which imbalance measure to use with BOSS is an important consideration that depends both on the quality of the available observational data and on the assumptions that a researcher is willing to make. The standard assumption for observational data, known as strong ignorability, is extended in several ways to be directly applicable to BOSS. Under these additional assumptions, specific levels of covariate balance are both necessary and sufficient for the treatment effect estimate to be unbiased. There is a trade-off in that weaker assumptions require a higher level of covariate balance in order to guarantee estimator unbiasedness. These additional assumptions bridge the gap between existing parametric and non-parametric methods. Each imbalance measure for BOSS leads to an associated optimization problem. The computational complexity of these problems is discussed, and efficient algorithms are developed to handle several special cases. A constant factor approximation algorithm is also presented for one imbalance measure. Given the potential applications of BOSS, identifying optimal or near-optimal solutions for these problems is of great practical interest.
Heuristics and exact algorithms are considered, and computational tests demonstrate their effectiveness at minimizing imbalance. Additional tests validate BOSS on a well-studied dataset from the literature and highlight the value of alternate optima as a way to corroborate the assumptions that are made.
Issue Date: 2015-07-16
Rights Information: Copyright 2015 Jason James Sauppe
Date Available in IDEALS: 2015-09-29
Date Deposited: August 201
