Files in this item



KHATIBI-THESIS-2016.pdf (application/pdf, 252kB)


Title: Modeling the winning seed distribution of the NCAA basketball tournament
Author(s): Khatibi, Arash
Advisor(s): Jacobson, Sheldon H
Contributor(s): Jacobson, Sheldon H
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Bracket Challenge; NCAA Tournament
Abstract: The National Collegiate Athletic Association's (NCAA) men's Division I college basketball tournament is an annual competition that draws widespread attention in the United States. Predicting the outcome of each game is a popular activity undertaken by numerous websites, fans, and, more recently, academic researchers. There has been a surge of interest in proposing mathematical methods to model the tournament's results and pick the winners of future games. This thesis analyzes the results of the NCAA basketball tournament since 1985 and proposes several models to capture the winning seed distribution in each round. The Exponential Model estimates the winning probability of each team by modeling the time between a team's successive wins in a round as an exponential random variable. However, it assigns zero probability to events that have not occurred in the training data set. The Markov Model addresses this limitation by defining a Markov chain that incorporates each team's wins in prior rounds to estimate its winning probability. Results of these two models are validated using a chi-squared goodness-of-fit test. The Power Model, which is an intelligent tool for generating brackets of winners, quantifies the relative strength of each match-up in a round as a power function of the teams' seed numbers, with the exponent estimated from the historical results. The main problem of the Power Model is the data complications generally caused by the small size of the training data set, especially in later rounds. The Position and Upset Models address this problem by representing the tournament's games as a binary sequence and estimating the outcome of each game based on the teams' performance in similar games.
Generating a bracket in the forward direction, from the first round to the last, propagates incorrect picks through the tournament, whereas correctly picking the winners in later rounds automatically fills in the bracket for several games in earlier rounds. This motivates developing bidirectional models that pick the winners based on a combination of models in forward and backward directions. The Power, Position, Upset, and bidirectional models are assessed based on the aggregate performance of millions of brackets for the five most recent tournaments (2012-2016). The proposed models allow one to estimate the likelihoods of different seed combinations by applying the estimated winning seed distributions, which accurately summarize the seeds' aggregate performance and provide a deeper understanding of the uncertainty in the games' outcomes.
Issue Date: 2016-11-23
Rights Information: Copyright 2016 Arash Khatibi
Date Available in IDEALS: 2017-03-01
Date Deposited: 2016-12
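
Illustrative sketch (not part of the deposited record): the abstract describes the Exponential and Power Models only in words, so the Python below is a minimal, hedged reading of those two ideas and not the thesis's actual formulation. The function names, the ratio form `a^-alpha / (a^-alpha + b^-alpha)`, the maximum-likelihood rate estimate, and the one-tournament time unit are all assumptions introduced here for illustration.

```python
import math
from typing import Sequence


def power_model_win_prob(seed_a: int, seed_b: int, alpha: float) -> float:
    """Chance that seed_a beats seed_b under an ASSUMED power-function form.

    Strength is taken as seed ** -alpha, so lower seed numbers are stronger
    when alpha > 0. In the thesis the exponent is estimated from historical
    results; here alpha is simply passed in.
    """
    strength_a = seed_a ** -alpha
    strength_b = seed_b ** -alpha
    return strength_a / (strength_a + strength_b)


def exponential_rate(win_years: Sequence[int]) -> float:
    """ASSUMED estimator: 1 / (mean gap between successive winning years).

    Returns 0.0 when fewer than two distinct winning years are observed,
    so no gap can be measured.
    """
    years = sorted(set(win_years))
    if len(years) < 2:
        return 0.0
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    return len(gaps) / sum(gaps)


def prob_win_within(rate: float, tournaments: float = 1.0) -> float:
    """P(at least one win within `tournaments` time units) for an exponential gap."""
    return 1.0 - math.exp(-rate * tournaments)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the functions.
    print(power_model_win_prob(1, 16, alpha=1.0))
    print(prob_win_within(exponential_rate([1990, 1994, 2000])))
    print(prob_win_within(exponential_rate([])))
```

With no observed wins the estimated rate is zero, so the sketch returns a zero probability. This mirrors the limitation the abstract attributes to the Exponential Model and that the Markov Model is meant to address.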
