Files in this item

JIANG-DISSERTATION-2015.pdf (application/pdf, 6 MB)
Title: Online advertisements and multi-armed bandits
Author(s): Jiang, Chong
Director of Research: Srikant, R.
Doctoral Committee Chair(s): Srikant, R.
Doctoral Committee Member(s): Beck, Carolyn; Nedich, Angelia; Veeravalli, Venugopal V.
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Multi-armed bandits; online advertisements; reinforcement learning
Abstract: We investigate a number of multi-armed bandit problems that model different aspects of online advertising, beginning with a survey of the key techniques that are commonly used to demonstrate the theoretical limitations and achievable results for the performance of multi-armed bandit algorithms. We then formulate variations of the basic stochastic multi-armed bandit problem, aimed at modeling how budget-limited advertisers should bid and how ad exchanges should choose whose ad to display, and study them using these techniques.

We first consider online ad auctions from the point of view of a single advertiser who has an average budget constraint. By modeling the rest of the bidders through a probability distribution (often referred to as the mean-field approximation), we develop a simple bidding strategy which can be implemented without any statistical knowledge of bids, valuations, and query arrival processes. The key idea is to use stochastic approximation techniques to automatically track long-term averages.

Next, we consider multi-armed bandits with budgets, modeling how ad exchanges select which ad to display. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged each time their ad is shown, and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm.
Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB matches or exceeds the finite-time performance of various previously proposed (UCB-like) algorithms. Finally, we consider the problem of multi-armed bandits with a large, possibly infinite number of correlated arms, modeling a retailer advertising a large number of related items. We assume that the arms have Bernoulli-distributed rewards, where the probabilities of success are parametrized by a known attribute vector for each arm and an unknown vector which describes the preferences of the target audience. For this model, we seek an algorithm with total regret that is sub-linear in time and independent of the number of arms. We present such an algorithm and analyze its performance, showing upper bounds on the total regret which apply uniformly in time, for both the finite and infinite arm cases.
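To make the index-policy idea behind KL-UCB concrete, the following is a minimal sketch for Bernoulli arms: each arm's index is the largest success probability q whose KL divergence from the empirical mean is still consistent with the number of pulls, found by binary search. This is the standard KL-UCB index, not the dissertation's budget-aware B-KL-UCB; the simplified exploration term log(t) (in place of the log(t) + c·log(log(t)) used in the literature) and the function names here are illustrative assumptions.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t).

    mean  : empirical success rate of the arm
    pulls : number of times the arm has been played
    t     : current time step
    """
    if pulls == 0:
        return 1.0  # unplayed arms get the maximal index, forcing exploration
    bound = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:  # binary search: kl(mean, .) is increasing on [mean, 1]
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo
```

At each step the policy plays the arm with the largest index; the index shrinks toward the empirical mean as an arm accumulates pulls and grows slowly (logarithmically in t) for neglected arms, which is what yields the O(log(T)) regret discussed above.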
Issue Date: 2015-04-13
Rights Information: Copyright 2015 Chong Jiang
Date Available in IDEALS: 2015-07-22
Date Deposited: May 2015
