Files in this item

FilesDescriptionFormat

application/pdf

application/pdfWANG-THESIS-2019.pdf (1MB)Restricted to U of Illinois
(no description provided)PDF

Description

Title:On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants
Author(s):Wang, Lingda
Advisor(s):Zhao, Zhizhen
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Multi-Armed Bandits
Non-stationary Environments
Change-Point Detection
Abstract:In recent years, multi-armed bandit (MAB) problems have received much attention, as they model many real-world applications such as online recommendation, web search and crowdsourcing tasks. The core of MAB algorithms is addressing the exploration versus exploitation dilemma, and finding the right balance between them. Several simple yet effective algorithms (e.g., upper confidence bound (UCB) and Thompson sampling (TS)) have been proposed in the literature, which are order optimal compared with the lower bound. Original MAB problems are considered in a stationary environment, where the reward distributions do not evolve over time. Many real-world applications, however, have a non-stationary nature that cannot be fully characterized by the stationary settings. In this thesis, we mainly study the UCB-based algorithms for MAB problems and the variants under the scenario where the reward distributions can change in a piecewise-stationary manner. The variants considered in this thesis are combinatorial MAB (CMAB) and cascading bandit (CB) problems. In the first part of this thesis (Chapters 2, 3 and 4), we propose algorithms, \texttt{GLR-UCB}, \texttt{GLR-CUCB} and \texttt{GLR-CascadeUCB1}, for piecewise-stationary MAB, CMAB and CB problems, respectively. The key idea behind the proposed algorithms is incorporating an almost parameter-free change-point detector, the generalized likelihood ratio (GLR) change-point detector, within the classical \texttt{UCB1} algorithm and its variants (e.g., \texttt{CUCB} and \texttt{CascadeUCB1}). Gap-dependent regret upper bounds of the proposed algorithms are derived and all on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $L$ is the number of arms in MAB, base arms in CMAB, or items in CB. We also present numerical experiments on both synthetic and real-world datasets to show that our proposed algorithms outperform other state-of-the-art algorithms in the literature. Next, in the second part (Chapter 5), we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NLT})$ for MAB problems, which improves the current best lower bound $\Omega(\sqrt{T})$ by adding the dependence on $L$ and $N$. Since CMAB and CB are variants of MAB, this lower bound also holds for CMAB and CB.
Issue Date:2019-11-12
Type:Text
URI:http://hdl.handle.net/2142/106342
Rights Information:Copyright 2019 Lingda Wang
Date Available in IDEALS:2020-03-02
Date Deposited:2019-12


This item appears in the following Collection(s)

Item Statistics