## Files in this item

FilesDescriptionFormat

application/pdf

WANG-THESIS-2019.pdf (1MB)
(no description provided)PDF

## Description

 Title: On upper confidence bound algorithms for piecewise-stationary stochastic multi-armed bandits and the variants Author(s): Wang, Lingda Advisor(s): Zhao, Zhizhen Department / Program: Electrical & Computer Eng Discipline: Electrical & Computer Engr Degree Granting Institution: University of Illinois at Urbana-Champaign Degree: M.S. Genre: Thesis Subject(s): Multi-Armed Bandits Non-stationary Environments Change-Point Detection Abstract: In recent years, multi-armed bandit (MAB) problems have received much attention, as they model many real-world applications such as online recommendation, web search and crowdsourcing tasks. The core of MAB algorithms is addressing the exploration versus exploitation dilemma, and finding the right balance between them. Several simple yet effective algorithms (e.g., upper confidence bound (UCB) and Thompson sampling (TS)) have been proposed in the literature, which are order optimal compared with the lower bound. Original MAB problems are considered in a stationary environment, where the reward distributions do not evolve over time. Many real-world applications, however, have a non-stationary nature that cannot be fully characterized by the stationary settings. In this thesis, we mainly study the UCB-based algorithms for MAB problems and the variants under the scenario where the reward distributions can change in a piecewise-stationary manner. The variants considered in this thesis are combinatorial MAB (CMAB) and cascading bandit (CB) problems. In the first part of this thesis (Chapters 2, 3 and 4), we propose algorithms, \texttt{GLR-UCB}, \texttt{GLR-CUCB} and \texttt{GLR-CascadeUCB1}, for piecewise-stationary MAB, CMAB and CB problems, respectively. The key idea behind the proposed algorithms is incorporating an almost parameter-free change-point detector, the generalized likelihood ratio (GLR) change-point detector, within the classical \texttt{UCB1} algorithm and its variants (e.g., \texttt{CUCB} and \texttt{CascadeUCB1}). Gap-dependent regret upper bounds of the proposed algorithms are derived and all on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments, $L$ is the number of arms in MAB, base arms in CMAB, or items in CB. We also present numerical experiments on both synthetic and real-world datasets to show that our proposed algorithms outperform other state-of-the-art algorithms in the literature. Next, in the second part (Chapter 5), we also derive a nearly matching regret lower bound on the order of $\Omega(\sqrt{NLT})$ for MAB problems, which improves the current best lower bound $\Omega(\sqrt{T})$ by adding the dependence on $L$ and $N$. Since CMAB and CB are variants of MAB, this lower bound also holds for CMAB and CB. Issue Date: 2019-11-12 Type: Text URI: http://hdl.handle.net/2142/106342 Rights Information: Copyright 2019 Lingda Wang Date Available in IDEALS: 2020-03-02 Date Deposited: 2019-12
﻿