Files in this item



application/pdfMAGESH-THESIS-2020.pdf (351kB)
(no description provided)PDF


Title:Decentralized multi-user multi-armed bandits with user dependent reward distributions
Author(s):Magesh, Akshayaa
Advisor(s):Veeravalli, Venugopal V.
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):multiarmed bandits, multi-player, spectrum access, decentralized
Abstract:The uncoordinated spectrum access problem is studied using a multi-player multi-armed bandits framework. We consider a decentralized multi-player stochastic multi-armed bandit model where the players cannot communicate with each other and can observe only their own actions and rewards. Furthermore, the environment may appear differently to different players, i.e., the reward distributions for a given arm may vary across players. Knowledge of time horizon T is not assumed. Under these conditions, we consider two settings - zero and non-zero reward on collision (when more than one player plays the same arm). Under the zero reward on collision setting, we present a policy that achieves expected regret of O(log T) over a time horizon of duration T. While settings with non-zero rewards on collisions and varying reward distributions of arms across players have been considered separately in prior work, a model allowing for both has not been studied previously to the best of our knowledge. With this setup, we present a policy that achieves expected regret of order O(log^{2 + \delta} T) for some 0 < \delta < 1 over a time horizon of duration T.
Issue Date:2020-04-28
Rights Information:Copyright 2020 Akshayaa Magesh
Date Available in IDEALS:2020-08-26
Date Deposited:2020-05

This item appears in the following Collection(s)

Item Statistics