Files in this item

Files                              Description                  Format
AGRAWAL-THESIS-2021.pdf (427kB)    (no description provided)    PDF (application/pdf)

Description

Title:Improved worst-case regret bounds for randomized least-squares value iteration
Author(s):Agrawal, Priyank
Advisor(s):Jiang, Nan
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Reinforcement Learning; Exploration-Exploitation
Abstract:This work studies regret minimization with randomized value functions in reinforcement learning. For tabular finite-horizon Markov Decision Processes, we introduce a clipping variant of a classical Thompson Sampling (TS)-like algorithm, randomized least-squares value iteration (RLSVI). Our $\tilde{\mathrm{O}}(H^2S\sqrt{AT})$ high-probability worst-case regret bound improves on the previously sharpest worst-case regret bounds for RLSVI and matches the existing state-of-the-art worst-case regret bounds for TS-based algorithms.
Issue Date:2021-07-15
Type:Thesis
URI:http://hdl.handle.net/2142/113048
Rights Information:Copyright 2021 Priyank Agrawal
Date Available in IDEALS:2022-01-12
Date Deposited:2021-08

