Supervisory control with online learning for stabilization and near-optimal performance of time-varying linear systems
Roy, Dhritiman
Permalink
https://hdl.handle.net/2142/129620
Description
Title
Supervisory control with online learning for stabilization and near-optimal performance of time-varying linear systems
Author(s)
Roy, Dhritiman
Issue Date
2025-05-08
Director of Research (if dissertation) or Advisor (if thesis)
Li, Yingying
Department of Study
Industrial & Enterprise Systems Engineering
Discipline
Industrial Engineering
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Online learning
Multi-Armed Bandit
Adaptive control
Pontryagin’s Maximum Principle
Abstract
Model-based control methods are widely used in robotics because they use system equations to compute efficient control actions. However, these methods often struggle in real-world situations where the system model is imperfect or where unexpected disturbances occur. In addition, solving nonlinear optimization problems in real time can be too slow or too demanding for systems with limited onboard computing power. To address these challenges, this study proposes a hybrid control approach that combines classical control, optimal planning, and online learning. The system we focus on is a 2D quadrotor, modeled as a six-dimensional system controlled using force and torque inputs. At the lower level, we use three controllers: a basic Proportional-Derivative (PD) controller, a trajectory planner using nonlinear programming (NLP), and a control law based on Pontryagin’s Maximum Principle (PMP), which we implement using PyTorch. At the higher level, we add a Multi-Armed Bandit (MAB) layer using the EXP3 algorithm. This layer learns over time which controller performs best, based on feedback such as tracking error and energy usage, and allows the system to switch between controllers depending on how well each is performing at the moment.
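The supervisory layer described above can be illustrated with a minimal EXP3 sketch. This is not the thesis implementation; the class name, the reward definition, and the arm ordering are illustrative assumptions. The only structural requirement EXP3 imposes is that each round's reward be mapped into [0, 1] (here, one might use a negated, normalized combination of tracking error and energy usage):

```python
import math
import random

class EXP3Supervisor:
    """Hypothetical sketch of an EXP3 supervisor that picks one of K
    low-level controllers (e.g. PD, NLP planner, PMP) each round and
    updates its weights from a reward in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma            # exploration rate
        self.weights = [1.0] * n_arms

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self):
        # Sample an arm (controller index) from the current distribution.
        p = self.probabilities()
        r, acc = random.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    def update(self, arm, reward):
        # Importance-weighted estimate keeps the reward update unbiased
        # even though only the chosen arm's reward is observed.
        p = self.probabilities()[arm]
        x_hat = reward / p
        self.weights[arm] *= math.exp(self.gamma * x_hat / len(self.weights))

# Usage: three arms standing in for the PD, NLP, and PMP controllers.
supervisor = EXP3Supervisor(n_arms=3)
arm = supervisor.select()               # run controller `arm` this round
supervisor.update(arm, reward=0.7)      # reward from tracking/energy feedback
```

The key design point is the exploration mixing term `gamma / k`, which keeps every controller's selection probability bounded away from zero, so a controller that becomes effective later (e.g. after a disturbance changes the dynamics) can still be rediscovered.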
Our results show that this combination of planning and learning can make the system more reliable and adaptive, even in uncertain environments. While we apply this to a quadrotor, the same idea can be used for many other types of robotic systems.