Supervisory control with online learning for stabilization and near-optimal performance of time-varying linear systems
Roy, Dhritiman
Permalink
https://hdl.handle.net/2142/129620
Description
Title
Supervisory control with online learning for stabilization and near-optimal performance of time-varying linear systems
Author(s)
Roy, Dhritiman
Issue Date
2025-05-08
Director of Research (if dissertation) or Advisor (if thesis)
Li, Yingying
Department of Study
Industrial & Enterprise Systems Engineering
Discipline
Industrial Engineering
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Online learning
Multi-Armed Bandit
Adaptive control
Pontryagin’s Maximum Principle
Abstract
Model-based control methods are widely used in robotics because they use system equations to compute efficient control actions. However, these methods often struggle in real-world situations where the system model is imperfect or where unexpected disturbances occur. In addition, solving nonlinear optimization problems in real time can be too slow or too demanding for systems with limited onboard computing power. To address these challenges, this study proposes a hybrid control approach that combines classical control, optimal planning, and online learning. The system we focus on is a 2D quadrotor, modeled as a six-dimensional system controlled using force and torque inputs. At the lower level, we use three controllers: a basic Proportional-Derivative (PD) controller, a trajectory planner using nonlinear programming (NLP), and a control law based on Pontryagin’s Maximum Principle (PMP), which we implement using PyTorch. At the higher level, we add a Multi-Armed Bandit (MAB) layer using the EXP3 algorithm. This layer learns over time which controller performs best, based on feedback such as tracking error and energy usage, and allows the system to switch between controllers depending on how well each is performing at the moment.
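The supervisory layer described above can be illustrated with a minimal EXP3 sketch. This is not the thesis implementation; the class name, the reward definition, and the arm ordering are illustrative assumptions. The only structural requirement EXP3 imposes is that each round's reward be mapped into [0, 1] (here, one might use a negated, normalized combination of tracking error and energy usage):

```python
import math
import random

class EXP3Supervisor:
    """Hypothetical sketch of an EXP3 supervisor that picks one of K
    low-level controllers (e.g. PD, NLP planner, PMP) each round and
    updates its weights from a reward in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1):
        self.gamma = gamma            # exploration rate
        self.weights = [1.0] * n_arms

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        k = len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self):
        # Sample an arm (controller index) from the current distribution.
        p = self.probabilities()
        r, acc = random.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                return i
        return len(p) - 1

    def update(self, arm, reward):
        # Importance-weighted estimate keeps the reward update unbiased
        # even though only the chosen arm's reward is observed.
        p = self.probabilities()[arm]
        x_hat = reward / p
        self.weights[arm] *= math.exp(self.gamma * x_hat / len(self.weights))

# Usage: three arms standing in for the PD, NLP, and PMP controllers.
supervisor = EXP3Supervisor(n_arms=3)
arm = supervisor.select()               # run controller `arm` this round
supervisor.update(arm, reward=0.7)      # reward from tracking/energy feedback
```

The key design point is the exploration mixing term `gamma / k`, which keeps every controller's selection probability bounded away from zero, so a controller that becomes effective later (e.g. after a disturbance changes the dynamics) can still be rediscovered.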
Our results show that this combination of planning and learning can make the system more reliable and adaptive, even in uncertain environments. While we apply this to a quadrotor, the same idea can be used for many other types of robotic systems.