Passivity, no-regret, and performance in online learning and games

Abdelraouf, Hassan

This item's files can only be accessed by the System Administrators group.

Permalink

https://hdl.handle.net/2142/132762

Description

Title

Passivity, no-regret, and performance in online learning and games

Author(s)

Abdelraouf, Hassan

Issue Date

2025-11-26

Director of Research (if dissertation) or Advisor (if thesis)

Shamma, Jeff

Doctoral Committee Chair(s)

Langbort, Cedric

Committee Member(s)

Dullerud, Geir
Tsukamoto, Hiroyasu

Department of Study

Aerospace Engineering

Discipline

Aerospace Engineering

Degree Granting Institution

University of Illinois Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Keyword(s)

Passivity; No-regret; Online Learning

Language

eng

Abstract

As autonomous AI agents become more widely deployed across dynamic, multi-agent environments, they will continuously learn and interact in real time to achieve complex goals. This thesis develops a control- and game-theoretic foundation to analyze and ultimately synthesize such systems, in which adaptive agents evolve in the presence of other adaptive agents. Building on this motivation, the thesis investigates the interplay between passivity, no-regret, and performance of continuous-time learning dynamics. The analysis is divided into two parts: (i) the interaction between a learning model and a dynamic, uncertain environment, and (ii) the interaction among multiple adaptive learners within a game. In the first part, the learning dynamic model is viewed as an input–output operator that maps payoffs to strategies. Building on prior work for replicator dynamics, we show that if the learning dynamic model satisfies a passivity condition between the payoff vector and the deviation of its evolving strategy from any fixed strategy, then it achieves finite regret. We then prove that this passivity condition holds for strategic higher-order variants of learning dynamics that have finite regret. We further provide numerical examples illustrating the lack of finite regret in several evolutionary dynamic models that violate the passivity property. We also examine the fragility of the finite regret property under payoff perturbations. This raises an important question: is finite regret, by itself, a sufficient metric to assess the quality of learning dynamic models, or should additional performance measures be considered? Motivated by this consideration, the thesis addresses the "free-lunch" question in no-regret learning: whether one no-regret algorithm can outperform another in asymptotic average reward, so that an agent incurs regret for not having chosen a particular no-regret algorithm.
We develop a control-theoretic lens in which a learning dynamic is modeled as a cascade interconnection of a diagonal LTI map $G(s)=g(s)I_n$ and the softmax nonlinearity, linking the frequency response $g(j\omega)$ (gain and phase) directly to asymptotic performance. We introduce payoff-based higher-order variants of replicator dynamics, namely anticipatory and predictive replicator dynamics, and show that the anticipatory model is dynamically equivalent to predictive replicator dynamics with a first-order low-pass predictor. An oracle (perfect-prediction) variant is proved to uniformly dominate the standard replicator dynamics, i.e., it achieves higher cumulative reward at every time horizon, across all environments. We then cast the performance comparison as a passivity question: passivity of an associated comparison system is equivalent to uniform dominance of one learning algorithm over another. This yields several free-lunch results: predictive exponential replicator dynamics with a low-pass predictor uniformly dominates the standard exponential replicator dynamics for any payoff trajectory; moreover, any predictive replicator dynamics with a passive, asymptotically stable predictor, including anticipatory replicator dynamics, locally dominates the standard replicator dynamics. Framing the global comparison between the anticipatory and standard replicator dynamics as an optimal-control problem, we show that the minimal achievable performance gap is zero, implying uniform dominance of the anticipatory model across all environments. Lastly, we derive closed-form expressions for the long-run average reward and limiting strategy of replicator dynamics in arbitrary $2\pi$-periodic environments. In the second part, the focus shifts from the interaction of a single learner with a dynamic environment to the interaction among multiple learners within a game.
We establish a connection between finite regret and equilibrium-independent passivity (EI-passivity) through Best-Response Stationarity (BRS). Modeling the interaction between a learning dynamic (mapping payoffs to strategies) and a game (mapping strategies to payoffs) as a feedback interconnection, we exploit the fact that contractive games are anti-incrementally passive to show that incremental passivity is a stronger notion that implies both $\delta$-passivity and EI-passivity. Based on this connection, we develop a passivity-based classification of learning dynamics according to the passivity notion they satisfy (incremental passivity, $\delta$-passivity, or EI-passivity) and use this classification as a framework for convergence analysis in contractive games. More generally, we develop an incremental-stability analysis for payoff-based higher-order variants of replicator dynamics in matrix contractive games. Taken together, the results of this thesis provide a unified control-theoretic framework for analyzing and comparing the performance of online learning dynamics. Beyond their theoretical significance, these results bridge control theory, online learning, and game theory, offering concepts that can guide the design of stable, efficient, and robust autonomous learning systems operating in interactive, uncertain, and multi-agent environments.
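
As a rough illustration of the cascade view described in the abstract, the sketch below simulates an integrator followed by a softmax, which is the usual way of writing standard (exponential) replicator dynamics, and tracks the regret against the best fixed strategy. This is not code from the thesis: the choice $g(s)=1/s$, the three-action periodic payoff signal `payoff`, the forward-Euler step `dt`, and the horizon `T` are all assumptions made for illustration. For payoffs bounded in $[-1,1]$, the discretized scheme is an exponential-weights update, so its regret should stay below roughly $\ln n$ plus a small discretization term.

```python
import math


def softmax(z):
    """Numerically stable softmax mapping scores to a mixed strategy."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def payoff(t):
    """Hypothetical periodic payoff vector with entries in [-1, 1]."""
    return [math.sin(t), math.cos(t), math.sin(2.0 * t)]


def simulate(T=50.0, dt=1e-3):
    """Forward-Euler simulation of z' = p(t), x = softmax(z).

    Returns (regret, average reward), where regret compares the
    integrated payoff of the best fixed action with the learner's
    integrated reward over [0, T].
    """
    n = 3
    z = [0.0] * n          # integrated payoffs (state of the integrator)
    reward = 0.0           # learner's integrated reward
    steps = int(round(T / dt))
    for k in range(steps):
        p = payoff(k * dt)
        x = softmax(z)     # strategy uses payoffs observed so far
        reward += dt * sum(xi * pi for xi, pi in zip(x, p))
        z = [zi + dt * pi for zi, pi in zip(z, p)]
    regret = max(z) - reward
    return regret, reward / T


regret, avg_reward = simulate()
print(f"regret over horizon: {regret:.4f}")
print(f"ln(n) reference    : {math.log(3):.4f}")
print(f"average reward     : {avg_reward:.4f}")
```

Replacing the pure integrator with another scalar transfer function $g(s)$ (for example, adding a predictor state ahead of the softmax) would be the natural way to experiment with the higher-order variants mentioned above, though the specific models studied in the thesis are not reproduced here.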

Graduation Semester

2025-12

Type of Resource

Thesis

Handle URL

https://hdl.handle.net/2142/132762

Copyright and License Information

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Aerospace Engineering
