Nonlinear and geometric control methods in deep learning theory
Hanson, Joshua McKinley
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. If you are trying to access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/127381
Description
Title
Nonlinear and geometric control methods in deep learning theory
Author(s)
Hanson, Joshua McKinley
Issue Date
2024-12-03
Director of Research (if dissertation) or Advisor (if thesis)
Raginsky, Maxim
Doctoral Committee Chair(s)
Raginsky, Maxim
Committee Member(s)
Baryshnikov, Yuliy
Belabbas, Mohamed Ali
Liberzon, Daniel
Department of Study
Electrical & Computer Eng
Discipline
Electrical & Computer Engr
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Neural networks
statistical learning theory
Rademacher complexity
deep learning
nonlinear control
geometric control
Abstract
The practical success and popularity of deep neural networks for solving modern machine learning problems motivates us to develop a rigorous theoretical understanding of how depth affects model expressivity and generalizability. Interpreting deep learning architectures as control systems unlocks versatile tools from nonlinear systems theory for studying these models and the associated statistical learning problems. In this dissertation, we present three main technical works that take advantage of this perspective, summarized as follows.

The first work describes an encoder-decoder architecture for learning immersed submanifolds from data, inspired by the structure of the group action in Sussmann's orbit theorem, which is built from compositions of forward- and backward-in-time flow maps. We proceed to develop generalization bounds for this model class and apply these results to a handful of illustrative examples.

The second work investigates a technique for proving generalization bounds for neural ordinary differential equations based on transforming the model into an equivalent infinite-dimensional kernel machine via the Chen–Fliess expansion, which expresses the model output as an infinite series in terms of signature integrals of the control and iterated Lie derivatives of the output map. Rather than bounding the covering number of the model class by propagating a parameter perturbation through the flow map, this technique takes advantage of standard tools applicable to kernel machines.

The third work focuses on deriving quantitative approximation error bounds for neural ordinary differential equations having at most quadratic nonlinearities in the dynamics. The simplicity of this model class demonstrates how expressivity can arise primarily from iteratively composing many elementary operations, rather than from the complexity of those elementary operations themselves. Together, these results contribute to our understanding of what depth imparts to the capabilities of deep learning architectures.
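For context on the expansion referenced in the second work, the following is a generic statement of the Chen–Fliess series for a control-affine system; the notation is standard but not necessarily the convention used in the dissertation. For a system

\[
\dot{x}(t) = g_0(x(t)) + \sum_{i=1}^{m} u_i(t)\, g_i(x(t)), \qquad y(t) = h(x(t)), \qquad x(0) = x_0,
\]

the output admits, up to indexing conventions, the expansion

\[
y(t) = h(x_0) + \sum_{k \ge 0} \; \sum_{i_0, \dots, i_k = 0}^{m} L_{g_{i_0}} \cdots L_{g_{i_k}} h(x_0) \int_0^t d\xi_{i_k} \cdots d\xi_{i_0},
\]

where $L_{g_i}$ denotes the Lie derivative along $g_i$ and the iterated (signature) integrals are built recursively from $d\xi_0 = d\tau$ and $d\xi_i = u_i(\tau)\, d\tau$ for $i \ge 1$. The coefficients depend only on the vector fields and the output map, while the iterated integrals depend only on the control; this is roughly the separation that a kernel-machine reformulation can exploit.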
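As a concrete illustration of the model class in the third work, here is a minimal sketch of a neural ordinary differential equation whose right-hand side contains only constant, linear, and quadratic terms in the state. All dimensions, parameter values, and the specific parameterization are hypothetical and chosen for readability; they are not taken from the dissertation.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
d = 4      # state dimension
T = 1.0    # integration horizon (the continuous-time analogue of depth)

# Right-hand side with at most quadratic nonlinearities:
#   dx/dt = A x + Q (x ⊗ x) + b,
# where x ⊗ x stacks all pairwise products of state coordinates.
A = 0.1 * rng.standard_normal((d, d))
Q = 0.1 * rng.standard_normal((d, d * d))
b = 0.1 * rng.standard_normal(d)

def quadratic_rhs(t, x):
    """Vector field with only linear and quadratic dependence on the state."""
    return A @ x + Q @ np.kron(x, x) + b

# The "forward pass": flow the input through the dynamics for time T.
x0 = rng.standard_normal(d)
sol = solve_ivp(quadratic_rhs, (0.0, T), x0, rtol=1e-8, atol=1e-8)
print("input :", x0)
print("output:", sol.y[:, -1])
```

In this setting, expressivity comes from composing the flow over time (depth) rather than from the complexity of the instantaneous vector field, which remains at most quadratic.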