Improving algorithmic performance using stochastic learning rates: In-expectation and almost-surely stochastic approximation, and online learning, results and applications
Mamalis, Theodoros
Permalink
https://hdl.handle.net/2142/124240
Description
Issue Date
2024-04-10
Director of Research (if dissertation) or Advisor (if thesis)
Voulgaris, Petros
Stipanovic, Dusan
Doctoral Committee Chair(s)
Voulgaris, Petros
Committee Member(s)
Liberzon, Daniel
Bose, Subhonmesh
Department of Study
Electrical & Computer Engineering
Discipline
Electrical & Computer Engineering
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Machine Learning
Artificial Intelligence
Optimization Algorithms
Minimization
Stochastic Systems
Training Data
Testing Data
Upper Bound
Empirical Loss Measurement
Learning Rate
Loss Function
Convergence
Language
eng
Abstract
In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes. Theoretical in-expectation convergence results are provided for Stochastic Gradient Descent (SGD) equipped with this novel stochastic learning-rate scheme in the stochastic setting, along with convergence results in the online optimization setting. Empirical results consider the case of adaptively, uniformly distributed multiplicative stochasticity and cover not only SGD but also other popular algorithms equipped with a stochastic learning rate; they demonstrate noticeable optimization performance gains over the corresponding deterministic-learning-rate versions. Within this stochastic learning-rate framework, the theoretical almost-sure (a.s.) convergence rates of the Stochastic Heavy Ball (SHB) algorithm in the convex and smooth setting, and of the SGD algorithm in the nonconvex and smooth setting, are investigated. Specifically, it is shown that the a.s. convergence rates of both algorithms are accelerated when a stochastic learning-rate scheme satisfying certain criteria is used in place of a traditional deterministic scheme. The source of this acceleration is the multiplicative stochasticity, which, mainly through its first moment but also through its variance, beneficially affects the a.s. convergence rates.
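To make the scheme concrete, the following is a minimal Python sketch of SGD and SHB with a multiplicative stochastic learning rate, in the spirit of the abstract. The base step size eta, the uniform support [0.5, 1.5], the momentum parameter beta, and the least-squares test problem are all illustrative assumptions, not values or results from the dissertation.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/2n) * ||A w - b||^2.
n, d = 200, 10
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true + 0.1 * rng.standard_normal(n)

def stochastic_grad(w, batch=16):
    """Mini-batch gradient: an unbiased estimate of the full gradient."""
    idx = rng.integers(0, n, size=batch)
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / batch

def stochastic_lr(eta, low=0.5, high=1.5):
    """Multiplicative stochasticity: scale the deterministic rate eta by
    a uniform random variable drawn independently at each step."""
    return eta * rng.uniform(low, high)

def sgd(eta=0.05, steps=2000):
    w = np.zeros(d)
    for _ in range(steps):
        w -= stochastic_lr(eta) * stochastic_grad(w)
    return w

def shb(eta=0.05, beta=0.9, steps=2000):
    """Stochastic Heavy Ball: a gradient step plus a momentum term."""
    w, w_prev = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        w_next = w - stochastic_lr(eta) * stochastic_grad(w) + beta * (w - w_prev)
        w, w_prev = w_next, w
    return w

print("SGD error:", np.linalg.norm(sgd() - w_true))
print("SHB error:", np.linalg.norm(shb() - w_true))

In this sketch the multiplier has mean 1, so the expected step matches the deterministic baseline; the abstract notes that the first moment (and, secondarily, the variance) of the multiplicative stochasticity is what drives the acceleration, and the dissertation gives the precise criteria such a scheme must satisfy.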