Clustering is a widely deployed unsupervised learning tool. Given data in Euclidean space, K-means is one of the most commonly used clustering methods; it minimizes the distance between each point and the centroid of its assigned cluster. Among popular clustering methods, SDP (semidefinite programming) clustering enjoys the strongest statistical guarantees under standard Gaussian mixture models, in that it achieves an information-theoretic bound for exact recovery. However, the original SDP method is limited to Gaussians with isotropic covariance matrices, and solving the SDP optimization problem is prohibitively expensive. This project aims to develop algorithms that improve computational efficiency and extend the results to more general settings, in the following respects: extend the algorithms and results to heterogeneous data as well as other types of data such as distributions or measures; develop algorithms that enhance the computational performance of SDP, or that solve the SDP for clustering efficiently; and propose first-order and second-order methods for general non-negative SDP optimization problems under minimal assumptions, applicable to a variety of hidden community detection tasks.
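To make the baseline concrete, the K-means objective mentioned above is typically minimized with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and recomputing centroids as cluster means; the SDP approach instead relaxes this combinatorial problem into a convex program over a membership matrix. The following is a minimal NumPy sketch of Lloyd's algorithm only (the function name `kmeans` and the two-blob demo data are illustrative, not part of the project):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: alternate between assigning each point
    to its nearest centroid and recomputing centroids as cluster means."""
    rng = np.random.default_rng(seed)
    # initialize centroids at k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Demo: two well-separated Gaussian blobs, the isotropic mixture setting
# under which exact recovery guarantees are usually stated.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2))])
labels, cents = kmeans(X, k=2)
```

On such well-separated isotropic blobs, Lloyd's algorithm recovers the planted partition; the point of the SDP relaxation is to certify such recovery down to the information-theoretic separation threshold, where greedy assignment can fail.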