|Abstract:||The rapid development of modern information technology has significantly facilitated the generation, collection, transmission and storage of all kinds of data. With the so-called “big data” generated in an unprecedented rate, we are facing significant challenges in learning knowledge from it. Traditional machine learning algorithms often suffer from the unmatched volume and complexity of such big data, however, sparsity has been recently studied to tackle this challenge. With reasonable assumptions and effective utilization of sparsity, we can learn models that are simpler, more efficient and robust to noise.
The goal of this dissertation is studying and exploiting sparsity to design learning algorithms to effectively and efficiently solve various challenging and significant real-world machine learning tasks. I will integrate and introduce my work from three different perspectives: sample complexity, computational complexity, and noise reduction. Intuitively, these three aspects correspond to models that require less data to learn, are more computationally efficient, and still perform well when the data is noisy. Specifically, this thesis is integrated from the three aspects as follows:
First, I focus on the sample complexity of machine learning algorithms for an important machine learning task, compressed sensing. I propose a novel algorithm based on nonconvex sparsity-inducing penalty, which is the first work that utilizes such penalty. I also prove that our algorithm improves the best previous sample complexity significantly by extensive theoretical derivation and numerical experiments.
Second, from the perspective of computational complexity, I study the expectation-maximization (EM) algorithms in high dimensional scenarios. In contrast to the conventional regime, the maximization step (M-step) in high dimensional scenario can be very computationally expensive or even not well defined. To address this challenge, I propose an efficient algorithm based on novel semi-stochastic gradient descent with variance reduction, which naturally incorporates the sparsity in model parameters, greatly economizes the computational cost at each iteration and enjoys faster convergence rates simultaneously. We believe the proposed unique semi-stochastic variance-reduced gradient is of general interest of nonconvex optimization of bivariate structure.
Third, I look into the noise reduction problem and target on an important text mining task, event detection.
To overcome the noise in the text data which hampers the detection of real events, I design an efficient algorithm based on sparsity-inducing fused lasso framework. Experiment results on various datasets show that our algorithm effectively smooths out noises and captures the real event, outperforming several state- of-the-art methods consistently in noisy setting.
To sum up, this thesis focuses on the critical issues of machine learning in big data from the perspective of sparsity in the data and model. Our proposed methods clearly show that utilizing sparsity is of great importance for various significant machine learning tasks.