Files in this item

XIE-DISSERTATION-2021.pdf (application/pdf, 1MB)


Title:Toward communication-efficient and secure distributed machine learning
Author(s):Xie, Cong
Director of Research:Koyejo, Oluwasanmi; Gupta, Indranil
Doctoral Committee Chair(s):Koyejo, Oluwasanmi; Gupta, Indranil
Doctoral Committee Member(s):Raginsky, Maxim; McMahan, Hugh Brendan
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):machine learning
Abstract:In recent years, there has been increasing interest in distributed machine learning. On the one hand, distributed machine learning is motivated by assigning the training workload to multiple devices for acceleration and better throughput. On the other hand, some machine-learning tasks require training to be performed locally on remote devices because of privacy concerns. Stochastic Gradient Descent (SGD) and its variants are commonly used for training large-scale deep neural networks, including in distributed training. Unlike machine learning on a single device, distributed machine learning requires collaboration and communication among the devices, which introduces new challenges: 1) heavy communication overhead can become the bottleneck that slows down training; 2) unreliable communication and weaker control over remote entities make the distributed system vulnerable to systematic failures and malicious attacks. In this dissertation, we aim to find new approaches to make distributed SGD faster and more secure. The research has four main parts. We first study approaches for reducing the communication overhead, including message compression and infrequent synchronization. Then, we investigate the possibility of combining asynchrony with infrequent synchronization. To address security in distributed SGD, we study tolerance to Byzantine failures. Finally, we explore the possibility of combining communication-efficiency and security techniques in one distributed learning system.
Specifically, we present the following techniques to improve the communication efficiency and security of distributed SGD: 1) a technique called "error reset" that adapts both infrequent synchronization and message compression to distributed SGD in order to reduce the communication overhead; 2) federated optimization in asynchronous mode; 3) a framework of score-based approaches for Byzantine tolerance in distributed SGD; 4) a distributed learning system integrating all three of these techniques. The proposed system provides communication reduction, both synchronous and asynchronous training, and Byzantine tolerance, supported by both theoretical guarantees and empirical evaluations.
Issue Date:2021-04-14
Rights Information:Copyright 2021 Cong Xie
Date Available in IDEALS:2021-09-17
Date Deposited:2021-05
