Files in this item



YUAN-THESIS-2019.pdf (application/pdf, 698 kB), Restricted Access


Title: Accelerating distributed neural network training with network-centric approach
Author(s): Yuan, Yifan
Advisor(s): Kim, Nam Sung
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): distributed training
Abstract: Distributed training of Deep Neural Networks (DNNs) is an important technique for reducing the training time of large DNNs across a wide range of applications. In existing distributed training approaches, however, the communication time needed to periodically exchange parameters (i.e., weights) and gradients among compute nodes over the network constitutes a large fraction of the total training time. To reduce the communication time, we propose an algorithm/hardware co-design, INCEPTIONN. First, observing that gradients are much more tolerant of precision loss than parameters, we propose a gradient-centric distributed training algorithm. Because it is designed to exchange only gradients among nodes in a distributed manner, it can transfer less information, better overlap communication with computation, and apply a more aggressive lossy compression algorithm to all the information exchanged among nodes than traditional distributed algorithms can. Second, exploiting unique characteristics of gradient values, we propose a lossy compression algorithm optimized for gradients. It achieves high compression ratios without notably affecting the accuracy of trained DNNs. Lastly, we demonstrate that compression algorithms consume a large amount of CPU time, which in turn increases the total training time despite the reduced communication time. To tackle this, we propose an in-network computing approach that delegates the lossy compression task to hardware integrated with a Network Interface Card (NIC). Our experiments show that INCEPTIONN can eliminate a large portion of the communication time, and thus reduce the training time of DNNs, with little degradation in the accuracy of trained DNNs.
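The abstract does not spell out the INCEPTIONN compression algorithm, so the following is only a generic illustration of what lossy gradient compression can look like, not the thesis's method. It is a minimal sketch that assumes gradients are stored as IEEE 754 float32 values and simply zeroes low-order mantissa bits. The function names (`truncate_float32`, `compress_gradients`, `max_relative_error`) and the sample gradient values are hypothetical.

```python
import struct


def truncate_float32(value: float, drop_bits: int) -> float:
    """Zero the lowest `drop_bits` mantissa bits of a float32 value.

    Illustrative only: a generic precision-reduction step, NOT the
    compression algorithm proposed in the thesis.
    """
    if not 0 <= drop_bits <= 23:
        raise ValueError("float32 has 23 mantissa bits")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Keep sign, exponent, and the high-order mantissa bits.
    mask = (0xFFFFFFFF << drop_bits) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", bits & mask))
    return out


def compress_gradients(grads, drop_bits=16):
    """Apply mantissa truncation to every gradient in a list."""
    return [truncate_float32(g, drop_bits) for g in grads]


def max_relative_error(original, lossy):
    """Largest relative error introduced across non-zero gradients."""
    return max(abs(o - l) / abs(o) for o, l in zip(original, lossy) if o != 0.0)


# Hypothetical gradient values, chosen only for demonstration.
grads = [0.0123, -0.00045, 0.25, -0.9]
lossy = compress_gradients(grads, drop_bits=16)
error = max_relative_error(grads, lossy)
```

Dropping 16 of the 23 mantissa bits leaves 7, so each value fits in 16 bits and the relative error stays below 2^-7. Whether a given model tolerates that loss is an empirical question of the kind the thesis evaluates.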
Issue Date: 2019-10-17
Rights Information: Copyright 2019 Yifan Yuan
Date Available in IDEALS: 2020-03-02
Date Deposited: 2019-12
