Files in this item

Files: SHI-THESIS-2019.pdf (2MB), Restricted Access
Description: (no description provided)
Format: application/pdf (PDF)

Description

Title: Comparison of distributed training architecture for convolutional neural network in cloud
Author(s): Shi, Dongwei
Advisor(s): Hwu, Wen-mei
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Deep Learning, Distributed System
Abstract: The rapid growth of data and the ever-increasing model complexity of deep neural networks (DNNs) have enabled breakthroughs in various artificial intelligence fields such as computer vision, natural language processing, and data mining. Training a DNN is a computationally intensive task that can be accelerated by parallel computing devices such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). However, sometimes the amount of training data or the size of the model exceeds what a single machine can efficiently train or load. Distributed deep learning training addresses this issue by spreading the computation over several machines. Due to inter-node communication and other overheads in the distributed computing infrastructure, the performance improvement is not directly proportional to the number of machines. This thesis studies the computation time, memory, bandwidth, and other resources required to perform distributed deep learning. The approach of this work is to implement and deploy several data-parallel distributed deep learning algorithms on Google Cloud Platform (GCP), analyze their performance, and compare the communication overhead of the different algorithms. The results obtained in this research show that the Ring All-Reduce architecture, which uses a bandwidth-optimal communication operation for distributed deep learning, outperforms the Parameter Server architecture, a many-to-one architecture, in scalability. In addition, system usage information reported by GCP is leveraged to identify the bottlenecks of neural network training on a distributed architecture.
Issue Date: 2019-04-26
Type: Thesis
URI: http://hdl.handle.net/2142/105276
Rights Information: Copyright 2019 Dongwei Shi
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05
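
The abstract above contrasts the Ring All-Reduce architecture with the Parameter Server architecture for data-parallel training. As a rough illustration of the communication pattern it refers to, the minimal sketch below simulates ring all-reduce over per-worker gradient vectors with NumPy; it is not code from the thesis, and the function ring_all_reduce and the four-worker example are hypothetical.

import numpy as np

def ring_all_reduce(grads):
    # grads: list of equal-length 1-D NumPy arrays, one gradient vector per worker.
    # After the call every worker holds the elementwise sum of all inputs, having
    # forwarded only 2*(n-1)/n of its data -- the bandwidth-optimal property the
    # abstract attributes to Ring All-Reduce.
    n = len(grads)
    # Each worker splits its gradient into n contiguous chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Scatter-reduce phase: after n-1 steps worker i holds the fully reduced
    # chunk (i + 1) % n.
    for t in range(n - 1):
        for i in range(n):
            s = (i - t) % n                       # chunk worker i forwards this step
            chunks[(i + 1) % n][s] += chunks[i][s]

    # All-gather phase: circulate the reduced chunks around the ring so every
    # worker ends up with all of them.
    for t in range(n - 1):
        for i in range(n):
            s = (i + 1 - t) % n                   # reduced chunk to forward
            chunks[(i + 1) % n][s] = chunks[i][s].copy()

    return [np.concatenate(c) for c in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    workers = [rng.standard_normal(8) for _ in range(4)]   # 4 simulated workers
    reduced = ring_all_reduce(workers)
    assert all(np.allclose(r, np.sum(workers, axis=0)) for r in reduced)

In a real deployment the inner loops would be replaced by point-to-point sends and receives between neighboring machines (for example, via an MPI or NCCL all-reduce), whereas the Parameter Server approach instead has every worker push its gradients to, and pull updated weights from, a central node.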

