Comparison of distributed training architecture for convolutional neural network in cloud
Shi, Dongwei
Permalink
https://hdl.handle.net/2142/105276
Description
- Title
- Comparison of distributed training architecture for convolutional neural network in cloud
- Author(s)
- Shi, Dongwei
- Issue Date
- 2019-04-26
- Director of Research (if dissertation) or Advisor (if thesis)
- Hwu, Wen-Mei
- Department of Study
- Electrical and Computer Engineering
- Discipline
- Electrical and Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- M.S.
- Degree Level
- Thesis
- Date of Ingest
- 2019-08-23T20:48:28Z
- Keyword(s)
- Deep Learning, Distributed System
- Abstract
- The rapid growth of data and the ever-increasing complexity of deep neural networks (DNNs) have enabled breakthroughs in artificial intelligence fields such as computer vision, natural language processing, and data mining. Training a DNN is a computationally intensive task that can be accelerated by parallel computing devices such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). However, the amount of training data or the size of the model may exceed what a single machine can efficiently train or even load. Distributed deep learning training addresses this issue by spreading the computation over several machines. Because of inter-node communication and other overheads in the distributed computing infrastructure, the performance improvement is not directly proportional to the number of machines. This thesis studies the computation time, memory, bandwidth, and other resources required to perform distributed deep learning. The approach of this work is to implement and deploy several data-parallel distributed training algorithms on Google Cloud Platform (GCP) and then to analyze their performance and compare the communication overhead between the different algorithms. The results show that the Ring All-Reduce architecture, a bandwidth-optimal collective communication scheme for distributed deep learning, outperforms the Parameter Server architecture, a many-to-one scheme, in scalability. In addition, system usage information reported by GCP is leveraged to identify the bottlenecks of neural network training on a distributed architecture. (An illustrative sketch of the data-parallel gradient-averaging step this comparison rests on appears after this record.)
- Graduation Semester
- 2019-05
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/105276
- Copyright and License Information
- Copyright 2019 Dongwei Shi
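
The abstract's comparison hinges on how data-parallel workers combine their gradients. The sketch below is illustrative only and is not the thesis's code: it assumes PyTorch's torch.distributed package with the Gloo backend (a GPU cluster would typically use NCCL) and a toy linear model standing in for a CNN. Backends such as NCCL implement the all_reduce call with a ring algorithm, which is the Ring All-Reduce pattern the thesis benchmarks against a Parameter Server design.

```python
# Illustrative data-parallel training step (assumed PyTorch API; not the thesis's code).
# Each worker holds a full model replica, computes gradients on its own data shard,
# and gradients are averaged with all_reduce, which backends such as NCCL
# implement with a ring algorithm (Ring All-Reduce).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    torch.manual_seed(0)                      # identical initial weights on every rank
    model = torch.nn.Linear(10, 1)            # toy stand-in for a CNN
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    torch.manual_seed(rank)                   # each rank draws its own data shard
    for _ in range(3):
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()

        # Average gradients across workers: one all-reduce per parameter tensor.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

        optimizer.step()                      # replicas stay in sync after the step

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 4                            # number of simulated workers
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

In the Parameter Server alternative that the thesis compares against, the per-parameter all_reduce would instead be replaced by workers pushing gradients to one or more server processes, which apply the update and send the new weights back, a many-to-one communication pattern.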
Owning Collections
Graduate Dissertations and Theses at Illinois (Primary)
Dissertations and Theses - Electrical and Computer Engineering