Files in this item



application/pdfAnshu_Arya.pdf (2MB)
(no description provided)PDF


Title:Optimization of FFT communication on 3-D torus and mesh supercomputer networks
Author(s):Arya, Anshu
Advisor(s):Kale, Laxmikant V.
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Fast Fourier Transform (FFT)
Abstract:The fast Fourier transform (FFT) is of intense interest to the scientific community. Its utility in a vast range of parallel scientific simulations warrants investigating its efficiency on a leading class of supercomputers that employ 3-D torus and mesh networks. Given the divide-and-conquer approach of the popular Cooley-Tukey approach, a parallel FFT can use a highly optimized serial algorithm for intra-node calculations, but the communication patterns between nodes reveal a potential for improvement. In this work, the scalability and communication patterns of global 1-D and 3-D parallel FFTs and how they map to 3-D torus and mesh networks are investigated. As a starting point, a naive implementation of a 1-D FFT is modified and scaled to 65,536 cores of a BlueGene/P machine. Insights into the machine architecture and network limits gained from the 1-D FFT are then applied to improve the performance of a 3-D FFT. Default mappings of a 3-D FFT onto a torus result in low network utilization and sub-optimal communication costs. The communication graph of a 3-D FFT is formulated and graph partitioning techniques are used to find a new mapping that greatly improves performance. Data migration techniques are then used to alleviate possible link contention associated with the new mappings. Migration techniques are discussed and modeled. It is discovered that specific migration patterns are able to reduce link contention without increasing the overall data movement (hop-bytes). With both mapping and migration techniques combined, a decrease of nearly 20% in the running time compared to the default 3-D FFT mapping is observed on 512 nodes, 2048 cores of a BlueGene/P machine.
Issue Date:2013-08-22
Rights Information:Copyright 2013 Anshu Arya
Date Available in IDEALS:2013-08-22
Date Deposited:2013-08

This item appears in the following Collection(s)

Item Statistics