Files in this item

File: ZHANG-THESIS-2015.pdf (5MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Performance analysis and optimization of a CFD application
Author(s): Zhang, Wentao
Advisor(s): Bodony, Daniel J.
Department / Program: Mechanical Science & Engineering
Discipline: Mechanical Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Performance optimization; computational fluid dynamics (CFD); Intel Xeon Phi
Abstract: This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on an x86-based CPU. Optimization techniques including pointer dereferencing, loop transformation, and Fortran SIMD directives were applied to the 10 most time-consuming subroutines to remove obstacles to vectorization and to improve serial performance. Details of the optimization techniques are presented and their impact on performance is discussed. The optimization efforts yielded a 63% reduction in the number of memory loads and a serial speedup of 2.02. Using the optimized serial program as the codebase, further investigation focused on the analysis and optimization of parallel heterogeneous execution on a dual-socket node fitted with an Intel Xeon Phi MIC card. To reduce the overhead created by host-accelerator copies in heterogeneous execution, the data layout of the halo region was changed from a "star" shape to a "box" shape to agglomerate small communications and to create a larger work granularity. Preliminary results of running PlasComCM on Intel Xeon Phis in symmetric mode are also presented; for a particular problem size, a 20% reduction in wall-clock time was obtained when using two Sandy Bridge sockets plus one Phi card compared with two Sandy Bridge sockets alone.
Issue Date: 2015-07-20
Type: Thesis
URI: http://hdl.handle.net/2142/88072
Rights Information: Copyright 2015 Wentao Zhang
Date Available in IDEALS: 2015-09-29
Date Deposited: August 201

