Files in this item



Mukherjee_Jayanta.pdf (application/pdf, 881kB)
Appendix.tex (application/x-tex, 9kB)
1-introduction.tex (application/x-tex, 4kB)
2-related.tex (application/x-tex, 7kB)
3-exp.tex (application/x-tex, 9kB)
4-opts.tex (application/x-tex, 10kB)
DendroRef.bib (application/octet-stream, 10kB)


Title: Performance evaluation and enhancement of Dendro
Author(s): Mukherjee, Jayanta
Advisor(s): Gropp, William D.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Performance modeling; performance tuning
Abstract: Dendro is a collection of tools for solving finite element problems in parallel. The package is written in C++ using the Standard Template Library (STL) and uses the Message Passing Interface (MPI). Dendro uses an octree data structure to solve image-registration problems with finite element techniques. To analyze the behavior of the package in terms of speedup and scalability, it is important to know which parts of the package consume most of the execution time. Both the single-node performance and the overall performance of the package depend on the code organization and class hierarchy. We used the PETSc profiler to collect performance statistics and instrumented the code to determine which parts of it take most of the time. Along with function-specific execution timings, the PETSc profiler reports how many floating-point operations are performed, both in total and as an average rate (FLOP/second). PETSc also reports memory usage and the number of MPI messages and reductions performed while executing a particular function. We analyzed these performance statistics to provide guidelines on how Dendro can be made more efficient by optimizing certain functions. We obtained around a 12X speedup over the default Dendro build by using compiler-provided optimizations, and achieved a further speedup of more than 65% over the compiler-optimized performance (20X over the naive Dendro performance) by manually tuning some blocks of code in addition to the compiler optimizations.
Issue Date: 2011-01-14
Rights Information: 2010 by Jayanta Mukherjee. All rights reserved.
Date Available in IDEALS: 2011-01-14
Date Deposited: December 2
