Files in this item

Files: SadeghiBaghsorkhi_Sara.pdf (1MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Efficient Performance Evaluation for Highly Multi-threaded Graphics Processors
Author(s): Sadeghi Baghsorkhi, Sara
Director of Research: Hwu, Wen-Mei W.
Doctoral Committee Chair(s): Hwu, Wen-Mei W.
Doctoral Committee Member(s): Gropp, William D.; Navarro, Nacho; Padua, David A.; Patel, Sanjay J.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): GPU computing; performance evaluation; memory hierarchy; Graphics Processing Unit (GPU)
Abstract: With the emergence of highly multithreaded architectures, an effective performance monitoring system must reflect the interaction between a large number of concurrent events and attribute the overall effect of individual events and inefficiencies to the operations in the application source code. The state-of-the-art performance counters in highly multithreaded graphics processors currently do not provide this level of precision. Although fine-grained sampling of performance counters after each source-level operation could potentially achieve the desired precision, the high sampling frequency required would likely distort the actual application behavior and make the sampled counter values inaccurate. In this thesis, I present a novel software-based approach for monitoring memory hierarchy performance in highly multithreaded general-purpose graphics processors. The proposed analysis is based on memory traces collected for small snapshots of application execution. A trace-based memory hierarchy model with a Monte Carlo experimental methodology generates statistical bounds on performance measures in the presence of nonuniform thread interleaving and data sharing in a highly multithreaded execution environment. The statistical approach overcomes the classical problem of disturbed execution timing due to instrumentation. The approach scales well because I deploy a minimal sampling technique to reduce the trace generation overhead and the model simulation time. The proposed scheme also keeps track of individual memory operations in the source code and can quantify their contribution to detrimental effects on memory system performance. A cross-validation of the model results shows close agreement with the values read from the hardware performance counters on an NVIDIA Tesla C2050.
I then use the predicted memory hierarchy performance statistics in an analytical model to identify the performance characteristics of a kernel and its expected execution time. To account for the systematic error present in the predictions, I approximate the error function and express a range of potential true execution times for each predicted value.
Issue Date: 2011-08-26
URI: http://hdl.handle.net/2142/26373
Rights Information: Copyright 2011 Sara Sadeghi Baghsorkhi
Date Available in IDEALS: 2013-08-27
Date Deposited: 2011-08
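
Illustrative note on the abstract: the trace-based Monte Carlo idea described there can be pictured with a small sketch. This is not the dissertation's implementation. It assumes a single LRU cache, a scheduler that picks uniformly among threads that still have accesses pending, and empirical 5th/95th percentiles as the "statistical bounds". All function names, parameters, and the toy traces are hypothetical.

```python
import random
from collections import OrderedDict

def random_interleave(traces, rng):
    """Merge per-thread traces into one stream, keeping each thread's own order.
    Assumption: the next thread is chosen uniformly among unfinished threads."""
    cursors = [0] * len(traces)
    live = [t for t in range(len(traces)) if traces[t]]
    merged = []
    while live:
        i = rng.randrange(len(live))
        t = live[i]
        merged.append(traces[t][cursors[t]])
        cursors[t] += 1
        if cursors[t] == len(traces[t]):
            live[i] = live[-1]  # swap-remove the finished thread
            live.pop()
    return merged

def simulate_lru(stream, num_lines, line_bytes):
    """Replay (op_id, address) pairs through a hypothetical fully associative
    LRU cache. Returns total misses and misses per source-level operation."""
    cache = OrderedDict()
    misses = 0
    misses_by_op = {}
    for op_id, addr in stream:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            misses_by_op[op_id] = misses_by_op.get(op_id, 0) + 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict least recently used line
    return misses, misses_by_op

def monte_carlo_miss_rate(traces, trials=200, num_lines=4, line_bytes=128, seed=0):
    """Estimate an empirical 5%-95% band of the miss rate over random
    interleavings, plus the mean misses attributed to each operation.
    Assumes at least one trace is non-empty."""
    rng = random.Random(seed)
    total = sum(len(t) for t in traces)
    rates, per_op = [], {}
    for _ in range(trials):
        stream = random_interleave(traces, rng)
        misses, by_op = simulate_lru(stream, num_lines, line_bytes)
        rates.append(misses / total)
        for op, m in by_op.items():
            per_op[op] = per_op.get(op, 0) + m
    rates.sort()
    lo = rates[int(0.05 * (trials - 1))]
    hi = rates[int(0.95 * (trials - 1))]
    return lo, hi, {op: m / trials for op, m in per_op.items()}

# Toy traces (made up): one thread streams through memory, one strides widely.
traces = [
    [("load_a", 4 * i) for i in range(256)],
    [("load_b", 4096 + 512 * i) for i in range(64)],
]
lo, hi, per_op = monte_carlo_miss_rate(traces)
print(lo, hi, per_op)
```

Because each trial replays the same recorded traces under a different interleaving, the spread between the two percentiles reflects sensitivity to thread ordering rather than to instrumentation overhead, which is the property the abstract attributes to the statistical approach.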

