Files in this item



main.pdf (application/pdf, 2MB)


Title: Improving Per-Thread Performance on CMPs through Timing Speculation
Author(s): Greskamp, Brian
Doctoral Committee Chair(s): Torrellas, Josep
Doctoral Committee Member(s): Borkar, Shekhar; Chen, Deming; Patel, Sanjay J.; Zilles, Craig
Contributor(s): Karpuzcu, Rahmet U.; Cook, Jeffrey J.; Wan, Lu
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): chip multiprocessor; timing speculation
Abstract: The future of performance scaling lies in massively parallel workloads, but less-parallel applications will remain important. Unfortunately, future process technologies and core microarchitectures no longer promise major per-thread performance improvements, so microarchitects must find new ways to address a growing per-thread performance deficit. Moreover, they must do so without sacrificing parallel throughput. To meet these apparently conflicting demands, this dissertation proposes a Timing Speculation (TS) system for CMPs that boosts core clock frequencies past their normal limits when an application demands per-thread performance and operates efficiently at nominal frequency when it demands throughput. This work's contributions are organized into three interlocking proposals. This work begins by introducing Paceline, the first TS microarchitecture designed specifically for CMPs. Paceline enables two cores to work together to execute a single thread at high speed under TS or independently to execute two threads at the rated frequency. In single-thread mode, one core in the pair (the "Leader") executes at higher-than-normal frequency, while a "Checker" runs at the rated, safe frequency. The Leader runs the program faster but may experience timing errors. To detect and correct these errors, the Checker periodically compares a hash of its architectural state with that of the Leader. The Leader helps the Checker keep up by passing it branch results and prefetches. Next, this dissertation enhances Paceline with BlueShift, a circuit design method for TS architectures that improves a circuit's common-case delay rather than focusing on worst-case delay like traditional design flows. BlueShift profiles a gate-level design as it runs real benchmark applications to identify the frequently-exercised circuit paths and then applies speed optimizations to those paths only.
These optimizations can be implemented in a way that can be enabled and disabled at run-time so that they do not exact a power cost when they are not needed (i.e., when the processor is executing a throughput workload). Finally, this work presents LeadOut, a CMP design that combines Paceline with an additional per-thread performance enhancement: the ability to increase core supply voltage above nominal. LeadOut evaluates the performance gains that are possible with Paceline alone, voltage boosting alone, and both together. It shows major gains from applying the two techniques together when feasible and also shows that, in many cases, future CMPs have power and temperature headroom to exploit still more per-thread enhancements as long as they can be enabled and disabled dynamically according to application demand.
Issue Date: 2009-07-23
Genre: Dissertation / Thesis
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Date Available in IDEALS: 2009-07-23
