Files in this item



Efficient Support for Speculative Tasking.pdf (application/pdf, 991kB)


Title: Efficient Support for Speculative Tasking
Author(s): Tuck, James M.
Subject(s): computer science
Abstract: Improving application performance is a major challenge for computer architects. Two important reasons are the shift to multi-core architectures, which will no longer emphasize improvements in instruction-level parallelism, and long memory latencies. A versatile primitive for overcoming these obstacles is Speculative Tasking (ST). With ST, the outcome of a long or risky operation is assumed to be known, which allows the code section that follows to execute, potentially in parallel, once the hardware has taken a checkpoint and started buffering the speculative state. If the assumption later turns out to be incorrect, the hardware rolls back the whole section to the checkpoint and re-executes it transparently. ST has been studied in depth and has been shown to improve application performance. However, applying ST to existing systems exposes important problems and concerns. This dissertation considers three systems and three new problems facing ST.
The first system studied is a checkpointed processor that uses a task to speculate past cache misses rather than stalling; if the speculation is successful, the processor can potentially hide the latency of the memory access and boost Memory-Level Parallelism (MLP). However, the boost in MLP increases the number of in-flight memory operations so much that conventional memory hierarchies are unsuited to support them: they need to be redesigned to support 1-2 orders of magnitude more outstanding misses. Yet designing scalable Miss Handling Architectures (MHAs) is challenging: designs must minimize cache lock-up time and deliver high bandwidth while keeping area consumption reasonable. Hence, a novel scalable MHA design for high-MLP processors is proposed that introduces two main innovations. First, it is hierarchical, with a small Miss Status Holding Register (MSHR) file per cache bank and a larger MSHR file shared by all banks. Second, it uses a Bloom filter to reduce searches in the larger MSHR file. The result is a high-performance, area-efficient design. Compared to a state-of-the-art MHA on a high-MLP processor, the proposed design speeds up some SPECint, SPECfp, and multiprogrammed workloads by a geometric mean of 32%, 50%, and 95%, respectively.
The second system studied is Speculative Multithreading (SM) on a Chip Multiprocessor (CMP). While SM can speed up hard-to-parallelize applications, the power inefficiency of aggressive speculation is a concern. To improve power efficiency, I note that not all the tasks running in such an environment are equally critical. To leverage this insight, a widely applicable, novel task-criticality model is developed for analyzing SM programs. Then an architecture called CAP is proposed that (i) uses this model to analyze and predict the criticality of tasks in an SM application at run time, and (ii) uses criticality to schedule tasks on an SM CMP for power-efficient execution. Experiments with SPECint, SPECfp, and Olden show that CAP reduces the energy-delay-squared product (E × D^2) by 9%, 21%, and 23% on average, respectively.
Finally, recent proposals for ST have called for hardware signatures to accelerate memory disambiguation. The power of signatures lies in their ability to represent a set of addresses concisely while allowing set operations directly on the signatures. As a result, they make costly set operations cheap to perform in hardware. To take full advantage of signatures, this thesis presents SoftBulk, a novel architecture that exposes signatures to software directly through the instruction set. Using SoftBulk, programs can collect information about their own memory access patterns and use that information for a variety of purposes, including code optimization and debugging.
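The signature idea described in the abstract can be illustrated with a minimal software sketch. It is not the SoftBulk design or its instruction set: the 64-bit width, the 64-byte line size, the two toy hash functions, and every name below (`Signature`, `bit_positions`, `may_contain`, and so on) are assumptions made only for illustration. The sketch shows how a set of addresses can be folded into a fixed-size bit vector so that union and intersection become single bitwise operations, at the cost of possible false positives.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

SIG_BITS = 64    # assumed signature width in bits
LINE_SHIFT = 6   # assumed 64-byte cache lines


def bit_positions(addr: int) -> Tuple[int, int]:
    """Map an address to two bit positions (toy hash functions, assumed)."""
    line = addr >> LINE_SHIFT
    return line % SIG_BITS, (line // SIG_BITS) % SIG_BITS


@dataclass(frozen=True)
class Signature:
    """Fixed-size, Bloom-filter-style summary of a set of addresses."""
    bits: int = 0

    @staticmethod
    def of(addrs: Iterable[int]) -> "Signature":
        sig = Signature()
        for addr in addrs:
            sig = sig.insert(addr)
        return sig

    def insert(self, addr: int) -> "Signature":
        h1, h2 = bit_positions(addr)
        return Signature(self.bits | (1 << h1) | (1 << h2))

    def may_contain(self, addr: int) -> bool:
        # True for every inserted address; may also be True for others.
        h1, h2 = bit_positions(addr)
        mask = (1 << h1) | (1 << h2)
        return (self.bits & mask) == mask

    def union(self, other: "Signature") -> "Signature":
        return Signature(self.bits | other.bits)

    def intersect(self, other: "Signature") -> "Signature":
        return Signature(self.bits & other.bits)

    def is_empty(self) -> bool:
        return self.bits == 0


# Disambiguation check: does one task's write set overlap another's read set?
writes_a = Signature.of([0x1000])
reads_b = Signature.of([0x8080])          # touches a different line
reads_c = Signature.of([0x1000, 0x9000])  # shares 0x1000 with writes_a

conflict_ab = not writes_a.intersect(reads_b).is_empty()
conflict_ac = not writes_a.intersect(reads_c).is_empty()
```

An empty intersection proves that the two address sets are disjoint, while a non-empty one only signals a possible overlap. That asymmetry is what keeps such a check conservative: it can raise a spurious conflict but cannot miss a real one.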
Issue Date: 2007-08
Genre: Technical Report
Other Identifier(s): UIUCDCS-R-2007-2871
Rights Information: You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Date Available in IDEALS: 2009-04-22
