Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands

Files in this item: Crago_Neal.pdf (PDF, 3 MB; restricted access; no description provided)
Title: Energy-efficient latency tolerance for 1000-core data parallel processors with decoupled strands
Author(s): Crago, Neal
Director of Research: Patel, Sanjay J.
Doctoral Committee Chair(s): Patel, Sanjay J.
Doctoral Committee Member(s): Hwu, Wen-mei W.; Lumetta, Steven S.; Chen, Deming
Department / Program: Electrical & Computer Engineering
Discipline: Electrical & Computer Engineering
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Parallel Processing; Data-parallel; Graphics processing unit (GPU); General-purpose computing on graphics processing units (GPGPU); manycore; latency tolerance; decoupled architecture; compiler technique; energy-efficiency; power-efficiency; high-performance; low power; low energy
Abstract: This dissertation presents a novel decoupled latency tolerance technique for 1000-core data parallel processors. The approach focuses on instruction latency tolerance to improve single-thread performance. The main idea is to use the compiler to split the original thread into separate memory-accessing and memory-consuming instruction streams. The goal is to provide latency tolerance similar to that of high-performance techniques such as out-of-order execution while keeping hardware complexity close to that of an in-order core. The research in this dissertation supports the following thesis: Pipeline stalls due to long exposed instruction latency are the main performance limiter for cached 1000-core data parallel processors. Leveraging the natural decoupling of memory access and memory consumption, a serial thread of execution can be partitioned into strands that provide energy-efficient latency tolerance. This dissertation motivates the need for latency tolerance in 1000-core data parallel processors and presents decoupled core architectures as an alternative to currently used techniques. It discusses the limitations of prior decoupled architectures and proposes techniques to improve both latency tolerance and energy efficiency. Finally, the success of the proposed decoupled architecture relative to other approaches is demonstrated through an exhaustive design space exploration of energy, area, and performance, using high-fidelity performance and physical design models.
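The core idea in the abstract, splitting one thread into a memory-accessing strand and a memory-consuming strand, can be illustrated with a toy cycle-count model. This is a minimal sketch of generic decoupled access/execute, not the dissertation's compiler or hardware. It assumes a fixed load latency, at most one operation per strand per cycle, and a FIFO between the strands whose entry is reserved when a load issues and freed when the value is consumed. The function names and parameters (`coupled_cycles`, `decoupled_cycles`, `latency`, `queue_size`) are hypothetical.

```python
def coupled_cycles(n, latency):
    """Cycles for one in-order stream where every load is used right away."""
    cycles = 0
    for _ in range(n):
        cycles += latency  # pipeline stalls until the load returns
        cycles += 1        # one cycle to consume the loaded value
    return cycles


def decoupled_cycles(n, latency, queue_size):
    """Cycles when an access strand runs ahead of an execute strand.

    Hypothetical model: the access strand issues at most one load per cycle,
    each load takes `latency` cycles, a queue entry is reserved at issue and
    freed when the execute strand consumes it, and the execute strand
    consumes at most one value per cycle, in program order.
    """
    issue = [0] * n    # cycle at which load i is issued
    consume = [0] * n  # cycle at which value i is consumed
    for i in range(n):
        t = issue[i - 1] + 1 if i > 0 else 0
        if i >= queue_size:
            # Queue full: wait until the entry of load i - queue_size is freed.
            t = max(t, consume[i - queue_size] + 1)
        issue[i] = t
        ready = t + latency
        # Consume in order, once the data has arrived.
        consume[i] = max(ready, consume[i - 1] + 1) if i > 0 else ready
    return consume[-1] + 1 if n > 0 else 0


print(coupled_cycles(8, 10))        # 88: every load exposes its full latency
print(decoupled_cycles(8, 10, 16))  # 18: load latency overlaps consumption
print(decoupled_cycles(8, 10, 1))   # 88: a one-entry queue serializes the strands
```

In this model, a one-entry queue degenerates to the coupled case, while a queue at least as deep as the load latency lets the access strand run ahead so that each load's latency is paid roughly once rather than once per element.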
Issue Date: 2012-09-18
URI: http://hdl.handle.net/2142/34589
Rights Information: Copyright 2012 Neal Crago
Date Available in IDEALS: 2012-09-18
Date Deposited: 2012-08