|Abstract:||Complex media applications are becoming increasingly common on general-purpose systems such as desktop, laptop, and handheld computers. However, real-time execution of such applications requires considerable processing power, often more than current superscalar processors can deliver. Further, high-performance processors are often constrained by power and energy consumption, especially in mobile systems, where media applications have become popular.
The objective of this dissertation is to develop general-purpose processors that can meet the performance demands of future media applications in an energy-efficient way, while also continuing to work well on other common workloads for desktop, laptop, and handheld systems. Fortunately, most media applications have multiple types of parallelism: thread-level, data-level, and instruction-level parallelism (TLP/DLP/ILP). In this work, we investigate exploiting these three forms of parallelism to provide both high performance and energy efficiency.
This dissertation makes three broad contributions. First, we analyze the parallelism in complex media applications and make the case that contemporary media applications require efficient support for multiple types of parallelism, including ILP, TLP, and various forms of data-level parallelism such as sub-word SIMD, short vectors, and streams. Second, to find the most energy-efficient way of exploiting TLP, we compare chip multi-processing (CMP) with simultaneous multi-threading (SMT). Finally, we propose a complete architecture, called ALP, that effectively supports all of the forms of parallelism described above in an energy-efficient way, using an evolutionary programming model and hardware. The most novel part of ALP is a DLP technique called SIMD vectors and streams, which is integrated within a conventional superscalar-based CMP/SMT architecture with sub-word SIMD. This technique lies between sub-word SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important.
More broadly, our results show that conventional architectures augmented with evolutionary mechanisms can provide high performance and energy savings for complex media applications without resorting to radically different architectures and programming paradigms.