Exploiting Locality and Parallelism with Hierarchically Tiled Arrays
- Exploiting Locality and Parallelism with Hierarchically Tiled Arrays
- Guo, Jia
- Issue Date
- computer science
- The importance of tiles or blocks in mathematics and thus computer science cannot be overstated. From a high level point of view, they are the natural way to express many algorithms, both in iterative and recursive forms. Tiles or sub-tiles are used as basic units in the algorithm description. From a low level point of view, tiling, either as the unit maintained by the algorithm, or as a class of data layouts, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers given the growing gap between memory and processor speed. Finally, tiles and operations on them are also basic to express data distribution and parallelism. Despite the importance of this concept, which makes inevitable its widespread usage, most languages do not support it directly. Programmers have to understand and manage the low-level details along with the introduction of tiling. This gives place to bloated potentially error-prone programs in which opportunities for performance are lost. On the other hand, the disparity between the algorithm and the actual implementation enlarges. This thesis illustrates the power of Hierarchically Tiled Arrays (HTAs), a data type which enables the easy manipulation of tiles in object-oriented languages. The objective is to evolve this data type in order to make the representation of all classes for algorithms with a high degree of parallelism and/or locality as natural as possible. We show in the thesis a set of tile operations which leads to a natural and easy implementation of different algorithms in parallel and in sequential with higher clarity and smaller size. In particular, two new language constructs dynamic partitioning and overlapped tiling are discussed in detail. They are extensions of the HTA data type to improve its capabilities to express algorithms with a high abstraction and free programmers from programming tedious low-level tasks. To prove the claims, two popular languages, C++ and MATLAB are extended with our HTA data type. In addition, several important dense linear algebra kernels, stencil computation kernels, as well as some benchmarks in NAS benchmark suite were implemented. We show that the HTA codes needs less programming effort with a negligible effect on performance.
- Type of Resource
- Copyright and License Information
- You are granted permission for the non-commercial reproduction, distribution, display, and performance of this technical report in any format, BUT this permission is only for a period of 45 (forty-five) days from the most recent time that you verified that this technical report is still available from the University of Illinois at Urbana-Champaign Computer Science Department under terms that include this permission. All other rights are reserved by the author(s).
Edit Collection Membership