|Abstract:||The importance of tiles or blocks in mathematics and thus computer science cannot be overstated. From a high level point of view, they are the natural way to express many algorithms, both in iterative and recursive forms. Tiles or sub-tiles are used as basic units in the algorithm description. From a low level point of view, tiling, either as the unit maintained by the algorithm, or as a class of data layouts, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers given the growing gap between memory and processor speed. Finally, tiles and operations on them are also basic to express data distribution and parallelism.
Despite the importance of this concept, which makes inevitable its widespread usage, most languages do not support it directly. Programmers have to understand and manage the low-level details along with the introduction of tiling. This gives place to bloated potentially error-prone programs in which opportunities for performance are lost. On the other hand, the disparity between the algorithm and the actual implementation enlarges.
This thesis illustrates the power of Hierarchically Tiled Arrays (HTAs), a data type which enables the easy manipulation of tiles in object-oriented languages. The objective is to evolve this data type in order to make the representation of all classes for algorithms with a high degree of parallelism and/or locality as natural as possible. We show in the thesis a set of tile operations which leads to a natural and easy implementation of different algorithms in parallel and in sequential with higher clarity and smaller size.
In particular, two new language constructs dynamic partitioning and overlapped tiling are discussed in detail. They are extensions of the HTA data type to improve its capabilities to express algorithms with a high abstraction and free programmers from programming tedious low-level tasks.
To prove the claims, two popular languages, C++ and MATLAB are extended with our HTA data type. In addition, several important dense linear algebra kernels, stencil computation kernels, as well as some benchmarks in NAS benchmark suite were implemented. We show that the HTA codes needs less programming effort with a negligible effect on performance.