Files in this item
Files | Description | Format |
---|---|---|
(PDF file) | (no description provided) | application/pdf |
Description
Title: | Histogram sort with sampling |
Author(s): | Harsh, Vipul |
Advisor(s): | Kale, Laxmikant |
Department / Program: | Computer Science |
Discipline: | Computer Science |
Degree Granting Institution: | University of Illinois at Urbana-Champaign |
Degree: | M.S. |
Genre: | Thesis |
Subject(s): | Parallel sorting; Data partitioning; Sample sort; Histogram sort |
Abstract: | Standard parallel sorting algorithms like sample sort rely on data partitioning techniques to distribute keys across processors. The sampling cost in sample sort for good load balance is prohibitive for massive clusters. We describe Histogram sort with sampling, an adaptation of the popular Histogram sort algorithm. We show that Histogram sort with sampling has sound theoretical guarantees and, with $k$ rounds of histogramming, reduces the sample size requirement from $O(p \log N / \epsilon^2)$ to $O(k p \sqrt[k]{\log p / \epsilon})$ with high probability (w.h.p.). Histogram sort with sampling is more efficient than sample sort algorithms that achieve the same level of load balance, both in theory and in practice, especially for massively parallel applications scaling to tens of thousands of processors. We also show that an approximate but fairly accurate histogram can be obtained using an $O(\sqrt{p \log N} / \epsilon)$ sample on every processor. This can be used to speed up the histogramming step and can be of independent interest for answering general queries in large parallel processing systems. In our practical implementation, we exploit shared memory within nodes to improve the performance of our algorithm on large modern clusters. |
Issue Date: | 2017-07-05 |
Type: | Text |
URI: | http://hdl.handle.net/2142/98144 |
Rights Information: | Copyright 2017 Vipul Harsh |
Date Available in IDEALS: | 2017-09-29 |
Date Deposited: | 2017-08 |
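
To get a feel for the two sample-size bounds quoted in the abstract, the short Python sketch below evaluates their growth terms side by side. It is illustrative only and is not taken from the thesis: the function names are hypothetical, the hidden constant factors of the O-notation are assumed to be 1, logarithms are assumed to be natural, and the values chosen for p (processors), N (keys), and epsilon (load-imbalance tolerance) are made up.

```python
import math


def sample_sort_term(p, n, eps):
    """Growth term p * log(N) / eps^2 (constant factor assumed to be 1)."""
    return p * math.log(n) / eps ** 2


def hss_term(p, eps, k):
    """Growth term k * p * (log(p) / eps)^(1/k) (constant factor assumed to be 1)."""
    return k * p * (math.log(p) / eps) ** (1.0 / k)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    p, n, eps = 10_000, 10 ** 12, 0.05
    print(f"sample sort term: {sample_sort_term(p, n, eps):.3e}")
    for k in (1, 2, 3, 4):
        print(f"histogram sort with sampling, k={k}: {hss_term(p, eps, k):.3e}")
```

Because the constants are dropped, the printed numbers indicate only how the two expressions scale relative to each other, not actual sample sizes.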
This item appears in the following Collection(s)
- Dissertations and Theses - Computer Science: Dissertations and Theses from the Dept. of Computer Science
- Graduate Dissertations and Theses at Illinois: Graduate Theses and Dissertations at Illinois