Files in this item



application/pdf: TENG-THESIS-2017.pdf (486kB)


Title: Mitigating Spark straggler tasks for iterative applications by data re-partitioning
Author(s): Teng, Bo
Advisor(s): Campbell, Roy H.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Machine learning; Apache Spark; Iterative application
Abstract: Many data science applications today feature large datasets and short tasks that run for many iterations. When these applications run on a parallel processing framework such as Apache Spark, one problem that affects running time is the straggler: a disproportionately long-running task that slows down the entire cluster. In this work we present a straggler mitigation technique tailored to applications that run small tasks for many iterations over a large dataset, and we implement the algorithm in Apache Spark. We monitor the resources available on each Spark node and dynamically re-partition the dataset in proportion to each node's estimated available resources. We show that our algorithm incurs negligible overhead for resource monitoring and can significantly improve the performance of a Spark cluster when stragglers are present.
Issue Date: 2017-04-18
Rights Information: Copyright 2017 Bo Teng
Date Available in IDEALS: 2017-08-10
Date Deposited: 2017-05
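
The abstract describes re-partitioning a dataset in proportion to each node's estimated available resources. The sketch below illustrates only that proportional-split idea in plain Python. It is not the thesis's implementation and does not use any Spark API; the function name `proportional_partition_sizes` and the capacity values are hypothetical, and largest-remainder rounding is an assumed choice for turning fractional shares into whole record counts.

```python
from fractions import Fraction
import math


def proportional_partition_sizes(total_records, capacities):
    """Split total_records across nodes in proportion to their capacities.

    capacities holds one non-negative number per node (a hypothetical
    resource estimate, e.g. a fraction of CPU currently free).
    Largest-remainder rounding makes the sizes sum to total_records.
    """
    if total_records < 0:
        raise ValueError("total_records must be non-negative")
    if not capacities or any(c < 0 for c in capacities):
        raise ValueError("capacities must be non-empty and non-negative")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    weights = [Fraction(c) for c in capacities]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one node needs positive capacity")

    # Ideal (possibly fractional) share of records for each node.
    exact = [total_records * w / total_weight for w in weights]
    sizes = [math.floor(x) for x in exact]

    # Give the records lost to rounding to the nodes with the largest
    # fractional remainders.
    leftover = total_records - sum(sizes)
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # Three nodes; the third is assumed to have half the spare capacity.
    print(proportional_partition_sizes(1000, [1.0, 1.0, 0.5]))
```

In this sketch a node estimated to have half the spare capacity of its peers receives half as many records, which is the intuition behind shrinking the tasks assigned to a likely straggler.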
