Files in this item



application/pdf: RAINA-THESIS-2018.pdf (578kB)
(no description provided)


Title: Optimizing interactive analytics engines for heterogeneous clusters
Author(s): Raina, Ashwini
Advisor(s): Gupta, Indranil
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Real-time analytics, data replication
Abstract: This thesis targets the growing area of interactive data analytics engines. It builds upon Getafix, an intelligent data replication and placement algorithm, and optimizes it for running mixed queries over a heterogeneous cluster. The new algorithm, Getafix-H, is a cluster-aware version of the Getafix replication algorithm with built-in optimizations for segment balancing and cluster auto-tiering. We integrated Getafix-H as an extension to Getafix inside Druid, a modern open-source interactive data analytics engine. We present experimental results using workloads from Yahoo!'s production Druid cluster. Compared to Getafix, Getafix-H improves tail latency by 18% and reduces memory usage by up to 27% (a 2-3X improvement over Scarlett). In the presence of stragglers, Getafix-H improves tail latency by 55% and reduces memory usage by up to 20% compared to Getafix. Getafix-H enables sysadmins to auto-tier a heterogeneous cluster with a tiering accuracy of up to 80%.
Issue Date: 2018-05-09
Rights Information: Copyright 2018 Ashwini Raina
Date Available in IDEALS: 2018-09-27
Date Deposited: 2018-08
