Files in this item



XU-THESIS-2015.pdf (application/pdf, 3MB)


Title: Stela: on-demand elasticity in distributed data stream processing systems
Author(s): Xu, Le
Advisor(s): Gupta, Indranil
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Distributed Systems
Stream Processing Systems
Abstract: Big data is characterized by volume and velocity [24], and recently several real-time stream processing systems have emerged to combat this challenge. These systems process streams of data in real time and produce computational results. However, current popular data stream processing systems lack the ability to scale out and scale in (i.e., increase or decrease the number of machines or VMs allocated to the application) efficiently and unintrusively when requested by the user on demand. To scale out or in, a critical problem is determining which operator(s) of the stream processing application should be given more resources, or have resources taken away, in order to maximize application throughput. We address this problem by presenting a novel metric called "Expected Throughput Percentage" (ETP). ETP takes into account not only congested elements of the stream processing application but also their effect on downstream elements and on the overall application throughput. Next, we show how our new system, called Stela (STream processing ELAsticity), incorporates ETP in its scheduling strategy. Stela enables scale-out and scale-in operations on demand, and achieves the twin goals of optimizing post-scaling throughput and minimizing interference with throughput during scaling. We have integrated the implementation of Stela into Apache Storm [27], a popular data stream processing system. We conducted experiments on Stela using a set of micro-benchmark topologies as well as two topologies from Yahoo! Inc. Our experimental results show that Stela achieves 5% to 120% higher post-scale throughput than the default Storm scheduler for scale-out operations, and a 40% to 500% throughput improvement over the default scheduler during the scale-in stage. This work is a joint project with Master's student Boyang Peng [1].
Issue Date: 2015-07-22
Rights Information: Copyright 2015 Le Xu
Date Available in IDEALS: 2015-09-29
Date Deposited: August 201
