Files in this item



ECE499-Sp2018-gupta.pdf (application/pdf, 932kB)


Title: Benchmarking modern distributed stream processing systems with customizable synthetic workloads
Author(s): Gupta, Srujun Thanmay
Contributor(s): Gupta, Indranil
Subject(s): stream processing
Abstract: Real-time analysis of continuous data streams using distributed systems is an emerging class of data analytics problems that require systems with high throughput and low latency to efficiently analyze high-velocity data. As stream processing applications have become increasingly popular, many frameworks for building clusters that process this data have emerged in recent years, including Samza, Storm, Heron, Spark Streaming, Flink, and Apex. For system administrators and developers, there is great value in understanding the capabilities and performance of their stream processing workloads across the various frameworks that can run on their cluster configuration. In this thesis, we present Finch, a new benchmarking tool for creating synthetic stream processing workloads. Finch generates metrics that system administrators and developers can use to benchmark their stream processing applications. To achieve this, Finch provides a flexible and easy way to define arbitrary workloads using tunable operators, and then translates these workloads into applications that are run by the target system. To describe Finch's design, we investigate which parameters affect workload performance and present studies on fault tolerance and system scalability. We then use Finch to understand and compare two popular stream processing frameworks, Samza and Heron.
Issue Date: 2018-05
Date Available in IDEALS: 2018-05-23
