Files in this item



application/pdf: JALAPARTI-DISSERTATION-2015.pdf (3MB)
(no description provided)


Title:Improving the end-to-end latency of datacenter applications using coordination across application components
Author(s):Jalaparti, Virajith
Director of Research:Caesar, Matthew
Doctoral Committee Chair(s):Caesar, Matthew
Doctoral Committee Member(s):Gupta, Indranil; Nahrstedt, Klara; Kandula, Srikanth
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Datacenter applications
Resource allocation
Cluster schedulers
Cross-layer optimization
Joint data and compute placement
Interactive services
Tail latency
Resource Malleability
Abstract:To handle millions of user requests every second and process hundreds of terabytes of data each day, many organizations have turned to large datacenter-scale computing systems. The applications running in these datacenters consist of a multitude of dependent logical components, or stages, each of which performs a specific function. These stages are connected to form a directed acyclic graph (DAG), with edges representing input-output dependencies. Each stage can run over tens to thousands of machines and involves multiple cluster sub-systems such as storage, network and compute. The scale and complexity of these applications can lead to significant delays in their end-to-end latency. However, the organizations running these applications have strict requirements on this latency, as it directly affects their revenue and operational costs. To address this problem, this dissertation develops scheduling and resource allocation techniques that optimize the end-to-end latency of datacenter applications. The key idea behind these techniques is to utilize coordination between different application components, allowing us to efficiently allocate cluster resources. In particular, we develop planning algorithms that coordinate the storage and compute sub-systems in datacenters to determine how many resources should be allocated to each stage in an application, and where in the cluster they should be allocated, to meet application requirements (e.g., meeting completion time goals or minimizing average completion time). To further speed up applications at runtime, we develop a few latency reduction techniques: reissuing laggards elsewhere in the cluster, returning partial results, and speeding up laggards by giving them extra resources.
We perform a global optimization to coordinate across all the stages in an application DAG and determine which of these techniques works best for each stage, while ensuring that the cost incurred by these techniques is within a given end-to-end budget. We use application characteristics to predict and determine how resources should be allocated to different application components to meet the end-to-end latency requirements. We evaluate our techniques on two different kinds of datacenter applications: (a) web services, and (b) data analytics. Using large-scale simulations and an implementation in Apache YARN (Hadoop 2.0), with workloads derived from production traces, we show that our techniques can achieve more than a 50% reduction in the 99th percentile latency of web services and up to a 56% reduction in the median latency of data analytics jobs.
Issue Date:2015-07-14
Rights Information:Copyright 2015 Virajith Jalaparti
Date Available in IDEALS:2015-09-29
Date Deposited:August 201
