Files in this item



(no description provided)PDF


Title:Building interactive distributed processing applications at a global scale
Author(s):Abdollahian Noghabi, Shadi
Director of Research:Campbell, Roy H.; Gupta, Indranil
Doctoral Committee Chair(s):Campbell, Roy H.; Gupta, Indranil
Doctoral Committee Member(s):Nahrstedt, Klara; Bahl, Victor
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Distributed Systems
Cloud Computing
Low Latency
Large Scale
Distributed Storage
Stream Processing
Edge Computing
Container Networking
Abstract:Along with the continuous engagement with technology, many latency-sensitive interactive applications have emerged, e.g., global content sharing in social networks, adaptive lights/temperatures in smart buildings, and online multi-user games. These applications typically process a massive amount of data at a global scale. In this cases, distributing storage and processing is key to handling the large scale. Distribution necessitates handling two main aspects: a) the placement of data/processing and b) the data motion across the distributed locations. However, handling the distribution while meeting latency guarantees at large scale comes with many challenges around hiding heterogeneity and diversity of devices and workload, handling dynamism in the environment, providing continuous availability despite failures, and supporting persistent large state. In this thesis, we show how latency-driven designs for placement and data-motion can be used to build production infrastructures for interactive applications at a global scale, while also being able to address myriad challenges on heterogeneity, dynamism, state, and availability. We demonstrate a latency-driven approach is general and applicable at all layers of the stack: from storage, to processing, down to networking. We designed and built four distinct systems across the spectrum. We have developed Ambry (collaboration with LinkedIn), a geo-distributed storage system for interactive data sharing across the globe. Ambry is LinkedIn's mainstream production system for all its media content running across 4 datacenters and over 500 million users. Ambry minimizes user perceived latency via smart data placement and propagation. Second, we have built two processing systems, a traditional model, Samza, and the avant-garde model, Steel. Samza (collaboration with LinkedIn) is a production stream processing framework used at 15 companies (including LinkedIn, Uber, Netflix, and TripAdvisor), powering >200 pipelines at LinkedIn alone. Samza minimizes the impact of data motion on the end-to-end latency, thus, enabling large persistent state (100s of TB) along with processing. Steel (collaboration with Microsoft) extends processing to the emerging edge. Integrated with Azure, Steel dynamically optimizes placement and data-motion across the entire edge-cloud environment. Finally, we have designed FreeFlow, a high performance networking mechanisms for containers. Using the container placement, FreeFlow opportunistically bypasses networking layers, minimizing data motion and reducing latency (up to 3 orders of magnitude).
Issue Date:2018-11-13
Rights Information:Copyright 2018 Shadi Abdollahian Noghabi
Date Available in IDEALS:2019-02-06
Date Deposited:2018-12

This item appears in the following Collection(s)

Item Statistics