Files in this item

FilesDescriptionFormat

application/pdf

application/pdfGHORBANIKHALEDI-DISSERTATION-2016.pdf (2MB)Restricted Access
(no description provided)PDF

Description

Title:Squeezing the most benefit from network parallelism in datacenters
Author(s):Ghorbani Khaledi, Soudeh
Director of Research:Godfrey, Brighten
Doctoral Committee Chair(s):Godfrey, Brighten
Doctoral Committee Member(s):Rexford, Jennifer; Vaidya, Nitin; Gupta, Indranil
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Parallelism
Causality
Efficiency
Load balancing
Traffic engineering
Stability
Datacenters
Replication
One-big-switch
Correctness
Transparency
Abstract:One big non-blocking switch is one of the most powerful and pervasive abstractions in datacenter networking. As Moore's law begins to wane, using parallelism to scale out processing units, vs. scale them up, is becoming exceedingly popular. The one-big-switch abstraction, for example, is typically implemented via leveraging massive degrees of parallelism behind the scene. In particular, in today's datacenters that exhibit a high degree of multi-pathing, each logical path between a communicating pair in the one-big-switch abstraction is mapped to a set of paths that can carry traffic in parallel. Similarly, each one-big-switch abstraction function, such as the firewall functionality, is mapped to a set of distributed hardware and software switches. Efficiently deploying this pool of networking connectivity and preserving the functional correctness of network functions, in spite of the parallelism, are challenging. Efficiently balancing the load among multiple paths is challenging because microbursts, responsible for the majority of packet loss in datacenters today, usually last for only a few microseconds. Even the fastest traffic engineering schemes today have control loops that are several orders of magnitude slower (a few milliseconds to a few seconds), and are therefore ineffective in controlling microbursts. Correctly implementing network functions in the face of parallelism is hard because the distributed set of elements that in parallel implement a one-big-switch abstraction can inevitably have inconsistent states that may cause them to behave differently than one physical switch. The first part of this thesis presents DRILL, a datacenter fabric for Clos networks which performs micro load balancing to distribute load as evenly as possible on microsecond timescales. To achieve this, DRILL employs packet-level decisions at each switch based on local queue occupancies and randomized algorithms to distribute load. Despite making per-packet forwarding decisions, by enforcing a tight control on queue occupancies, DRILL manages to keep the degree of packet reordering low. DRILL adapts to topological asymmetry (e.g. failures) in Clos networks by decomposing the network into symmetric components. Using a detailed switch hardware model, we simulate DRILL and show it outperforms recent edge-based load balancers particularly in the tail latency under heavy load, e.g., under 80% load, it reduces the 99.99th percentile of flow completion times of Presto and CONGA by 32% and 35%, respectively. Finally, we analyze DRILL's stability and throughput-efficiency. In the second part, we focus on the correctness of one-big-switch abstraction's implementation. We first show that naively using parallelism to scale networking elements can cause incorrect behavior. For example, we show that an IDS system which operates correctly as a single network element can erroneously and permanently block hosts when it is replicated. We then provide a system, COCONUT, for seamless scale-out of network forwarding elements; that is, an SDN application programmer can program to what functionally appears to be a single forwarding element, but which may be replicated behind the scenes. To do this, we identify the key property for seamless scale out, weak causality, and guarantee it through a practical and scalable implementation of vector clocks in the data plane. We build a prototype of COCONUT and experimentally demonstrate its correct behavior. We also show that its abstraction enables a more efficient implementation of seamless scale-out compared to a naive baseline. Finally, reasoning about network behavior requires a new model that enables us to distinguish between observable and unobservable events. So in the last part, we present the Input/Output Automaton (IOA) model and formalize networks' behaviors. Using this framework, we prove that COCONUT enables seamless scale out of networking elements, i.e., the user-perceived behavior of any COCONUT element implemented with a distributed set of concurrent replicas is provably indistinguishable from its singleton implementation.
Issue Date:2016-11-28
Type:Thesis
URI:http://hdl.handle.net/2142/95589
Rights Information:Copyright © 2016 by Soudeh Ghorbani All rights reserved.
Date Available in IDEALS:2017-03-01
Date Deposited:2016-12


This item appears in the following Collection(s)

Item Statistics