Files in this item



application/pdfthesis.pdf (954kB)
(no description provided)PDF


Title:Exploiting System Diversity in Peer-to-Peer Publish-Subscribe Systems
Author(s):Patel, Jay A.
Doctoral Committee Chair(s):Gupta, Indranil
Doctoral Committee Member(s):Agha, Gul A.; Kravets, Robin H.; Van Steen, Maarten
Department / Program:Computer Science
Discipline:Computer Science
University of Illinois at Urbana-Champaign
Publish-Subscribe systems
System Diversity
Abstract:This thesis presents new techniques that exploit system diversity within a particular class of peer-to-peer publish-subscribe systems. We show that by directly addressing interest and network diversity as a first class design principle, the scale and performance of such systems can be improved. This thesis makes four major contributions. Firstly, we present Confluence, a system that significantly reduces the time to transfer large files from multiple publishers (sources) to a single subscriber (sink node) as compared to the direct transfer strategy. Confluence lets scientists rapidly collect logs from either multiple PlanetLab hosts or multi-site cloud computing infrastructures. It uses a novel source-2-source (s2s) overlay to speed up the transfer of file blocks towards the sink. Intuitively, the s2s overlay facilitates a source node (with a congested path to the sink) to utilize other source nodes as intermediaries for routing file blocks to the sink. Concretely, our approach first poses the problem as a variant of flow optimization among the source nodes. This captures the spatial diversity in bandwidth. We provide a theoretically optimal solution to this problem. Next, we augment this static solution with on-the-fly recomputation. This helps us exploit temporal diversity in bandwidth. Using Confluence, with 25 source nodes in a PlanetLab-like environment, 80% of nodes see a reduction in transfer time of at least 20% over the direct transfer strategy. Our second system, Rappel, is a peer-to-peer delivery mechanism for RSS feeds. Rappel is the first subject-based publish-subscribe system to be noiseless, be truly peer-to-peer, and perform soft real-time dissemination of messages. Noiselessness implies that a subscriber never receives messages for feeds that it is not subscribed to, and is important because it improves fairness: the load imposed by the system on each participating node is proportional to the node's demands from the system. Rappel exploits interest and network diversity via the use of periodic utility computations, wherein the utility of a peer (``friend'') is derived using Bloom filters and network coordinates. Bloom filters succinctly capture the subscription interest of a node, whereas network coordinates help capture the network location of a node. Via push-pull gossip, a node seeks to find a set of friends that provide good subscription coverage while being in close network proximity. By having peers in close network proximity, messages are disseminated with very low latency. The third contribution of this thesis is the Realistic Application-level Network Simulation (RANS) framework. This is motivated by two observations. Firstly, system deployment is a labor-intensive exercise, and thus, limited in scale. For instance, PlanetLab, a large wide-area experimental network testbed, usually only has about 400 accessible nodes at any given moment. Secondly, due to the presence of extrinsic interferences, experiments are not replayable. Simulations provide an acceptable solution to these problems, however, they often fail to mimic realistic network conditions. In contrast to these two approaches, the RANS framework provides a modular programming interface that can be leveraged to produce both realistic simulation results and a ready-to-deploy sockets binary. Our main contributions are in (1) developing a realistic and reusable selective granularity discrete-event simulator for PlanetLab, and (2) showing that the results generated by the RANS simulation framework closely match the results obtained by performing the same experiments on a PlanetLab deployment. Fourthly, the systems described in this thesis have been comprehensively evaluated via both PlanetLab deployment and simulation. Our deployments used up to 400 PlanetLab servers world-wide. Our largest simulations model 10,000 nodes. Our experimental methodology is constructed using an extensive amount of real-world traces. For instance, to evaluate Rappel using realistic user subscriptions, we gathered the subscription profiles of 1.8 million LiveJournal users over six months. The evaluation presented in this thesis also makes use of the following previously collected traces: Internet topology, end-to-end latency fluctuations between PlanetLab nodes, bandwidth availability between PlanetLab nodes, and end user churn observed in peer-to-peer file sharing applications.
Issue Date:2009-05
Citation Info:Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2009
Genre:Dissertation / Thesis
Publication Status:unpublished
Peer Reviewed:is peer reviewed
Date Available in IDEALS:2009-04-30

This item appears in the following Collection(s)

Item Statistics