Files in this item



application/pdfNGUYEN-DISSERTATION-2018.pdf (5MB)
(no description provided)PDF


Title:Dossier: Distributed operating system and infrastructure for scientific data management
Author(s):Nguyen, Phuong Viet
Director of Research:Nahrstedt, Klara
Doctoral Committee Chair(s):Nahrstedt, Klara
Doctoral Committee Member(s):Campbell, Roy H.; Gupta, Indranil; Turaga, Deepak
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
microservice architecture
adaptive control
edge-cloud architecture
scientific data management
Abstract:As scientific advancement and discovery have become increasingly data-driven and interdisciplinary, there are urging needs for advanced cyberinfrastructure to support managing and process- ing scientific data generated from day-to-day research. However, the development of data-driven cyberinfrastructure for scientific research areas has often lagged behind the development of such tools in other engineering and IT-related fields. Such the development gap is due to various diversity challenges of scientific data management and processing. First, these are the challenges in terms of the diversity of scientific data and data processing tasks, as the cyberinfrastructure should be able to support managing and processing heterogeneous types of scientific data that have been captured from scientific instruments. Second, as the cyberinfrastructure must help to shorten time from digital capture of data to interpretation and insights, it is challenging for the infrastructure to deal with the diversity of users and scientific workload. Third, it is the diversity of scientific instruments. Since there is still a significant number of scientific instruments that run their scientific software tools on old operating systems (e.g., Windows XP, Windows NT, Windows 2000), the cyberinfrastructure must help to bridge the performance and security gap between old scientific instruments and its advanced cloud-based infrastructure. In this thesis, we aim to address the above diversity challenges by taking a holistic approach in designing a distributed operating system and infrastructure for scientific data management, named DOSSIER. At the core of DOSSIER is an adaptive control microservice infrastructure that is de- signed to tackle the aforementioned challenges of data cyberinfrastructure for distributed scientific data management. Particularly, to handle heterogeneous scientific data processing and analysis, we start with redesigning the execution environment for scientific workflows, which traditionally follows a monolithic approach, using a novel microservice architecture and latest virtualization technology (i.e., container technology). The microservice design enables dynamic composition of workflows, and thus, is efficient in dealing with heterogeneous workflows. The new microservice architecture also allows us to express system resources in a more simple way, and thus, enables the design of a new adaptive resource management mechanism to handle large-scale and dynamic scientific workloads. We are the first to apply feedback control theory to design a self-adaptation mechanism for scientific workflow management system to help shorten the time from data acquisition to insights. To address the security and performance gap issues when connecting old scientific instruments to cloud-based cyberinfrastructure, we design an edge-cloud architecture that puts cloudlet servers directly connected to the scientific instruments and act as the security shield for the aging instruments. Cloudlets will also coordinate with cloud-based backend system to tackle the performance issue by scheduling data transfer and offloading processing tasks to cloudlets to avoid traffic congestion and guarantee performance of data processing jobs across edge-cloud architecture. By designing, developing, and testing DOSSIER in the real scientific environments, we demonstrate that an edge-cloud microservice architecture with learning-based adaptive control resource management is needed for timely distributed scientific data management.
Issue Date:2018-07-12
Rights Information:Copyright 2018 Phuong Nguyen
Date Available in IDEALS:2018-09-27
Date Deposited:2018-08

This item appears in the following Collection(s)

Item Statistics