Files in this item



Cristina_Abad.pdf (application/pdf, 3MB)


Title: Big data storage workload characterization, modeling and synthetic generation
Author(s): Abad, Cristina L.
Director of Research: Campbell, Roy H.
Doctoral Committee Chair(s): Campbell, Roy H.
Doctoral Committee Member(s): Nahrstedt, Klara; Gupta, Indranil; Lu, Yi; Cherkasova, Ludmila
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Big Data; Hadoop Distributed File System (HDFS)
Abstract: A huge increase in data storage and processing requirements has led to Big Data, for which next-generation storage systems are being designed and implemented. As Big Data stresses the storage layer in new ways, a better understanding of these workloads and the availability of flexible workload generators are increasingly important to facilitate the proper design and performance tuning of storage subsystems like data replication, metadata management, and caching. Our hypothesis is that the autonomic modeling of Big Data storage system workloads through a combination of measurement, statistical techniques, and machine learning techniques is feasible, novel, and useful. We consider the case of one common type of Big Data storage cluster: a cluster dedicated to supporting a mix of MapReduce jobs. We analyze 6-month traces from two large clusters at Yahoo and identify interesting properties of the workloads. We present a novel model for capturing popularity and short-term temporal correlations in object request streams, and show how unsupervised statistical clustering can be used to enable autonomic type-aware workload generation that is suitable for emerging workloads. We extend this model to include other relevant properties of storage systems (file creation and deletion, pre-existing namespaces, and hierarchical namespaces) and use the extended model to implement MimesisBench, a realistic namespace metadata benchmark for next-generation storage systems. Finally, we demonstrate the usefulness of MimesisBench through a study of the scalability and performance of the Hadoop Distributed File System name node.
Issue Date: 2014-05-30
Rights Information: Copyright 2014 Cristina Abad
Date Available in IDEALS: 2014-05-30
Date Deposited: 2014-05
