Files in this item

Files: VENKATARAMAN-THESIS-2017.pdf (1MB), application/pdf, Restricted to U of Illinois
Description: (no description provided)
Format: PDF

Description

Title: DataSpread: scaling spreadsheets using relational databases
Author(s): Venkataraman, Vipul
Advisor(s): Parameswaran, Aditya
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): spreadsheets, interactivity, data models
Abstract: Spreadsheet software is the tool of choice for ad-hoc tabular data management, manipulation, querying, and visualization, with adoption by billions of users. However, unlike database systems, spreadsheets do not scale. We develop DataSpread, a system that holistically unifies databases and spreadsheets with the goal of supporting massive spreadsheets: DataSpread retains all of the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while adding the scalability and collaboration abilities of traditional relational databases. We design DataSpread with a spreadsheet front-end and a regular relational database back-end. To integrate spreadsheets and databases, in this thesis we develop a storage and indexing engine for spreadsheet data. We first formalize and study the problem of representing and manipulating spreadsheet data within a relational database. We demonstrate that identifying the optimal representation is NP-Hard via a reduction from partitioning of rectangles; however, under certain reasonable assumptions, the problem can be solved in PTIME. We develop a collection of mechanisms for representing spreadsheet data and evaluate these representations on a workload of typical data manipulation operations. We augment our mechanisms with novel positionally-aware indexing structures that further improve performance. DataSpread can scale to billions of cells, returning results for common operations within seconds. Lastly, to motivate our research questions, we perform an extensive survey of spreadsheet use for ad-hoc tabular data management.
Issue Date: 2017-04-12
Type: Thesis
URI: http://hdl.handle.net/2142/97692
Rights Information: Copyright 2017 Vipul Venkataraman
Date Available in IDEALS: 2017-08-10
Date Deposited: 2017-05