Files in this item



application/pdf: Bolin_Ding.pdf (1MB)


Title: Privacy-preserving data publishing and analytics using data cubes
Author(s): Ding, Bolin
Director of Research: Han, Jiawei
Doctoral Committee Chair(s): Han, Jiawei
Doctoral Committee Member(s): Winslett, Marianne; Zhai, ChengXiang; Machanavajjhala, Ashwin
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): online analytical processing (OLAP); data cube; differential privacy; private data analysis
Abstract: Data cubes play an essential role in data analysis and decision support. In a data cube, data from a fact table is aggregated on subsets of the table's dimensions, forming a collection of smaller tables called cuboids. When the fact table includes sensitive data such as salary or diagnosis, publishing even a subset of its cuboids may compromise individuals' privacy. In this thesis, we address several problems in privacy-preserving publishing of data cubes using differential privacy or its extensions, which provide privacy guarantees for individuals by adding noise to query answers. The first problem is how to improve data quality in privacy-preserving data cubes. Our noise-control frameworks choose noise sources in a data cube, i.e., an initial subset of cuboids to compute directly from the fact table, each with a certain amount of injected noise, and then compute the remaining cuboids from them. We show that it is NP-hard to choose proper noise sources for certain noise-control objectives, but we provide efficient approximation algorithms. The second problem is how to enforce consistency in the published cuboids. We propose several approaches with provable guarantees on the noise bound, and one of them can even improve the utility of differentially private cuboids (reducing error). The third problem is how to calibrate noise in data cubes subject to certain exact background knowledge while still improving data quality. The notion of generic differential privacy is applied, and we generalize its properties to plug it into our noise-control frameworks for handling background knowledge. The techniques proposed in this thesis provide advanced principles and major parts of a complete solution towards privacy-preserving publishing of data cubes.
Issue Date: 2013-02-03
Rights Information: Copyright 2012 Bolin Ding
Date Available in IDEALS: 2013-02-03
Date Deposited: 2012-12
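
The noise-source idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names (`cuboid`, `noisy_cuboid`, `roll_up`), the toy fact table, and the choice of the Laplace mechanism with sensitivity 1 (assuming neighboring tables differ by adding or removing one row, and that only this one cuboid is computed from the fact table) are all assumptions made here for illustration. One fine-grained count cuboid is released with noise, and a coarser cuboid is then derived from it without touching the fact table again.

```python
import random
from collections import Counter
from itertools import product


def cuboid(fact_table, dims):
    """Exact count cuboid: number of rows per combination of the given dimensions."""
    return Counter(tuple(row[d] for d in dims) for row in fact_table)


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cuboid(fact_table, dims, domains, epsilon, rng):
    """Count cuboid with Laplace noise on every cell, including empty ones.

    Assumption: add/remove-one-row neighbors, so the L1 sensitivity of the
    count vector is 1 and the noise scale is 1 / epsilon.
    """
    exact = cuboid(fact_table, dims)
    cells = product(*(domains[d] for d in dims))
    return {c: exact.get(c, 0) + laplace_noise(1.0 / epsilon, rng) for c in cells}


def roll_up(source, source_dims, target_dims):
    """Derive a coarser cuboid by summing the cells of a finer source cuboid."""
    idx = [source_dims.index(d) for d in target_dims]
    out = {}
    for cell, value in source.items():
        key = tuple(cell[i] for i in idx)
        out[key] = out.get(key, 0.0) + value
    return out


# Hypothetical toy fact table with two dimensions.
fact_table = [
    {"sex": "F", "dept": "A"},
    {"sex": "F", "dept": "B"},
    {"sex": "M", "dept": "A"},
    {"sex": "M", "dept": "A"},
]
domains = {"sex": ["F", "M"], "dept": ["A", "B"]}

rng = random.Random(0)
source = noisy_cuboid(fact_table, ("sex", "dept"), domains, epsilon=1.0, rng=rng)
by_sex = roll_up(source, ("sex", "dept"), ("sex",))
```

Because `by_sex` is computed only from the already-noisy `source`, it consumes no additional privacy budget, but each of its cells accumulates the noise of the source cells summed into it. Choosing which cuboids to use as sources so that this accumulated noise stays small is the trade-off the abstract refers to.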