Files in this item



LEE-THESIS-2017.pdf (application/pdf, 2MB). Restricted to U of Illinois


Title: Visualization and differential privacy
Author(s): Lee, Hyun Bin
Advisor(s): Gunter, Carl A
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Differential privacy
Abstract: Privacy-preserving statistical databases are designed to provide information about a population while preventing end-users from learning about any individual. However, scholars have shown that a sophisticated adversary can break this guarantee when only primitive privacy protections are in place. Differential Privacy (DP) measures how likely an adversary is to learn about an individual from statistical database queries. Recent state-of-the-art Privacy-Enhancing Technologies (PETs) often implement noise-injection-based mechanisms in order to satisfy a strong level of DP protection. While these privacy protection guidelines minimize the risk of private information disclosure, many people have raised concerns about the impracticality of their implementation. Much of the literature has formalized the utility-privacy tradeoff caused by noise injection on the basis of statistical figures and quantitative experimental results. In contrast, this work describes a qualitative analysis of the Laplacian noise mechanism, one of the most prevalently used DP mechanisms, with regard to the utility-privacy tradeoff on various types of visualization products. The dataset used for the analysis is a time series of meter readings from smart grid electricity consumption of selected households in the Republic of Ireland. We examined how five types of visualization products (bar graphs, pie charts, heatmaps, linear plots, and scatterplots) present information from statistical database queries. The visualization products showed seasonal, daily, and weekly periodic consumption patterns from which power utilities can make a qualitative analysis of consumption profiles. We call these patterns “visual cues.” After applying the Laplacian noise mechanism to these visualization products, we made qualitative observations on the privacy-preserved figures and looked for notable changes. The project provides graphic findings on the relationship among the composability of queries, the number of queries, and the scale of the Laplacian noise.
We observed that visualization products that required fewer than ten queries from the dataset suffered minimal information loss. However, we observed a high degradation of visual cues when we applied the noise mechanism to heatmaps with up to 25,200 composable queries. These visualizations no longer conveyed most of the key information that was present in their unprotected counterparts. To the best of our knowledge, no existing state-of-the-art pre/post-processing technique significantly recovered most visual cues. Finally, we found that some visualizations fell into neither of the first two cases: privacy-preserving linear plots (based on 336 composable queries) and scatterplots (based on 3,639 pairs of parallel queries) retained some visual cues after the privacy-preserving procedures were executed. We further discovered some pre/post-processing mechanisms that recovered visual cues.
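The relationship the abstract describes among the number of composable queries and the scale of the Laplacian noise can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes the standard Laplace mechanism with a total privacy budget split evenly across sequentially composed queries, and the function names (`laplace_scale`, `laplace_mechanism`), the sensitivity of 1.0, and the budget of 1.0 are hypothetical values chosen only for illustration.

```python
import numpy as np


def laplace_scale(sensitivity, epsilon, n_sequential=1):
    """Per-query Laplace scale b when a total budget `epsilon` is split
    evenly over `n_sequential` sequentially composed queries.

    Assumption (not from the thesis): each query gets epsilon / n_sequential,
    so b = sensitivity / (epsilon / n_sequential).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if n_sequential < 1:
        raise ValueError("n_sequential must be at least 1")
    return sensitivity * n_sequential / epsilon


def laplace_mechanism(true_values, sensitivity, epsilon, n_sequential=1, rng=None):
    """Return `true_values` with independent Laplace noise added to each entry."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(true_values, dtype=float)
    scale = laplace_scale(sensitivity, epsilon, n_sequential)
    return values + rng.laplace(loc=0.0, scale=scale, size=values.shape)


if __name__ == "__main__":
    # Hypothetical sensitivity and budget; query counts echo those in the abstract.
    for k in (10, 336, 25200):
        print(k, laplace_scale(sensitivity=1.0, epsilon=1.0, n_sequential=k))
```

Under these assumptions the noise scale grows linearly with the number of sequentially composed queries, which is consistent with the abstract's observation that figures built from a handful of queries lose little while heatmaps built from thousands of composable queries degrade heavily. Parallel queries over disjoint subsets of the data would not need the budget split, which corresponds to calling the functions with `n_sequential=1`.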
Issue Date: 2017-07-14
Rights Information: Copyright 2017 Hyun Bin Lee
Date Available in IDEALS: 2018-03-02
Date Deposited: 2017-08
