Files in this item

Files: PATIL-THESIS-2020.pdf (12MB)
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Privacy implications of information leakage from IP addresses - a web fingerprinting approach
Author(s): Patil, Simran Pramod
Advisor(s): Borisov, Nikita
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Web Fingerprinting, Encrypted DNS, Web Privacy
Abstract: The Internet was not designed with security in mind. A number of recent protocols, such as encrypted DNS and HTTPS, encrypt critical parts of the web architecture that were previously sent in the clear. IP addresses nevertheless remain visible to on-path observers and can be used for censorship, surveillance, and undermining users’ privacy on the web. To gauge the amount of information leaked by IP addresses, we perform a measurement study on datasets representative of the state of the Internet: some fetched via HTTP Archive, others collected over extended periods of time by crawling Alexa’s top websites under configurations such as Adblock enabled vs. disabled. We build a page load fingerprint for each crawled website and filter the websites that have uniquely identifying IP addresses mapped to them. We build a neural network classifier to study how accurately websites can be fingerprinted based on their IP addresses and the respective Autonomous System Numbers (ASNs). Approximately 80% of the IP addresses have an anonymity set consisting of a single website and can therefore identify it. The classifier achieves an accuracy of about 60% on the remaining data. We observe that the classifier confuses websites that share common hosting infrastructure. Manually clustering the data based on these trends can increase the classification accuracy. We identify areas of improvement for the current measurement study and provide suggestions to Content Delivery Networks (CDNs) and other agents fundamental to the Internet infrastructure for increasing user privacy.
Issue Date: 2020-05-11
Type: Thesis
URI: http://hdl.handle.net/2142/108011
Rights Information: Copyright 2020 Simran Patil
Date Available in IDEALS: 2020-08-26
Date Deposited: 2020-05
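
Illustrative note: the abstract's notion of an IP address's anonymity set (the set of websites whose page loads contact that address) can be made concrete with a small sketch. This is not code from the thesis. The function names, the input format (a mapping from website to the set of IP addresses seen in its page load), and the toy data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def anonymity_sets(page_loads):
    """Map each IP address to the set of websites whose page load contacted it.

    `page_loads` is a hypothetical input format: {website: set of IP addresses}.
    """
    sets = defaultdict(set)
    for site, ips in page_loads.items():
        for ip in ips:
            sets[ip].add(site)
    return dict(sets)


def unique_ip_fraction(sets):
    """Fraction of IP addresses whose anonymity set holds exactly one website."""
    if not sets:
        return 0.0
    unique = sum(1 for sites in sets.values() if len(sites) == 1)
    return unique / len(sets)


# Toy data using documentation-range addresses; not taken from the thesis.
page_loads = {
    "a.example": {"192.0.2.1", "198.51.100.7"},
    "b.example": {"192.0.2.2", "198.51.100.7"},
    "c.example": {"203.0.113.5"},
}

sets = anonymity_sets(page_loads)
fraction = unique_ip_fraction(sets)
print(f"IPs with a single-website anonymity set: {fraction:.0%}")
```

In this toy input, the address shared by a.example and b.example has an anonymity set of size two, so observing it alone does not reveal which site was visited. Every other address maps to exactly one website.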

