Towards similarity learning in security applications
Hao, Qingying
Permalink
https://hdl.handle.net/2142/130183
Description
- Title
- Towards similarity learning in security applications
- Author(s)
- Hao, Qingying
- Issue Date
- 2025-07-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Wang, Gang
- Doctoral Committee Chair(s)
- Wang, Gang
- Committee Member(s)
- Gunter, Carl
- Li, Bo
- Chandrasekaran, Varun
- Conti, Mauro
- Department of Study
- Siebel School Comp & Data Sci
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Security
- Machine Learning
- Similarity Learning
- Language
- eng
- Abstract
- In today’s world, large amounts of data remain unlabeled, posing a major challenge, especially in security applications, where acquiring high-quality labels is costly and difficult. Without accurate labels, it is hard to train reliable machine learning (ML) models, which limits their effectiveness in real-world scenarios. Similarity learning provides a promising direction by capturing relationships within the data without requiring explicit labels. Instead, it learns from reference pairs by measuring similarity through distance. While simple distance metrics can be used, similarity learning is often combined with deep learning to learn robust feature representations for comparison using predefined or learned similarity measures. How reliable is similarity learning in real-world security applications, particularly when exposed to adversarial threats? Under what conditions can it enhance model generalization and detection performance? This dissertation evaluates the robustness of similarity-learning applications under a realistic threat model by applying adversarial attacks end-to-end, and shows how similarity learning can improve out-of-distribution (OOD) generalization in graph-structured data. Specifically, Chapter 3 presents adversarial attacks targeting perceptual hashing-based reverse image search engines, which use Hamming distance as the similarity metric. By developing advanced attacks and evaluating them end-to-end on real-world systems, our framework successfully subverts several major reverse image search engines. In Chapter 4, we present attacks on vision-based phishing detectors trained using similarity learning. Our framework generates adversarial logos that preserve original brand semantics while bypassing state-of-the-art visual phishing website detectors. Chapter 5 explores how similarity learning, specifically graph contrastive learning (GCL), can complement supervised learning to improve out-of-distribution generalization in graph neural networks (GNNs) under natural distribution shifts. In summary, these studies show that similarity learning-based applications are vulnerable to adversarial attacks, highlighting the need for stronger defenses under realistic end-to-end threat models. At the same time, similarity learning can complement supervised methods by providing diverse feature representations and decision signals, making it valuable for improving out-of-distribution detection under natural distribution shifts. (A short illustrative sketch of distance-based similarity scoring follows this record.)
- Graduation Semester
- 2025-08
- Type of Resource
- Text
- Handle URL
- https://hdl.handle.net/2142/130183
- Copyright and License Information
- Copyright 2025 Qingying Hao
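
The abstract describes similarity learning as scoring reference pairs by distance, with Hamming distance as the similarity metric for the perceptual hashing-based reverse image search engines studied in Chapter 3. The following is a minimal illustrative sketch of that idea only; the hash values, the threshold, and the function names are hypothetical placeholders and are not taken from the dissertation.

```python
# Minimal sketch: distance-based similarity over perceptual hashes.
# The 64-bit hash values and the match threshold are illustrative
# placeholders, not values from the dissertation.

def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two equal-length hashes differ."""
    return bin(hash_a ^ hash_b).count("1")

def is_similar(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat two images as near-duplicates when their perceptual hashes
    are within `threshold` bits of each other (smaller distance = more similar)."""
    return hamming_distance(hash_a, hash_b) <= threshold

if __name__ == "__main__":
    query_hash     = 0xF0E1D2C3B4A59687  # hypothetical 64-bit perceptual hash
    candidate_hash = 0xF0E1D2C3B4A59680  # near-duplicate: differs in a few bits
    unrelated_hash = 0x0123456789ABCDEF  # unrelated image

    print(hamming_distance(query_hash, candidate_hash),
          is_similar(query_hash, candidate_hash))    # small distance -> True
    print(hamming_distance(query_hash, unrelated_hash),
          is_similar(query_hash, unrelated_hash))    # large distance -> False
```

An adversarial attack in this setting, as the abstract frames it, aims to perturb an image so its hash moves outside (or inside) such a distance threshold while the image itself remains visually unchanged.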
Owning Collections
Graduate Dissertations and Theses at Illinois (PRIMARY)
Dissertations and Theses - Computer Science (Dissertations and Theses from the Siebel School of Computer Science)