Crafting safe human-centric agents with risk intelligence
Sun, Chenkai
Permalink
https://hdl.handle.net/2142/129193
Description
- Title
- Crafting safe human-centric agents with risk intelligence
- Author(s)
- Sun, Chenkai
- Issue Date
- 2025-04-18
- Doctoral Committee Chair(s)
- Ji, Heng
- Zhai, ChengXiang
- Committee Member(s)
- Han, Jiawei
- Bendersky, Michael
- Small, Kevin
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- AI Safety
- Large Language Models
- AI Alignment
- Personalization
- Social Simulation
- Efficiency
- Abstract
- Safety has become a critical concern in digital communication environments, where harmful content can propagate rapidly and compromise societal well-being. As artificial intelligence increasingly serves as an automated content creation and distribution mechanism, the risks of adverse algorithmic impacts are amplified when these systems lack awareness of how their outputs influence audiences. This dissertation addresses this challenge by developing an integrated framework for communication risk management that enables systems to assess, anticipate, and mitigate potential negative consequences. The framework comprises four interconnected research contributions. First, we establish the foundations for risk assessment through a novel task formulation and dataset that captures how identical messages affect diverse user personas differently. This approach transcends traditional content moderation by evaluating information safety and appropriateness for different population groups, creating a measurement capability essential for risk-aware AI systems. Furthermore, we address the challenge of evaluating risk for users with minimal digital footprints, often referred to as "lurkers". By leveraging social graphs constructed through large language models, this work presents a solution for more accurately predicting opinions from such users, expanding the applicability of personalized agents in risk management. Another cornerstone of this dissertation is the development of efficient strategies for language model personalization in risk assessment. Through hierarchical and collaborative data refinement techniques, our Persona-DB approach achieves high-accuracy personalization with substantially reduced data retrieval requirements. This work bridges the gap between risk management and practical deployment by addressing the computational costs of personalization. Lastly, we extend risk assessment beyond immediate impacts to encompass temporal dynamics and cascading effects. 
By employing language models as social simulators, our framework projects how content might influence populations over time, enabling the anticipation of long-term consequences that traditional safety approaches neglect. This capability not only enhances risk assessment but also provides a mechanism for aligning content-generating AI with broader safety considerations.
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129193
- Copyright and License Information
- Copyright 2025 Chenkai Sun
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois