Examining large language models for safety and robustness through the lens of social science
Jeoung, Sullam
Description
- Title
- Examining large language models for safety and robustness through the lens of social science
- Author(s)
- Jeoung, Sullam
- Issue Date
- 2025-03-07
- Director of Research (if dissertation) or Advisor (if thesis)
- Diesner, Jana
- Doctoral Committee Chair(s)
- Diesner, Jana
- Committee Member(s)
- Kilicoglu, Halil
- Bosch, Nigel
- Wang, Haohan
- Department of Study
- Information Sciences
- Discipline
- Information Sciences
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Large Language Model
- Responsible AI
- Abstract
- Large language models have demonstrated remarkable capabilities, often achieving human-like performance and significantly impacting our daily lives. However, these models can perpetuate and amplify harmful stereotypes and biases associated with socio-demographic representations, potentially generating discriminatory content that adversely affects individuals and communities. Given these risks and their broader societal implications, ensuring the safety and robustness of these models through the identification and mitigation of harmful stereotypes has become imperative. This dissertation presents comprehensive methodologies to address these challenges by integrating insights from social science, psychology, and cognitive studies with methods from natural language processing.

First, we present a framework to assess human-like stereotypical patterns in large language models (LLMs), drawing upon established psychological theories of how individuals develop stereotypes toward various social groups. This theoretically grounded approach provides construct validity in defining and measuring stereotypes. The framework incorporates three key dimensions: warmth-competence analysis, keyword-reasoning patterns, and emotional-behavioral responses. Through clustering analysis, keyword extraction, and reasoning-pattern evaluation of LLM responses, we examine how these models align with or deviate from documented human behavioral patterns. Our findings reveal that LLMs demonstrate nuanced perceptions of social groups, consistent with psychological research highlighting the multifaceted nature of stereotypes. Notably, the models’ reasoning patterns, particularly regarding groups’ economic status, demonstrate a nuanced awareness of societal disparities.

Second, we propose methods to examine the causal sensitivity of language models to socio-demographic attributes. This approach rests on a controlled experimental framework that uses name-frequency analysis from U.S. Census data and systematic evaluation of model predictions through causal graphs (a minimal illustrative sketch appears below). Our findings show that less frequent first names lead to divergent model predictions, highlighting the need for careful demographic consideration in dataset design to ensure fair and consistent model performance across different name representations.

Third, we investigate how LLMs exhibit and inflate political stereotypes through the lens of cognitive biases and representative heuristics. We analyze LLMs’ responses using two key theoretical frameworks: ‘kernel of truth’ (whether stereotypes reflect empirical realities) and ‘representative heuristics’ (whether models overemphasize representative attributes of target groups), comparing model outputs with actual human responses across various political topics. Our findings show that while LLMs can accurately mimic certain political positions, they tend to exaggerate these positions relative to empirical human responses, suggesting a vulnerability to stereotypical thinking similar to human cognitive biases. These results imply the need for careful consideration of cognitive-bias frameworks in developing and deploying language models, particularly in politically sensitive contexts, and demonstrate the potential effectiveness of prompt-based mitigation strategies in reducing stereotypical responses.
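As context for the second study's design, the following is a minimal sketch of a name-substitution sensitivity probe of the general kind the abstract describes. It is an illustration under stated assumptions, not the dissertation's code: the names, the template sentence, and the off-the-shelf sentiment classifier standing in for the model under test are all hypothetical.

```python
# Minimal sketch of a name-substitution sensitivity probe (hypothetical;
# not the dissertation's actual code). We swap first names of differing
# frequency into an otherwise fixed template and compare model predictions.
from transformers import pipeline

# Hypothetical name lists; a real study would draw names and their
# frequencies from U.S. Census data, as the abstract describes.
FREQUENT_NAMES = ["James", "Mary", "John"]
INFREQUENT_NAMES = ["Zelma", "Ottis", "Verlie"]

TEMPLATE = "{name} applied for the position and interviewed last week."

def predictions_for(names, classifier):
    """Score the same template with each name substituted in."""
    return {name: classifier(TEMPLATE.format(name=name))[0] for name in names}

if __name__ == "__main__":
    # Any text classifier can serve as the model under test; sentiment
    # analysis is used here only as a convenient off-the-shelf stand-in.
    clf = pipeline("sentiment-analysis")
    for group, names in [("frequent", FREQUENT_NAMES),
                         ("infrequent", INFREQUENT_NAMES)]:
        for name, pred in predictions_for(names, clf).items():
            print(f"{group:>10} | {name:<8} -> {pred['label']} "
                  f"({pred['score']:.3f})")
    # Divergence between the two groups' predictions on an otherwise
    # identical input is the causal sensitivity the study measures.
```

In the dissertation's actual framework, divergence is evaluated systematically through causal graphs rather than by eyeballing a single classifier's scores; the sketch only shows the name-perturbation step.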
Overall, this dissertation enhances the understanding and safety of LLMs by proposing a framework to assess human-like stereotypes, methods to evaluate causal sensitivity to socio-demographic attributes, and an analysis of political stereotypes through cognitive-bias frameworks. By integrating insights from psychology and the social sciences with computational methods, this research contributes to the ongoing discourse on ethical AI deployment, highlighting the necessity of understanding and addressing biases in language models to promote fairness and reduce discriminatory outcomes.
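The third study's comparison of model outputs with human survey responses also lends itself to a small worked example. The sketch below illustrates one plausible way to quantify ‘exaggeration’ of a group's position relative to an empirical human baseline; the topics, the -1..1 stance scale, and all numbers are invented for illustration and do not come from the dissertation.

```python
# Minimal sketch of a 'representative heuristics' comparison
# (hypothetical values; not data from the dissertation). We compare
# model-attributed positions on political topics against empirical
# human survey means on the same -1..1 stance scale.

# Hypothetical human survey means for a target group, per topic.
HUMAN_MEANS = {"immigration": 0.35, "taxation": -0.20, "climate": 0.55}

# Hypothetical model-attributed positions for the same group.
MODEL_MEANS = {"immigration": 0.70, "taxation": -0.55, "climate": 0.80}

def exaggeration(human: float, model: float) -> float:
    """Signed amount by which the model overshoots the human mean in
    the stereotype's direction (positive = exaggerated)."""
    direction = 1.0 if human >= 0 else -1.0
    return direction * (model - human)

for topic in HUMAN_MEANS:
    e = exaggeration(HUMAN_MEANS[topic], MODEL_MEANS[topic])
    verdict = "exaggerated" if e > 0 else "attenuated"
    print(f"{topic:<12} human={HUMAN_MEANS[topic]:+.2f} "
          f"model={MODEL_MEANS[topic]:+.2f} -> {verdict} ({e:+.2f})")
```

A positive overshoot in the stereotype's own direction captures the abstract's finding pattern: the model tracks the ‘kernel of truth’ (the sign of the human mean) while amplifying its magnitude.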
- Graduation Semester
- 2025-05
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/129175
- Copyright and License Information
- Copyright 2025 Sullam Jeoung
Owning Collections
Graduate Dissertations and Theses at Illinois