Malice, inequality, instability, or ignorance? Disentangling the mechanisms of LLM unfairness
Yang, Ke
Permalink
https://hdl.handle.net/2142/132559
Description
Issue Date
2025-12-03
Advisor
Zhai, ChengXiang
Department of Study
Siebel School of Computing and Data Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
large language model
unfairness measurement
Language
eng
Abstract
Ensuring fairness in large language models (LLMs) is critical as these models are increasingly deployed in sensitive domains. Traditional fairness metrics typically report a single scalar score, which conflates distinct sources of model failure and obscures underlying biases. In this work, we propose a Hierarchical Bias-Variance Decomposition framework, termed BDSU, that decomposes total discrimination risk into four interpretable components: Bias (systematic global error), Disparity (group-level variance), Sensitivity (context-level variance), and Uncertainty (stochastic or token-level variance). By applying the law of total variance recursively, BDSU provides a principled method to quantify and separate these failure modes, aligning each with ethical and reliability priorities. We further introduce a conditional micro-diagnosis that evaluates fairness within each group, enabling fine-grained auditing and targeted interventions. Our theoretical framework lays the foundation for more transparent, actionable, and robust evaluation of LLM fairness, highlighting the distinct mechanisms by which models may perpetuate bias or exhibit instability.
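The recursive law-of-total-variance split can be illustrated with a minimal sketch. It rests on assumptions the record does not state. Each observation is a scalar error e, hypothetically a model score minus a fairness target, indexed by group g, context c, and repeated stochastic sample s. Total discrimination risk is taken to be the mean squared error E[e^2]. Bias is modeled as (E[e])^2. The design is balanced. Under these assumptions, E[e^2] = (E[e])^2 + Var_g(E[e|g]) + E_g[Var_c(E[e|g,c])] + E_{g,c}[Var_s(e|g,c)]. The function names `bdsu_decompose` and `micro_diagnosis` are hypothetical and are not the thesis's implementation.

```python
from statistics import fmean, pvariance


def _check_balanced(errors):
    # The exact identity relies on equal weights: the same number of contexts
    # per group and the same number of samples per (group, context) cell.
    n_ctx = {len(group) for group in errors}
    n_smp = {len(ctx) for group in errors for ctx in group}
    if len(n_ctx) != 1 or len(n_smp) != 1 or 0 in n_ctx or 0 in n_smp:
        raise ValueError("expected a non-empty, balanced errors[g][c][s] array")


def bdsu_decompose(errors):
    """Split mean squared error into Bias + Disparity + Sensitivity + Uncertainty.

    errors[g][c][s] is a hypothetical scalar error for group g, context c,
    and stochastic sample s (assumed representation, not from the thesis).
    """
    _check_balanced(errors)
    flat = [e for group in errors for ctx in group for e in ctx]
    cell_means = [[fmean(ctx) for ctx in group] for group in errors]
    group_means = [fmean(row) for row in cell_means]
    grand_mean = fmean(group_means)  # equals fmean(flat) in a balanced design
    return {
        # E[e^2]: assumed definition of total discrimination risk
        "total_risk": fmean([e * e for e in flat]),
        # (E[e])^2: systematic global error
        "bias": grand_mean ** 2,
        # Var_g(E[e | g]): spread of group means
        "disparity": pvariance(group_means),
        # E_g[Var_c(E[e | g, c])]: spread across contexts within a group
        "sensitivity": fmean([pvariance(row) for row in cell_means]),
        # E_{g,c}[Var_s(e | g, c)]: sampling noise within a cell
        "uncertainty": fmean([pvariance(ctx) for group in errors for ctx in group]),
    }


def micro_diagnosis(errors):
    """Per-group split of E[e^2 | g] into bias_g + sensitivity_g + uncertainty_g."""
    _check_balanced(errors)
    report = []
    for group in errors:
        cell_means = [fmean(ctx) for ctx in group]
        group_mean = fmean(cell_means)
        report.append({
            "risk": fmean([e * e for ctx in group for e in ctx]),
            "bias": group_mean ** 2,
            "sensitivity": pvariance(cell_means),
            "uncertainty": fmean([pvariance(ctx) for ctx in group]),
        })
    return report


if __name__ == "__main__":
    # Toy data: two groups, two contexts each, two samples per context.
    errors = [
        [[0.1, 0.3], [0.2, 0.4]],
        [[-0.1, 0.1], [0.5, 0.7]],
    ]
    parts = bdsu_decompose(errors)
    components = sum(v for k, v in parts.items() if k != "total_risk")
    print(parts, components)
    print(micro_diagnosis(errors))
```

Population variances (`pvariance`, not `variance`) are used because the four terms sum exactly to E[e^2] only with population moments. An unbalanced design would need cell-size weights in every mean. In this sketch, averaging the per-group `bias` values over groups recovers Bias + Disparity. This shows how the conditional view folds group-level variance back into each group's own systematic error.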