Generative modeling of interactive and reactive digital humans
Xu, Xiyan
Permalink
https://hdl.handle.net/2142/129510
Description
Title
Generative modeling of interactive and reactive digital humans
Author(s)
Xu, Xiyan
Issue Date
2025-04-22
Director of Research (if dissertation) or Advisor (if thesis)
Gui, Liangyan
Wang, Yuxiong
Department of Study
Siebel School of Computing and Data Science
Discipline
Computer Science
Degree Granting Institution
University of Illinois Urbana-Champaign
Degree Name
M.S.
Degree Level
Thesis
Keyword(s)
Human Motion Generation
Abstract
Modeling interactive and reactive human behaviors is essential for building intelligent virtual agents, enhancing immersive experiences in VR/AR, and advancing robot learning. Among the diverse forms of human activity, three key types of interactions—human-human, human-object, and hand-hand—play fundamental roles in shaping social communication, environmental engagement, and fine-grained physical manipulation. Accurately modeling these interactions is crucial for constructing holistic digital humans capable of situational awareness, social intelligence, and physical competence.
Although recent advances in generative modeling, particularly diffusion-based approaches, have significantly improved the realism and diversity of synthesized motions, generating semantically meaningful, physically plausible, and generalizable interactive behaviors remains a formidable challenge across all three interaction types. In the domain of human-human interactions, existing methods often struggle to produce reactions that are simultaneously physically coherent and semantically aligned with contextual cues. To address these limitations, we propose MoReact, a two-stage diffusion framework for text-conditioned reaction generation, guided by an interaction-aware loss designed to enhance both physical plausibility and semantic fidelity.
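The abstract describes MoReact only at a high level, so the following PyTorch snippet is a hypothetical sketch of the general technique it names: training a text-conditioned diffusion denoiser for reaction generation with an auxiliary interaction-aware loss. All module names, dimensions, and the specific interaction term below are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class ReactionDenoiser(nn.Module):
    """Toy denoiser: predicts the clean reactor motion from a noisy sample,
    the diffusion timestep, the actor's motion, and a text embedding."""
    def __init__(self, motion_dim=66, text_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim * 2 + text_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_noisy, t, actor_motion, text_emb):
        t_feat = t.float().unsqueeze(-1) / 1000.0             # crude timestep feature
        h = torch.cat([x_noisy, actor_motion, text_emb, t_feat], dim=-1)
        return self.net(h)                                     # predicted clean motion x0

def training_step(model, x0, actor_motion, text_emb, alphas_cumprod):
    """One diffusion training step plus a hypothetical interaction-aware term
    that matches predicted reactor-actor joint distances to the ground truth."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noise = torch.randn_like(x0)
    x_noisy = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward diffusion

    x0_pred = model(x_noisy, t, actor_motion, text_emb)
    recon_loss = ((x0_pred - x0) ** 2).mean()

    # Hypothetical interaction-aware regularizer: preserve reactor-to-actor joint
    # distances (a stand-in for the physical-plausibility and semantic constraints
    # the abstract mentions only in general terms). Poses are flattened 3D joints.
    joints_pred = x0_pred.view(b, -1, 3)
    joints_gt = x0.view(b, -1, 3)
    joints_actor = actor_motion.view(b, -1, 3)
    inter_loss = ((torch.cdist(joints_pred, joints_actor)
                   - torch.cdist(joints_gt, joints_actor)) ** 2).mean()
    return recon_loss + 0.1 * inter_loss

# Tiny smoke test with random data (22 joints x 3 coordinates = 66-dim poses).
model = ReactionDenoiser()
alphas_cumprod = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
loss = training_step(model, torch.randn(4, 66), torch.randn(4, 66),
                     torch.randn(4, 512), alphas_cumprod)
loss.backward()
```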
For human-object interactions, prior works frequently rely on narrow assumptions, such as restricting interactions to hand contacts or focusing on specific object categories, which limits generalization. We introduce AuxMoDiff, a unified and flexible diffusion model that incorporates auxiliary spatial cues to better capture dynamic human-object relationships, improving contact accuracy and adaptability to unseen objects and tasks.
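Purely as an illustration of what "auxiliary spatial cues" could look like in practice (not AuxMoDiff's actual architecture), a conditioning module might encode an object's sampled point cloud together with a human-object distance signal and fuse them into a cue vector that the motion denoiser consumes. Every name, feature choice, and shape below is hypothetical.

```python
import torch
import torch.nn as nn

class SpatialCueEncoder(nn.Module):
    """Encodes an object point cloud and a human-object distance into one cue vector."""
    def __init__(self, cue_dim=128):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, cue_dim))
        self.dist_mlp = nn.Linear(1, cue_dim)

    def forward(self, object_points, human_object_dist):
        # object_points: (B, N, 3); max-pool per-point features into a global shape code,
        # so the same encoder handles arbitrary (including unseen) object geometries.
        shape_code = self.point_mlp(object_points).max(dim=1).values    # (B, cue_dim)
        dist_code = self.dist_mlp(human_object_dist.unsqueeze(-1))      # (B, cue_dim)
        return shape_code + dist_code                                   # fused spatial cue

# Usage: the cue would be concatenated with the noisy motion and timestep features
# before the denoiser, analogous to the conditioning in the previous sketch.
encoder = SpatialCueEncoder()
cue = encoder(torch.randn(2, 512, 3), torch.rand(2))   # batch of 2, 512 points per object
print(cue.shape)                                        # torch.Size([2, 128])
```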
For hand-hand interactions, research progress has been hampered by the lack of high-quality datasets with semantic annotations, making meaningful bimanual motion generation difficult. To fill this gap, we present TextHand, the first large-scale dataset of close two-hand interactions paired with rich natural language descriptions. Building upon this resource, we develop TextHHI, a text-driven diffusion model capable of synthesizing realistic, expressive, and semantically aligned bimanual interactions from textual prompts.
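For the text-driven generation step, a standard DDPM-style ancestral sampling loop conditioned on a text embedding conveys the general mechanism. The sketch below is a minimal, hypothetical example under that assumption; it does not reproduce TextHHI, whose details are not given in this abstract, and the denoiser, schedule, and dimensions are placeholders.

```python
import torch

@torch.no_grad()
def sample_bimanual_motion(denoiser, text_emb, motion_dim=96, steps=1000):
    """Ancestral DDPM sampling: start from Gaussian noise and iteratively denoise,
    using a denoiser that predicts the clean two-hand motion at each step."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    x = torch.randn(text_emb.shape[0], motion_dim)        # pure-noise two-hand motion
    for t in reversed(range(steps)):
        t_batch = torch.full((x.shape[0],), t, dtype=torch.long)
        x0_pred = denoiser(x, t_batch, text_emb)          # predicted clean motion
        a_bar = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        # Posterior mean and variance of q(x_{t-1} | x_t, x0_pred) for this schedule.
        coef_x0 = (a_bar_prev.sqrt() * betas[t]) / (1 - a_bar)
        coef_xt = (alphas[t].sqrt() * (1 - a_bar_prev)) / (1 - a_bar)
        mean = coef_x0 * x0_pred + coef_xt * x
        var = betas[t] * (1 - a_bar_prev) / (1 - a_bar)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + var.sqrt() * noise
    return x

# Smoke test with a trivial stand-in denoiser that always predicts zeros.
motion = sample_bimanual_motion(lambda x, t, c: torch.zeros_like(x), torch.randn(1, 512))
print(motion.shape)                                       # torch.Size([1, 96])
```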
Together, these contributions advance the generative modeling of interactive and reactive digital humans across multiple scenarios. By addressing the challenges in human-human, human-object, and hand-hand interaction modeling, this thesis takes an important step toward building intelligent virtual agents that can seamlessly coordinate social behaviors, object manipulations, and self-movements within complex, dynamic environments.