Advancing semantic modeling: addressing coordination, interpretability, and data scarcity in domain representation
Akash, Pritom Saha
Description
- Title
- Advancing semantic modeling: addressing coordination, interpretability, and data scarcity in domain representation
- Author(s)
- Akash, Pritom Saha
- Issue Date
- 2025-11-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Chang, Kevin Chen-Chuan
- Doctoral Committee Chair(s)
- Chang, Kevin Chen-Chuan
- Committee Member(s)
- Zhai, ChengXiang
- He, Jingrui
- Popa, Lucian
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Semantic Modeling
- Topic Modeling
- Domain Representation
- Keyword Selection
- Domain Adaptation
- Variational Autoencoder
- Abstract
- The rapid growth of online text—from scientific articles and product catalogs to news and social media—offers an unprecedented opportunity to uncover structure and meaning from unorganized information. Traditional topic models provide a powerful framework for discovering latent themes but often fall short in practice: their outputs are generic and difficult to interpret, alignment across corpora is not guaranteed, and they struggle with both short documents and domains with limited data. These challenges limit their usefulness for applications such as search, recommendation, trend analysis, and domain comparison, where interpretability and adaptability are essential. This thesis develops four complementary solutions that address these limitations. Coordinated Topic Modeling (CTM) introduces a framework for aligning corpus-specific topics with interpretable reference topics, enabling both interpretability and cross-corpus comparability. Domain Representative Keyword Selection (DRKS) proposes a probabilistic method for extracting distinctive, context-aware keywords that capture the semantics of a target domain. Short-Text Topic Modeling (STTM) leverages large language models and prefix-tuned autoencoders to enrich sparse inputs and produce coherent topics under extreme document-level scarcity. Low-Resource Topic Modeling (LRTM) presents DALTA, a domain-adaptation framework that transfers knowledge from data-rich corpora while preserving target-specific nuance. Across multiple datasets and domains, these methods demonstrate substantial improvements over state-of-the-art baselines in coherence, stability, and adaptability. Together, they establish a principled foundation for building structured, interpretable, and adaptable semantic models in diverse and resource-constrained settings.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132501
- Copyright and License Information
- Copyright 2025 Pritom Saha Akash
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY