Advancing semantic modeling: addressing coordination, interpretability, and data scarcity in domain representation
Akash, Pritom Saha
Description
- Title
- Advancing semantic modeling: addressing coordination, interpretability, and data scarcity in domain representation
- Author(s)
- Akash, Pritom Saha
- Issue Date
- 2025-11-19
- Director of Research (if dissertation) or Advisor (if thesis)
- Chang, Kevin Chen-Chuan
- Doctoral Committee Chair(s)
- Chang, Kevin Chen-Chuan
- Committee Member(s)
- Zhai, ChengXiang
- He, Jingrui
- Popa, Lucian
- Department of Study
- Siebel School of Computing and Data Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Semantic Modeling
- Topic Modeling
- Domain Representation
- Keyword Selection
- Domain Adaptation
- Variational Autoencoder
- Abstract
- The rapid growth of online text—from scientific articles and product catalogs to news and social media—offers an unprecedented opportunity to uncover structure and meaning from unorganized information. Traditional topic models provide a powerful framework for discovering latent themes but often fall short in practice: their outputs are generic and difficult to interpret, alignment across corpora is not guaranteed, and they struggle with both short documents and domains with limited data. These challenges limit their usefulness for applications such as search, recommendation, trend analysis, and domain comparison, where interpretability and adaptability are essential. This thesis develops four complementary solutions that address these limitations. Coordinated Topic Modeling (CTM) introduces a framework for aligning corpus-specific topics with interpretable reference topics, enabling both interpretability and cross-corpus comparability. Domain Representative Keyword Selection (DRKS) proposes a probabilistic method for extracting distinctive, context-aware keywords that capture the semantics of a target domain. Short-Text Topic Modeling (STTM) leverages large language models and prefix-tuned autoencoders to enrich sparse inputs and produce coherent topics under extreme document-level scarcity. Low-Resource Topic Modeling (LRTM) presents DALTA, a domain-adaptation framework that transfers knowledge from data-rich corpora while preserving target-specific nuance. Across multiple datasets and domains, these methods demonstrate substantial improvements over state-of-the-art baselines in coherence, stability, and adaptability. Together, they establish a principled foundation for building structured, interpretable, and adaptable semantic models in diverse and resource-constrained settings.
- Graduation Semester
- 2025-12
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/132501
- Copyright and License Information
- Copyright 2025 Pritom Saha Akash
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY