Files in this item

File: CHEN-DISSERTATION-2020.pdf (5MB), Restricted Access
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Identifiability for latent class models
Author(s): Chen, Yinyin
Director of Research: Liang, Feng; Yang, Yun
Doctoral Committee Chair(s): Liang, Feng; Culpepper, Steven Andrew
Doctoral Committee Member(s): Narisetty, Naveen Naidu
Department / Program: Statistics
Discipline: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Identifiability
Generic identifiability
Latent class models
Cognitive diagnostic models
Topic models
Abstract: Although latent class models have been widely applied and appear to perform well in various applications, it is well known that inference for latent class models is challenging due to the potential nonidentifiability of the underlying model parameters. In other words, multiple or infinitely many sets of parameters of a latent class model could correspond to the same data generating process, i.e., they are not statistically distinguishable. This thesis studies the identifiability of latent class models arising from real applications in cognitive diagnosis and topic modeling. The main contributions of this thesis include identifiability conditions that are weaker than those in the current literature, computational tools such as MCMC and EM algorithms that return estimates and uncertainty quantification of the underlying model parameters, and asymptotic analysis such as consistency results for the proposed estimation procedures under identifiable models. In the first part of the thesis, we consider Cognitive Diagnostic Models (CDMs). CDMs are latent variable models developed to infer latent skills, knowledge, or personalities that underlie responses to educational, psychological, and social science tests and measures. We derive a new set of sufficient conditions for generic identifiability of CDM parameters. An important contribution for practice is that our new generic identifiability conditions are more likely to be satisfied in empirical applications than existing conditions that ensure strict identifiability. For computation, we formulate learning the underlying latent structure as a variable selection problem, and develop a new Bayesian variable selection algorithm that explicitly enforces generic identifiability conditions and monotonicity of item response functions to ensure valid posterior inference. We demonstrate the empirical performance of our algorithm on simulation studies and several real educational testing datasets. In the second part of the thesis, we consider admixture models. The most widely known admixture models are topic models, which model each document by a convex combination of a set of word-frequency vectors, known as topics. Although identifying such latent topics is of primary interest in many applications, it is well known that the topic parameters are not identifiable. Prior work addressed this issue by imposing stringent conditions on the existence of certain anchor words or on higher order statistics of the data. We derive a new set of identifiability conditions from convex geometry, which are weaker than the conditions from prior work. In addition, we study estimation consistency and establish the convergence rate for topic models with identifiable parameters. We also provide an EM algorithm for computation and demonstrate its competitive performance through simulation studies and real applications.
Issue Date: 2020-05-07
Type: Thesis
URI: http://hdl.handle.net/2142/108306
Rights Information: Copyright 2020 Yinyin Chen
Date Available in IDEALS: 2020-08-27
Date Deposited: 2020-05

