Files in this item

File: HUANG-DISSERTATION-2017.pdf (4MB), Restricted Access
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Statistical algorithms using multisets and statistical inference of heterogeneous networks
Author(s): Huang, Weihong
Director of Research: Chen, Yuguo
Doctoral Committee Chair(s): Chen, Yuguo
Doctoral Committee Member(s): Culpepper, Steven; Douglas, Jeffrey; Liang, Feng
Department / Program: Statistics
Discipline: Statistics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Dissertation
Subject(s): Multisets; EM algorithm; Metropolis-Hastings algorithm; Heterogeneous network; Clustering; Mixed membership model; Variational algorithm
Abstract: Computational statistics, including methods such as Markov chain Monte Carlo (MCMC), the bootstrap, and approximate Bayesian computation, is an important part of modern statistics and has been widely used in many areas, such as Bayesian statistics, computational biology, and computational physics. In this thesis, we study three problems: improving the efficiency of the EM algorithm, improving the efficiency of the MCMC method, and statistical analysis of heterogeneous networks.
The expectation-maximization (EM) algorithm is widely used to compute maximum likelihood estimates when the observations can be viewed as incomplete data. However, the convergence rate of the EM algorithm can be slow, especially when a large portion of the data is missing. In Chapter 2, we propose the multiset EM algorithm, which can improve the convergence of the EM algorithm. The key idea is to augment the system with a multiset of the missing component and to construct an appropriate joint distribution of the augmented complete data. We demonstrate that the multiset EM algorithm can outperform the EM algorithm, especially when EM has difficulty converging and the E-step involves Monte Carlo approximation.
The multiset sampler proposed by Leman et al. (2009) has been shown to be an effective algorithm for sampling from complex multimodal distributions, but it requires that the parameters in the target distribution can be divided into two parts: the parameters of interest and the nuisance parameters. In Chapter 3, we propose a new self-multiset sampler (SMSS), which extends the multiset sampler to distributions without nuisance parameters. We also generalize our method to distributions with unbounded or infinite support. Numerical results show that the SMSS and its generalization have a substantial advantage in sampling multimodal distributions compared to the ordinary Markov chain Monte Carlo algorithm and some popular variants.
Heterogeneous networks are useful for modeling complex systems that consist of different types of objects. However, few statistical models exist for heterogeneous networks. In Chapter 4, we propose a statistical model for community detection in heterogeneous networks. To allow for heterogeneity in the data and the content-dependent property of the pairwise relationship, we formulate a heterogeneous version of the mixed membership stochastic blockmodel. We also apply a variational algorithm for posterior inference. We demonstrate the advantage of the proposed method in modeling overlapping communities and multiple memberships through simulation studies and applications to the DBLP data.
Issue Date: 2017-06-27
Type: Thesis
URI: http://hdl.handle.net/2142/98245
Rights Information: Copyright 2017 Weihong Huang
Date Available in IDEALS: 2017-09-29
Date Deposited: 2017-08

