Files in this item



application/pdfLIU-DISSERTATION-2021.pdf (1MB)Restricted Access
(no description provided)PDF


Title:Statistical inference for complex networks
Author(s):Liu, Yan
Director of Research:Chen, Yuguo
Doctoral Committee Chair(s):Chen, Yuguo
Doctoral Committee Member(s):Douglas, Jeffrey; Simpson, Douglas; Yang, Yun
Department / Program:Statistics
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Network Inference
Random Graphs
Variational Inference
Latent Space Model
Stochastic Block Model
Community Detection
Dynamic Networks
Heterogeneous Networks
Abstract:A network consists of a set of vertices and a set of edges between these vertices. The vertices represent a set of entities, and the edges represent relationships among these entities. Networks and relational data arise in various areas of research, such as neuroscience, biology, computer science, and social science. In the past two decades, a large number of statistical methods have been proposed for modeling such relational data, identifying community structures, hypothesis testing, and model selection. The majority of these methods dealt with the case where only one network observation is available. However, as the data collection ability grows, the surge in the number of complex network datasets poses new challenges for statisticians. Complex networks encompass a variety of frameworks. For example, in the dynamic network setting, one may observe a time series of networks over the same set of nodes. In heterogeneous networks, different types of entities interact with each other and form different types of links. In multi-layer networks, one observes multiple networks on the same set of nodes, which represent different types of interactions among these entities. In these scenarios, new probabilistic models are needed in order to deal with the time-varying aspect or heterogeneity of the networks. Furthermore, while community detection is well-studied for a single network, it is still of interest to develop new methods and theories for community detection in complex networks. As the size of network data becomes increasingly large, scalable algorithms for large-scale networks are also of interest. In this dissertation, we focus on modeling, community detection, theoretical properties, and scalable inference for complex networks. In the first project, we propose a variational inference algorithm for dynamic latent space models. Compared with sampling-based methods such as Markov chain Monte Carlo (MCMC), the proposed algorithm yields similar empirical performance with much less computation time. We derive the theoretical properties of the variational Bayes risk of the proposed algorithm and demonstrate the performance on simulated data and two real-world datasets. Heterogeneous mixed-membership stochastic blockmodel (MMSB) is powerful for modeling overlapping communities and multiple community memberships, but many methods in the literature lack a theoretical guarantee. In the second project, we study the theoretical properties of the heterogeneous version of the MMSB. We propose a community detection consistency criterion for mixed-membership networks, and we show that both the maximum-likelihood estimator and the Bayes estimator of heterogeneous MMSB enjoy community detection consistency. In the third project, we consider the problem of inferring the community memberships of the nodes in dynamic networks, where the community memberships are allowed to change over time. We propose two matrix factorization methods, namely, weighted orthogonal linked matrix factorization (OLMF) and co-regularization spectral clustering. We derive the clustering error bound for both proposed algorithms, and show that both algorithms are consistent under various asymptotic regimes. The proposed algorithms outperform the existing early-fusion and late-fusion methods in several simulation studies and two real data examples. In the fourth project, we study the variational EM algorithm for community detection in the multi-layer stochastic block model (MLSBM). We show that the proportion of misclassified nodes given by the variational EM algorithm converges to zero as the number of nodes, the number of layers, and the number of communities go to infinity, i.e., the variational EM algorithm enjoys the property of community detection consistency.
Issue Date:2021-04-21
Rights Information:Copyright 2021 Yan Liu
Date Available in IDEALS:2021-09-17
Date Deposited:2021-05

This item appears in the following Collection(s)

Item Statistics