Files in this item

FilesDescriptionFormat

application/pdf

application/pdfJingfei_Zhang.pdf (6MB)
(no description provided)PDF

Description

Title:Statistical inference on network data
Author(s):Zhang, Jingfei
Director of Research:Chen, Yuguo
Doctoral Committee Chair(s):Chen, Yuguo
Doctoral Committee Member(s):Douglas, Jeffrey A.; Marden, John I.; Simpson, Douglas G.
Department / Program:Statistics
Discipline:Statistics
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):Network Inference
Random Graphs
Sequential Importance Sampling
Network Robustness
Exponential Random Graph Model
Dense Subgraph Discovery
Simulated Annealing
Community Detection
Degree Corrected Stochastic Block Model
Abstract:Networks arise from modeling complex systems in various fields, such as computer science, social science, biology, psychology and finance. Understanding and analyzing networks help us better understand these complex systems and extract useful information. In this dissertation, we study problems on network sampling, network modeling and data mining on networks. Random graphs with given vertex degrees have been widely used as a model for many real-world complex networks. However, both statistical inference and analytic study of such networks present great challenges. In Chapter 2, we propose new sequential importance sampling methods for sampling networks with a given degree sequence. These samples can be used to approximate closely the null distributions of a number of test statistics involved in such networks, and provide an accurate estimate of the total number of networks with given vertex degrees. We study the asymptotic behavior of the proposed algorithm and prove that the importance weight remains bounded as the size of the graph grows. This property guarantees that the proposed sampling algorithm can still work efficiently even for large sparse graphs. We apply our method to a range of examples to demonstrate its efficiency in real problems. One important question for complex networks is how the network's connectivity will be affected if the network is under targeted attacks, i.e., the nodes with the most links are attacked. In Chapter 3, we found that a dolphin network is resilient to targeted attacks. To further study the resilient property, we fit an exponential random graph model to the dolphin network. The fitted model characterizes network resiliency and identifies local structures that can reproduce the global resilience property. Such a statistical model can be used to build the Internet and other networks to increase the attack tolerance of those networks. The problem of finding densely connected subgraphs in a network has attracted a lot of recent attention. Such subgraphs are sometimes referred to as communities in social networks or molecular modules in protein networks. In Chapter 4, we propose two Monte Carlo optimization algorithms for identifying the densest subgraphs with a fixed size or with size in a given range. The new algorithms combine the idea of simulated annealing and efficient moves for the Markov chain, and both algorithms are shown to converge to the set of optimal states (densest subgraphs) with probability one. When applied to a yeast protein interaction network and a stock market graph, the algorithms identify interesting new densely connected subgraphs. One of the most relevant features of networks representing real systems is the community structure. Detecting communities is of great importance in understanding, analyzing, organizing networks. In Chapter 5, we describe a statistical framework for modularity-based network community detection. We derive the modularity function under the proposed statistical framework, and propose a fast modularity maximization algorithm based on the eigen-spectrum of the modularity matrix. A hypothesis testing procedure is developed to determine the significance of an identified community structure. The modularity formulated under the proposed statistical framework is shown to be consistent under the degree-corrected stochastic block model framework. Several synthetic networks and real world networks are used to demonstrate the effectiveness of our method.
Issue Date:2014-05-30
URI:http://hdl.handle.net/2142/49840
Rights Information:Copyright 2014 Jingfei Zhang
Date Available in IDEALS:2014-05-30
2016-09-22
Date Deposited:2014-05


This item appears in the following Collection(s)

Item Statistics