Files in this item
Files  Description  Format 

application/pdf Jingfei_Zhang.pdf (6MB)  (no description provided) 
Description
Title:  Statistical inference on network data 
Author(s):  Zhang, Jingfei 
Director of Research:  Chen, Yuguo 
Doctoral Committee Chair(s):  Chen, Yuguo 
Doctoral Committee Member(s):  Douglas, Jeffrey A.; Marden, John I.; Simpson, Douglas G. 
Department / Program:  Statistics 
Discipline:  Statistics 
Degree Granting Institution:  University of Illinois at UrbanaChampaign 
Degree:  Ph.D. 
Genre:  Dissertation 
Subject(s):  Network Inference
Random Graphs Sequential Importance Sampling Network Robustness Exponential Random Graph Model Dense Subgraph Discovery Simulated Annealing Community Detection Degree Corrected Stochastic Block Model 
Abstract:  Networks arise from modeling complex systems in various fields, such as computer science, social science, biology, psychology and finance. Understanding and analyzing networks help us better understand these complex systems and extract useful information. In this dissertation, we study problems on network sampling, network modeling and data mining on networks. Random graphs with given vertex degrees have been widely used as a model for many realworld complex networks. However, both statistical inference and analytic study of such networks present great challenges. In Chapter 2, we propose new sequential importance sampling methods for sampling networks with a given degree sequence. These samples can be used to approximate closely the null distributions of a number of test statistics involved in such networks, and provide an accurate estimate of the total number of networks with given vertex degrees. We study the asymptotic behavior of the proposed algorithm and prove that the importance weight remains bounded as the size of the graph grows. This property guarantees that the proposed sampling algorithm can still work efficiently even for large sparse graphs. We apply our method to a range of examples to demonstrate its efficiency in real problems. One important question for complex networks is how the network's connectivity will be affected if the network is under targeted attacks, i.e., the nodes with the most links are attacked. In Chapter 3, we found that a dolphin network is resilient to targeted attacks. To further study the resilient property, we fit an exponential random graph model to the dolphin network. The fitted model characterizes network resiliency and identifies local structures that can reproduce the global resilience property. Such a statistical model can be used to build the Internet and other networks to increase the attack tolerance of those networks. The problem of finding densely connected subgraphs in a network has attracted a lot of recent attention. Such subgraphs are sometimes referred to as communities in social networks or molecular modules in protein networks. In Chapter 4, we propose two Monte Carlo optimization algorithms for identifying the densest subgraphs with a fixed size or with size in a given range. The new algorithms combine the idea of simulated annealing and efficient moves for the Markov chain, and both algorithms are shown to converge to the set of optimal states (densest subgraphs) with probability one. When applied to a yeast protein interaction network and a stock market graph, the algorithms identify interesting new densely connected subgraphs. One of the most relevant features of networks representing real systems is the community structure. Detecting communities is of great importance in understanding, analyzing, organizing networks. In Chapter 5, we describe a statistical framework for modularitybased network community detection. We derive the modularity function under the proposed statistical framework, and propose a fast modularity maximization algorithm based on the eigenspectrum of the modularity matrix. A hypothesis testing procedure is developed to determine the significance of an identified community structure. The modularity formulated under the proposed statistical framework is shown to be consistent under the degreecorrected stochastic block model framework. Several synthetic networks and real world networks are used to demonstrate the effectiveness of our method. 
Issue Date:  20140530 
URI:  http://hdl.handle.net/2142/49840 
Rights Information:  Copyright 2014 Jingfei Zhang 
Date Available in IDEALS:  20140530 20160922 
Date Deposited:  201405 
This item appears in the following Collection(s)

Dissertations and Theses  Statistics

Graduate Dissertations and Theses at Illinois
Graduate Theses and Dissertations at Illinois