Subsampling based inference for network data
Chakrabarty, Sayan
Permalink
https://hdl.handle.net/2142/125814
Description
- Title
- Subsampling based inference for network data
- Author(s)
- Chakrabarty, Sayan
- Issue Date
- 2024-07-11
- Director of Research (if dissertation) or Advisor (if thesis)
- Chen, Yuguo
- Sengupta, Srijan
- Doctoral Committee Chair(s)
- Chen, Yuguo
- Committee Member(s)
- Shao, Xiaofeng
- Simpson, Douglas
- Department of Study
- Statistics
- Discipline
- Statistics
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Blockmodels
- Community detection
- Large networks
- Model selection
- Network cross-validation
- Network subsampling
- Random dot product graph
- Abstract
- Contemporary systems often comprise interactions between numerous agents, typically represented as networks. Network data are widespread across disciplines such as the social sciences, biological sciences, information technology, and computer science. As technology rapidly advances, networks arising in these fields are becoming increasingly large and complex. Analyzing them effectively poses challenges of computational feasibility and of choosing suitable analytical models. This dissertation addresses two problems in this area.
Large networks are becoming widespread in scientific fields, but performing statistical analysis on them is challenging because of high computation time and memory requirements. In the second chapter of this dissertation, we introduce a subsampling-based divide-and-conquer algorithm, SONNET, for detecting communities in large networks. The algorithm divides the original network into several subnetworks that share an overlapping part and applies a community detection algorithm to each subnetwork. The results from the subnetworks are combined using a label matching approach to determine the final community labels. This method significantly reduces both memory and computation costs, since it only requires processing and storing the smaller subnetworks. It is also parallelizable, which further improves its speed. The theoretical and numerical performance of the algorithm is also presented in this chapter.
Complex and extensive networks are increasingly common in scientific applications across various fields. Despite the availability of numerous network models and methodologies, cross-validation on networks remains difficult because of the unique structure of network data. In the third chapter, we propose a general cross-validation procedure, CROISSANT, based on subsampling for networks. The proposed algorithm splits the original network into multiple subnetworks with a shared overlap, creating a training set comprising the subnetworks and a test set comprising the node pairs between the subnetworks. This train-test split forms the basis for a network cross-validation procedure that can be used for a broad range of model selection and parameter tuning problems for network data. The method is computationally efficient for large networks, as it uses smaller subnetworks for training. It can also be adapted to specific network model selection and parameter tuning problems, and theoretical justifications are provided. Numerical results show that the proposed algorithm accurately performs model selection and parameter tuning on various simulated and real networks from diverse models. They also indicate that the method is faster than existing network cross-validation methods.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125814
- Copyright and License Information
- Copyright 2024 Sayan Chakrabarty
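The overlap-based train-test split described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function names `overlap_split` and `held_out_pairs`, the uniform random assignment of nodes, and the toy sizes are all assumptions made for illustration. Training data are the adjacency blocks of each subnetwork (shared overlap plus one non-overlap part), and test data are the node pairs whose endpoints fall in different non-overlap parts.

```python
import numpy as np


def overlap_split(n_nodes, n_subnets, overlap_size, seed=0):
    """Hypothetical helper: pick a shared overlap, then split the rest evenly."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    overlap = perm[:overlap_size]
    # Disjoint non-overlap parts, one per subnetwork.
    parts = np.array_split(perm[overlap_size:], n_subnets)
    # Each subnetwork = shared overlap nodes + its own non-overlap part.
    subnets = [np.concatenate([overlap, part]) for part in parts]
    return overlap, parts, subnets


def held_out_pairs(parts):
    """Hypothetical helper: node pairs lying between different non-overlap parts."""
    pairs = []
    for a in range(len(parts)):
        for b in range(a + 1, len(parts)):
            pairs.extend((int(i), int(j)) for i in parts[a] for j in parts[b])
    return pairs


# Toy symmetric adjacency matrix with no self-loops (assumed example data).
rng = np.random.default_rng(1)
upper = np.triu(rng.integers(0, 2, size=(12, 12)), k=1)
A = upper + upper.T

overlap, parts, subnets = overlap_split(n_nodes=12, n_subnets=3, overlap_size=3)
train_blocks = [A[np.ix_(s, s)] for s in subnets]  # training adjacency blocks
test_pairs = held_out_pairs(parts)                 # held-out node pairs
```

Under these assumptions, no held-out pair appears inside any training block, so a model fitted on the subnetworks can be scored on edges it never saw. How fitted models from different subnetworks are reconciled (for example via the overlap nodes) is left out of this sketch.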
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at Illinois