Files in this item



application/pdfZhao_Peixiang.pdf (2MB)
(no description provided)PDF


Title:On querying large scale information networks
Author(s):Zhao, Peixiang
Director of Research:Han, Jiawei
Doctoral Committee Chair(s):Han, Jiawei
Doctoral Committee Member(s):Aggarwal, Charu C.; Chang, Kevin C-C.; Zhai, ChengXiang
Department / Program:Computer Science
Discipline:Computer Science
Degree Granting Institution:University of Illinois at Urbana-Champaign
Subject(s):Information networks
Query processing, Graphs
Graph indexing
Data warehouse
Graph streams
Abstract:Social and technical information systems usually consist of a large number of interacting physical, conceptual, and human/societal entities. Such individual entities are interconnected to form large and sophisticated networks, which, without loss of generality, are often refereed to as information networks. Examples of information networks include the Web, highway or urban transportation networks, research collaboration and publication networks, biological networks and social networks. Clearly, information networks are ubiquitous and form a critical component of modern information infrastructure. Theoretically, information networks can be modeled and manipulated as large scale graphs, which have gradually become the first-class citizens in the data management and mining fields. However, it is extremely inefficient to process such graph-structured data in any existing data models or computational frameworks. Real world information networks are massive, whose sheer size may simply overwhelm a direct application of any conventional graph algorithms designed and implemented for small or medium-sized memory-resident graphs. In the mean time, information networks are not static but rapidly changing all the time. The massive and dynamic nature of information networks has posed special challenges to effective query processing especially in scenarios where real-time responses are desirable. In this thesis, we will consider a series of queries of practical value arising in real world network scenarios, and explore the effective and potentially scalable querying solutions for large scale information networks. All such queries have been found fundamental and critically important at the core of many advanced network applications. First of all, PRank is proposed to answer the structural similarity query: "which entities are (structurally) similar to a query entity" in an information network. Second, SPath is proposed as a high performance graph indexing mechanism to address general subgraph queries on large scale information networks. Third, GraphCube is designed as a new warehousing model that supports OLAP queries on large multidimensional information networks. Last, but not the least, gSketch is devised as a new sketch method that combines well-studied synopsis structures with a sketch partitioning technique in order to estimate and optimize the responses to basic queries on rapidly changing information networks. Our experimental studies demonstrate that our querying methods are highly efficient an scalable, and have achieved satisfactory performance for the fundamental queries on large scale information networks. We should admit that the queries examined in the thesis are merely the tip of the iceberg. The marriage of information network analysis and query processing technology will bring many exciting opportunities for future study, which are briefed in the end of the thesis.
Issue Date:2012-09-18
Rights Information:Copyright 2012 Peixiang Zhao
Date Available in IDEALS:2012-09-18
Date Deposited:2012-08

This item appears in the following Collection(s)

Item Statistics