Files in this item

File:Xinxin_Shu.pdf (2MB)
File description:(no description provided)
Format:PDF (application/pdf)

Description

Title:Time-varying networks estimation and Chinese words segmentation
Author(s):Shu, Xinxin
Director of Research:Qu, Annie
Doctoral Committee Chair(s):Qu, Annie
Doctoral Committee Member(s):Simpson, Douglas G.; Douglas, Jeffrey A.; Chen, Xiaohui
Department / Program:Statistics
Discipline:Statistics
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:Ph.D.
Genre:Dissertation
Subject(s):dynamic networks; proximal gradient method; varying-coefficient model; Language Processing; words segmentation
Abstract:This thesis covers two research areas: time-varying network estimation and Chinese word segmentation. Chapter 1 introduces the background of time-varying networks and the structure of the Chinese language, followed by the motivations and goals of the research. In many biomedical and social science studies, it is important to identify and predict dynamic changes in the associations among network data over time. However, little of the literature addresses the estimation of time-varying networks, mainly because the extremely large volume of time-varying network data makes computation difficult. In Chapter 2, we propose a varying-coefficient model to incorporate time-varying network data, and impose a piecewise penalty function to capture local features of the network associations. The proposed approach is nonparametric, and therefore flexible in modeling dynamic changes of association in network data, and it can identify the time regions in which those changes occur. To achieve local sparsity in the network estimate, we implement a group penalization strategy in which parameters overlap among different groups. We also develop a fast algorithm, based on the smoothing proximal gradient method, which is computationally efficient and accurate. We illustrate the proposed method through simulation studies and fMRI data from children with attention deficit hyperactivity disorder, and show that the proposed method and algorithm efficiently recover dynamic network changes over time.
Digital information has become an essential part of modern life, from scientific research, entertainment, and product marketing to national security. Developing fast, automatic information extraction is therefore in high demand. In Chapter 3, we propose a new method for word segmentation in Chinese language processing. Chinese is the second most popular language among internet users, but it remains severely under-studied, mainly because of its ambiguity. Segmentation is crucial for Chinese language processing, since it is the first step toward fast, automatic information extraction. One major challenge is that the Chinese language is highly context-dependent and very different from English. We propose a machine-learning model with computationally feasible loss functions that utilize linguistically embedded features. The proposed method is evaluated on Chinese documents from the Peking University corpus. Our numerical study shows that the proposed method performs better than the existing top-performing competitors.
Issue Date:2014-09-16
URI:http://hdl.handle.net/2142/50539
Rights Information:Copyright 2014 Xinxin Shu
Date Available in IDEALS:2014-09-16
Date Deposited:2014-08

