Time-varying networks estimation and Chinese words segmentation

Shu, Xinxin

Permalink

https://hdl.handle.net/2142/50539

Description

Title

Time-varying networks estimation and Chinese words segmentation

Author(s)

Shu, Xinxin

Issue Date

2014-09-16

Director of Research (if dissertation) or Advisor (if thesis)

Qu, Annie

Doctoral Committee Chair(s)

Qu, Annie

Committee Member(s)

Simpson, Douglas G.
Douglas, Jeffrey A.
Chen, Xiaohui

Department of Study

Statistics

Discipline

Statistics

Degree Granting Institution

University of Illinois at Urbana-Champaign

Degree Name

Ph.D.

Degree Level

Dissertation

Date of Ingest

2014-09-16T17:23:36Z

Keyword(s)

dynamic networks
proximal gradient method
varying-coefficient model
Language Processing
words segmentation

Abstract

This thesis addresses two research areas: time-varying network estimation and Chinese word segmentation. Chapter 1 introduces the background of time-varying networks and the structure of the Chinese language, followed by the motivations and goals of the research. In many biomedical and social science studies, it is important to identify and predict dynamic changes in the associations among network data over time. However, little literature addresses the estimation of time-varying networks, mainly because the extremely large volume of time-varying network data makes computation difficult. In Chapter 2, we propose a varying-coefficient model to incorporate time-varying network data, and impose a piecewise penalty function to capture local features of the network associations. The proposed approach is nonparametric, and therefore flexible in modeling dynamic changes of association in network data problems, and it can identify the time regions in which dynamic changes of associations occur. To achieve local sparsity in network estimation, we implement a group penalization strategy in which parameters overlap among different groups. We also develop a fast algorithm based on the smoothing proximal gradient method, which is computationally efficient and accurate. We illustrate the proposed method through simulation studies and fMRI data on children with attention deficit hyperactivity disorder (ADHD), and show that the proposed method and algorithm efficiently recover dynamic network changes over time. Digital information has become an essential part of modern life, from scientific research, the entertainment business, and product marketing to national security, so fast automatic processes for information extraction are in high demand.
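The abstract names the smoothing proximal gradient (SPG) method for a group penalty with overlapping groups but gives no formulas. The sketch below is a minimal, generic illustration of that technique (Nesterov smoothing of an overlapping group-lasso penalty combined with accelerated proximal steps), not the thesis's varying-coefficient model or its piecewise penalty. The function name `spg_overlapping_group_lasso`, the squared-error loss, the optional l1 term, the sliding-window groups standing in for adjacent time regions, and all tuning values are assumptions made for illustration.

```python
import numpy as np


def spg_overlapping_group_lasso(X, y, groups, gamma=1.0, lam=0.0, mu=1e-2, n_iter=1000):
    """Hypothetical SPG sketch for
        min_b 0.5*||y - X b||^2 + gamma * sum_g ||b_g||_2 + lam * ||b||_1,
    where the index lists in `groups` may overlap (all group weights set to 1)."""
    n, p = X.shape
    # Lipschitz constant of the smoothed objective's gradient:
    # ||X||_2^2 + ||C||^2 / mu, with ||C||^2 = gamma^2 * max_j (#groups containing j).
    counts = np.zeros(p)
    for g in groups:
        counts[g] += 1
    L = np.linalg.norm(X, 2) ** 2 + gamma ** 2 * counts.max() / mu

    def smooth_penalty_grad(b):
        # Gradient of the Nesterov-smoothed group penalty, C^T alpha*, where
        # alpha*_g is gamma * b_g / mu projected onto the unit l2 ball.
        grad = np.zeros(p)
        for g in groups:
            v = gamma * b[g] / mu
            alpha = v / max(1.0, np.linalg.norm(v))
            grad[g] += gamma * alpha
        return grad

    b = np.zeros(p)
    w = b.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + smooth_penalty_grad(w)
        z = w - grad / L
        # Proximal step for the separable l1 term: soft-thresholding.
        b_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
        # FISTA momentum update.
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        w = b_new + ((t - 1) / t_new) * (b_new - b)
        b, t = b_new, t_new
    return b


# Illustrative simulated data: only the first four coefficients are nonzero.
rng = np.random.default_rng(0)
n, p = 100, 12
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = [1.5, -2.0, 1.0, 0.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
# Overlapping windows of 4 adjacent coefficients with stride 2 (assumed grouping).
groups = [list(range(s, s + 4)) for s in range(0, p - 3, 2)]
beta_hat = spg_overlapping_group_lasso(X, y, groups, gamma=5.0, lam=1.0)
print(np.round(beta_hat, 2))
```

Smoothing replaces the nonsmooth overlapping-group norm, whose proximal operator has no simple closed form, with a differentiable surrogate whose gradient needs only one projection per group. The smoothing parameter `mu` trades approximation accuracy against the step size 1/L.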
Chinese is the second most popular language among internet users, but it remains severely under-studied, mainly because of its inherent ambiguity. In Chapter 3, we propose a new method for word segmentation in Chinese language processing. Because written Chinese does not mark word boundaries with spaces, segmentation is crucial for Chinese language processing: it is the first step toward fast automatic information extraction. One major challenge is that Chinese is highly context-dependent and very different from English. We propose a machine-learning model with computationally feasible loss functions that utilize linguistically embedded features. The proposed method is evaluated on Chinese documents from the Peking University corpus. Our numerical study shows that the proposed method outperforms existing top-performing methods.
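The abstract does not say how the segmentation model represents its output. One common formulation, assumed here purely for illustration and not confirmed as the thesis's method, treats segmentation as character-level tagging: each character is labeled B (begins a word), M (middle of a word), E (ends a word), or S (single-character word), so a model with a per-character loss can be trained on these labels and its predictions mapped back to words. The helpers `to_bmes` and `from_bmes` are hypothetical names, and the segmentation of 北京大学的语料库 ("Peking University's corpus") is an illustrative choice rather than one taken from the corpus annotation.

```python
def to_bmes(words):
    """Hypothetical helper: map a segmented sentence (list of words)
    to one B/M/E/S tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def from_bmes(chars, tags):
    """Hypothetical helper: rebuild the word list from characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a predicted sequence that does not end with E or S
        words.append(current)
    return words


words = ["北京大学", "的", "语料库"]  # "Peking University's corpus"
chars = "".join(words)
tags = to_bmes(words)
print("".join(tags))  # BMMESBME
assert from_bmes(chars, tags) == words
```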

Graduation Semester

2014-08

Owning Collections

Graduate Dissertations and Theses at Illinois PRIMARY

Graduate Theses and Dissertations at Illinois

Dissertations and Theses - Statistics