Multi-dimensional mining of unstructured data with limited supervision
Zhang, Chao
Permalink
https://hdl.handle.net/2142/102465
Description
- Issue Date
- 2018-12-03
- Director of Research (if dissertation) or Advisor (if thesis)
- Han, Jiawei
- Doctoral Committee Chair(s)
- Han, Jiawei
- Committee Member(s)
- Zhai, ChengXiang
- Abdelzaher, Tarek
- Mei, Qiaozhu
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- data mining
- multi-dimensional analysis
- less supervision
- Abstract
- As one of the most important data forms, unstructured text data plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to healthcare and scientific research. In many emerging applications, people's information needs from text data are becoming multi-dimensional: they demand useful insights into multiple aspects of a given text corpus. However, turning massive text data into multi-dimensional knowledge remains a challenge that existing data mining techniques cannot readily address. In this thesis, we propose algorithms that turn unstructured text data into multi-dimensional knowledge with limited supervision. We investigate two core questions: (1) how to identify task-relevant data with declarative queries along multiple dimensions, and (2) how to distill knowledge from data in a multi-dimensional space. To address these questions, we propose an integrated cube construction and exploitation framework. First, we develop a cube construction module that organizes unstructured data into a cube structure by discovering the latent multi-dimensional, multi-granular structure of the text corpus and allocating documents into that structure. Second, we develop a cube exploitation module that models multiple dimensions in the cube space, thereby distilling multi-dimensional knowledge from the data to provide insights along multiple dimensions. Together, these two modules constitute an integrated pipeline: leveraging the cube structure, users can perform multi-dimensional, multi-granular data selection with declarative queries; and with the cube exploitation algorithms, users can make accurate cross-dimension predictions or extract multi-dimensional patterns for decision making. The proposed framework has two distinctive advantages when turning text data into multi-dimensional knowledge: flexibility and label efficiency.
First, it lets users acquire multi-dimensional knowledge flexibly, as the cube structure allows them to easily identify task-relevant data along multiple dimensions at varied granularities and then distill multi-dimensional knowledge from that data. Second, the cube construction and exploitation algorithms require little supervision, which makes the framework appealing for the many applications where labeled data are expensive to obtain.
- Graduation Semester
- 2018-12
- Type of Resource
- text
- Copyright and License Information
- Copyright 2018 Chao Zhang
Owning Collections
- Graduate Dissertations and Theses at Illinois (primary)
- Graduate Theses and Dissertations at Illinois
- Dissertations and Theses - Computer Science
- Dissertations and Theses from the Dept. of Computer Science