Files in this item



Yunliang_Jiang.pdf (application/pdf, 738kB)


Title: Clustering and comparing information extracted from personal health messages
Author(s): Jiang, Yunliang
Director of Research: Schatz, Bruce R.
Doctoral Committee Chair(s): Schatz, Bruce R.
Doctoral Committee Member(s): Han, Jiawei; Zhai, ChengXiang; Mei, Qiaozhu
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Personal Messages
Abstract: The development of Web 2.0 techniques has led to the prosperity of online communities, which have spread to various domains and areas of daily life. In the medicine and healthcare domain, a series of online services such as Yahoo! Groups, WebMD and MedHelp offer patients and physicians a platform to discuss health problems, e.g., diseases and drugs, diagnoses and treatments, and also provide a large volume of data for researchers to analyze and explore. However, certain characteristics of personal messages, e.g., being unclean, unstructured and isolated from clinical practice, hinder users’ effective digestion of information in the front end and challenge data analysis in the back end. In such a scenario, the objective of my thesis is to apply advanced data mining, information retrieval and natural language processing techniques to effectively analyze and re-organize the rich source of personal health messages from online medical communities, in order to satisfy patients’ information needs and support physicians’ clinical practice. Specifically, in the first part of the dissertation, I introduce an SVM-based multi-class classification method which utilizes term-appearance, lexical and semantic features to effectively classify health messages sampled from our unique dataset of Yahoo! Health Groups into three categories: News, User Comments and Spam; in the second part, I depict a comprehensive system with an extensive evaluation framework to organize and cluster patient outcomes using a topic model, which groups large collections of personal comments into a series of topics, guided by expert comments; in the third part of the dissertation, I address a novel and promising topic: Comparative Effectiveness Research (CER) hypothesis prediction, by presenting a study which evaluates patients’ opinions on different treatments through machine-enabled sentiment analysis or human analysts, utilizing our MedHelp dataset.
By proposing three different methods to compare such opinions, I show that reliable conclusions about patients’ preferences among different treatments can be drawn consistently, and these conclusions indicate the effectiveness of the treatments. Furthermore, the study is extended to demographic analysis to explore preferences within specific groups of people, representing population cohorts.
Issue Date: 2013-02-03
Rights Information: Copyright 2012 Yunliang Jiang
Date Available in IDEALS: 2013-02-03
Date Deposited: 2012-12
