Files in this item
Files | Description | Format |
---|---|---|
application/pdf | (no description provided) |
Description
Title: | Topic mining and categorization in online discussion forums |
Author(s): | Dey, Jishnu |
Advisor(s): | Zhai, ChengXiang |
Department / Program: | Computer Science |
Discipline: | Computer Science |
Degree Granting Institution: | University of Illinois at Urbana-Champaign |
Degree: | M.S. |
Genre: | Thesis |
Subject(s): | discussion forums; topic modeling; text categorization; hierarchical categorization |
Abstract: | Online forums provide a useful way to discuss a wide variety of topics and to gather custom information for which an exact source may not be available, using a combination of knowledge and human interpretation. Forums usually have categories that each cater to a particular topic of interest, allowing information seekers and topic experts to meet. It is thus imperative to organize forum data into a coherent structure. In this work we look at methods for categorizing forum posts into appropriate categories when the number of such categories is large. We compare several baseline methods with state-of-the-art deep learning methods and analyze their performance. We observe that, given the highly keyword-centric nature of our data, deep learning methods only slightly outperform baseline methods. Following this, we perform topic modeling on the forum data to find latent topics, which yields a hierarchy across forum categories and clusters similar categories. In this process we observe that some recent topic modeling approaches that utilize word embeddings lead to better topics. Finally, we use this hierarchy to perform hierarchical classification of the forum posts, which makes the classification task more manageable, and we analyze the benefits of this method. |
Issue Date: | 2020-05-12 |
Type: | Thesis |
URI: | http://hdl.handle.net/2142/108348 |
Rights Information: | Copyright 2020 Jishnu Dey |
Date Available in IDEALS: | 2020-08-27 |
Date Deposited: | 2020-05 |
This item appears in the following Collection(s)
- Dissertations and Theses - Computer Science: Dissertations and Theses from the Dept. of Computer Science
- Graduate Dissertations and Theses at Illinois: Graduate Theses and Dissertations at Illinois