Files in this item

File: DEY-THESIS-2020.pdf (5MB), Restricted Access
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Topic mining and categorization in online discussion forums
Author(s): Dey, Jishnu
Advisor(s): Zhai, ChengXiang
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): discussion forums; topic modeling; text categorization; hierarchical categorization
Abstract: Online forums provide a useful way to engage in discussions on a wide variety of topics and to gather custom information for which an exact source may not be available, drawing on a combination of knowledge and human interpretation. Forums usually have categories that each cater to a particular topic of interest, allowing information seekers and topic experts to meet. It is thus imperative to organize forum data into a coherent structure. In this work we examine methods for assigning forum posts to appropriate categories when the number of categories is large. We compare several baseline methods with state-of-the-art deep learning methods and analyze their performance. We observe that, given the highly keyword-centric nature of our data, deep learning methods only slightly outperform the baseline methods. We then perform topic modeling on the forum data to find latent topics, which yields a hierarchy across forum categories and clusters similar categories. In the process we observe that some recent topic modeling approaches that use word embeddings lead to better topics. Finally, we use this hierarchy to perform hierarchical classification of the forum posts, which makes the classification task more manageable, and we analyze the benefits of this method.
Issue Date: 2020-05-12
Type: Thesis
URI: http://hdl.handle.net/2142/108348
Rights Information: Copyright 2020 Jishnu Dey
Date Available in IDEALS: 2020-08-27
Date Deposited: 2020-05

