Files in this item



TAO-DISSERTATION-2021.pdf (application/pdf, 4MB), Restricted Access


Title: Text mining and its applications in food safety
Author(s): Tao, Dandan
Director of Research: Feng, Hao
Doctoral Committee Chair(s): Padua, Graciela Wild
Doctoral Committee Member(s): Stasiewicz, Matthew Jon; Banerjee, Pratik
Department / Program: Food Science & Human Nutrition
Discipline: Food Science & Human Nutrition
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Food safety; Text mining; Outbreak response; Risk communication; Social media
Abstract: Food safety is essential for the prevention of foodborne diseases that threaten the health of the world’s population. While numerous methods have been developed for enhancing food safety throughout the supply chain, foodborne outbreaks continue to occur and impose significant costs every year in illnesses, economic burden, and lives lost. The early detection of foodborne outbreaks and efficient outbreak investigation are critical for mitigating their impacts. In the era of digitalization, publicly accessible databases and the Internet, especially real-time social media, have provided a large amount of digital text data related to food safety. As more and more text information becomes accessible online, it is natural to infer that an analysis of digital text data could provide a new approach to assist outbreak investigations. The overall goal of the study was to develop data-analytical methods utilizing multiple sources of digital data, especially text data, to assist outbreak investigators in their decision-making process. Historical outbreak data from one government database and real-time consumer reports from social media were the two major sources of data employed in this study. Food source identification was the main focus of the historical outbreak data analysis, which used data mining tools. Through statistical models and text classification, the relationships between outbreak characteristics, food vehicles, and etiologies were discovered. In addition, the relationships among the foods and ingredients involved in historical outbreaks were illustrated by the construction of a food-food network and a food-ingredient network. Using network analysis and Monte Carlo simulation, the most important foods given specific etiologies and the most probable ingredients given specific foods were identified. The knowledge discovered in the historical outbreaks could be useful in future outbreak investigations by providing insights for source attribution.
The second part of the study utilized social media to identify consumer reports of relevance to food safety. A dual-task model based on BERTweet, a pre-trained deep learning language model for English Tweets, was developed to allow the detection of unreported foodborne illnesses from Twitter and the extraction of important entities associated with the cases. The model outperformed previous models and generated essential information (e.g., food, symptoms) useful in the analysis of potential outbreaks. Social media text mining was also applied to detect farmers market topics (e.g., food safety) on Twitter and Yelp, which may provide health departments with insights on the hygiene conditions of local farmers markets based on consumer responses. In summary, this study contributed to (1) the development of data science approaches as new protocols for analyzing digital textual data related to food safety; (2) improving our understanding of the roles played by different sources of data and of how to integrate multi-source data; and (3) the cross-pollination of two scientific research domains, food science and computer science, by elucidating how big-data-mediated computational tools may contribute to more effective decision-making in food safety scenarios.
Issue Date: 2021-06-22
Rights Information: Copyright 2021 Dandan Tao
Date Available in IDEALS: 2022-01-12
Date Deposited: 2021-08
