Place of Twitter and Google search data in disease outbreak monitoring and forecasting: Case of the COVID-19 pandemic
Guigma, Thierry Alix Wendnoogma
This item is only available for download by members of the University of Illinois community. Students, faculty, and staff at the U of I may log in with their NetID and password to view the item. To access an Illinois-restricted dissertation or thesis, you can request a copy through your library's Inter-Library Loan office or purchase a copy directly from ProQuest.
Permalink
https://hdl.handle.net/2142/124492
Description
Title
Place of Twitter and Google search data in disease outbreak monitoring and forecasting: Case of the COVID-19 pandemic
Author(s)
Guigma, Thierry Alix Wendnoogma
Issue Date
2024-03-14
Director of Research (if dissertation) or Advisor (if thesis)
Brooks, Ian
Doctoral Committee Chair(s)
Brooks, Ian
Committee Member(s)
Rogers, Wendy
He, Jing Rui
Torvik, Vetle
Department of Study
Illinois Informatics Institute
Discipline
Informatics
Degree Granting Institution
University of Illinois at Urbana-Champaign
Degree Name
Ph.D.
Degree Level
Dissertation
Keyword(s)
Facebook Prophet
X (Twitter)
Google
COVID-19
Time Series
Language
eng
Abstract
Background: The unprecedented global outbreak of COVID-19 has not only posed an immense threat to public health but has also catalyzed a paradigm shift in the way information is disseminated, consumed, and responded to. As the world grappled with the multifaceted challenges of the pandemic, an intricate interplay between the spread of the virus and the surge in related social media interactions and Google search activity became increasingly apparent.
Study Objectives: The objectives of this study are twofold. First, it evaluates the extent to which social media and Google search data can be used to model COVID-19 transmission with Facebook Prophet. Second, it seeks to unravel the cause-and-effect mechanisms that govern the interaction between the pandemic’s trajectory and the digital landscape.
Methods: For the first objective, we compared a base model built only on historical COVID-19 case data with an advanced model that, in addition to the case data, integrated Twitter and Google search data as external regressors. The gain in accuracy was measured as the change in the models’ Root Mean Squared Error (RMSE). For the second objective, we conducted an impulse response analysis to investigate potential cause-and-effect relationships between the time series, such as COVID-19 cases and online activity. This analysis quantified the strength of their relationship and indicated the direction of causality.
Results: The time series modeling showed that adding Twitter and Google search data to Facebook Prophet improves its prediction accuracy by 29.4%. It also enables the advanced model to detect changes in trends, which the base model cannot. However, these alternative data sources did not notably improve predictions when detailed case data were abundant. They proved highly valuable only in filling gaps and improving forecasts where COVID-19 reporting is challenging, highlighting their potential to generate reliable predictions in data-scarce situations. The impulse response analysis revealed that as COVID-19 cases increase, there is a noticeable and immediate increase in social media discussions and online searches related to the virus. This reflects the public’s heightened interest in and engagement with COVID-19 topics as the pandemic situation changes. In the reverse direction, the relationship was weak: social media discussions and online searches had little effect on COVID-19 cases. This suggests that discussions and searches about COVID-19 do not lead to behavior changes substantial enough to decrease or increase COVID-19 transmission risks.
Conclusion: The study explored the use of Twitter and Google search data during the COVID-19 pandemic, revealing their transformative role in disease monitoring and forecasting. While these sources significantly enhance forecasting accuracy and offer real-time insights into public sentiment in data-scarce contexts, their power to influence disease spread is limited. Integrating these non-traditional sources with conventional data enriches understanding and informs dynamic public health strategies, emphasizing the evolving role of digital platforms in collective well-being amid emerging health challenges.
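The two analyses described in the Methods can be illustrated with a minimal sketch. It is not the dissertation's code: all numbers are hypothetical toy values, and the function names (`rmse`, `rmse_reduction_pct`, `impulse_response`) are invented for illustration. The first part shows how a gain in accuracy can be expressed as the relative change in RMSE between a base and an advanced forecast. The second part assumes a simple two-variable VAR(1) system, which the abstract does not specify, and traces the response of each series to a one-time unit shock in the other. In Prophet itself, external regressors are usually attached with `add_regressor`; the sketch leaves the library out and works on plain lists.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction_pct(rmse_base, rmse_advanced):
    """Percentage by which the advanced model lowers RMSE relative to the base."""
    return 100.0 * (rmse_base - rmse_advanced) / rmse_base


def impulse_response(A, shock_index, horizons):
    """Non-orthogonalized impulse responses of y_t = A y_{t-1} + e_t.

    Returns the state vectors for horizons 0..horizons after a one-time
    unit shock to the variable at shock_index (i.e. A^h times a unit vector).
    """
    n = len(A)
    state = [1.0 if i == shock_index else 0.0 for i in range(n)]
    path = [state]
    for _ in range(horizons):
        state = [sum(A[i][j] * state[j] for j in range(n)) for i in range(n)]
        path.append(state)
    return path


# Hypothetical toy data, not taken from the dissertation.
actual = [100, 120, 150, 130]
base_forecast = [90, 110, 130, 150]       # case history only
advanced_forecast = [95, 118, 145, 138]   # case history plus external regressors

gain = rmse_reduction_pct(rmse(actual, base_forecast),
                          rmse(actual, advanced_forecast))
print(f"RMSE reduction: {gain:.1f}%")

# Hypothetical VAR(1) coefficients; variable order is [cases, online_activity].
# The large A[1][0] and small A[0][1] encode a strong cases -> activity link
# and a weak activity -> cases link.
A = [[0.8, 0.02],
     [0.5, 0.60]]
cases_shock = impulse_response(A, shock_index=0, horizons=3)
activity_shock = impulse_response(A, shock_index=1, horizons=3)
print("activity response to a cases shock:", [s[1] for s in cases_shock])
print("cases response to an activity shock:", [s[0] for s in activity_shock])
```

With coefficients like these, the response of online activity to a shock in cases is much larger than the response of cases to a shock in online activity, which is the kind of directional asymmetry the Results describe. A real analysis would estimate the coefficients from data and would typically use more lags and orthogonalized shocks.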