Epidemic Prediction from Social Media using Event Extraction

We present two works from our group on building generalizable cross-disease, cross-lingual frameworks for detecting, predicting, and providing information about epidemics using social media. The crux of our framework is building robust Event Extraction (EE) models for the social media and epidemiological domains. Both works are described below.

Event Detection from Social Media for Epidemic Prediction

University of California, Los Angeles
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

Abstract

Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by developing a framework to extract and analyze epidemic-related events from social media posts. To this end, we curate an epidemic event ontology comprising seven disease-agnostic event types and construct a Twitter dataset SPEED with human-annotated events focused on the COVID-19 pandemic. Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics of Monkeypox, Zika, and Dengue; while models trained on existing ED datasets fail miserably. Furthermore, we show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox. This utility of our framework lays the foundations for better preparedness against emerging epidemics.

What is Event Detection?

Event Detection simply involves identifying semantic events in natural language text. Here's an example of detecting various epidemic-related events like Symptom, Infect, and Death.

Event Detection Example
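
To make the task concrete, the sketch below represents Event Detection output as a list of (trigger, event type) mentions. The `TOY_LEXICON` keyword table, the `EventMention` class, and the `detect_events` helper are hypothetical names introduced only for illustration. A keyword lookup is a stand-in here; the SPEED models are trained ED models, not keyword matchers.

```python
from dataclasses import dataclass

# Hypothetical trigger lexicon, for illustration only. Real ED models learn
# triggers from annotated data rather than from a fixed keyword list.
TOY_LEXICON = {
    "fever": "Symptom",
    "cough": "Symptom",
    "infected": "Infect",
    "died": "Death",
}


@dataclass(frozen=True)
class EventMention:
    trigger: str      # word in the text that evokes the event
    event_type: str   # event type from the ontology
    token_index: int  # position of the trigger in the whitespace token list


def detect_events(text: str) -> list[EventMention]:
    """Toy detector: tag every token that appears in the lexicon."""
    mentions = []
    for i, token in enumerate(text.lower().split()):
        word = token.strip(".,!?;:")
        if word in TOY_LEXICON:
            mentions.append(EventMention(word, TOY_LEXICON[word], i))
    return mentions


if __name__ == "__main__":
    for mention in detect_events("My uncle got infected and had a high fever, then died."):
        print(mention)
```

Whatever model produces them, the output is the same kind of object: a trigger word paired with a type from the ontology.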

Epidemic Event Ontology

Here are the epidemic-related events that are prevalently discussed on social media, along with some examples. The core principle of our dataset construction is to keep events pandemic-related yet disease-independent. Each event type is designed so that it generalizes to any potential epidemic, and during annotation the chosen trigger words are kept as general as possible so that they are not defined exclusively in a COVID context.

SPEED Ontology Table

Epidemic Event Extraction

We collect multi-disease data for SPEED and provide the statistics of the SPEED dataset on the left below. We benchmark EE models trained on our SPEED data against various existing epidemiological works, as shown in the figure on the right below.

SPEED Stats Table SPEED Stats Table

SPEED models perform much better than the other baselines in the zero-shot disease transfer scenario. More importantly, the performance of our zero-shot models is on par with models trained on limited target-epidemic data, highlighting the strong utility of our model.

Epidemic Prediction

To evaluate practical validity, we aggregate the epidemic-related events predicted by our SPEED framework over time. Any sharp increase in the events is reported as an epidemic warning. We conduct this study for the Monkeypox epidemic of 2022 (using models trained on COVID-19 data from 2020) and show the warnings alongside the number of cases in the figure below.

SPEED Stats Table

Our framework can provide warnings 4-9 weeks before the WHO declared Monkeypox a global health concern, highlighting the practical utility of our work.
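
The aggregate-then-flag idea can be sketched as follows. This page does not spell out the exact rule for a "sharp increase", so the trailing-mean rule and the `window`, `ratio`, and `min_count` parameters below are assumptions chosen for illustration, as are the function names and the sample counts.

```python
from datetime import date


def bucket_by_week(event_dates, start):
    """Count events per week, where week 0 begins on `start`.

    Assumes `event_dates` is non-empty and every date is on or after `start`.
    Weeks with no events are kept as zeros so the series stays contiguous.
    """
    offsets = [(d - start).days // 7 for d in event_dates]
    counts = [0] * (max(offsets) + 1)
    for week in offsets:
        counts[week] += 1
    return counts


def sharp_increase_warnings(weekly_counts, window=3, ratio=2.0, min_count=5):
    """Return indices of weeks whose count jumps above the recent baseline.

    A week is flagged when its count is at least `min_count` and at least
    `ratio` times the mean of the previous `window` weeks. All three
    parameters are illustrative defaults, not values from the paper.
    """
    warnings = []
    for i in range(window, len(weekly_counts)):
        baseline = sum(weekly_counts[i - window:i]) / window
        current = weekly_counts[i]
        # Floor the baseline at 1 so an all-zero history still needs a real jump.
        if current >= min_count and current >= ratio * max(baseline, 1.0):
            warnings.append(i)
    return warnings


# Made-up weekly event counts for illustration.
counts = [4, 5, 4, 6, 15, 40, 39, 38]
print(sharp_increase_warnings(counts))
```

Comparing against a trailing baseline rather than a fixed threshold keeps the rule independent of how much a given disease is discussed overall. This fits the goal of reusing one framework across epidemics.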

Qualitative Examples of Actual Tweets

Here are exemplar events extracted by our framework from actual tweets.

SPEED Stats Table

Event-based Disease Profiling

Another way to use our framework is to generate event-based disease profiles (based on the proportion of different events extracted) using public sentiment, which can provide a high-level overview of what people are talking or concerned about regarding the epidemic. We provide the disease profiles of various diseases developed through our framework below.

SPEED Stats Table
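
A disease profile in this sense is just the normalized distribution of extracted event types. The following minimal sketch assumes the events have already been extracted and are available as a flat list of type labels. The `disease_profile` function name and the sample labels are hypothetical.

```python
from collections import Counter


def disease_profile(event_types):
    """Map each event type to its share of all extracted events."""
    counts = Counter(event_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the profile from most to least frequent type.
    return {etype: n / total for etype, n in counts.most_common()}


# Made-up extraction results for one disease.
extracted = ["Infect", "Infect", "Symptom", "Death"]
for etype, share in disease_profile(extracted).items():
    print(f"{etype}: {share:.0%}")
```

Because the profile uses proportions rather than raw counts, diseases with very different volumes of discussion can be compared side by side.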

SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness

University of California, Los Angeles
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

Abstract

Social media is often the first place where communities discuss the latest societal trends. Prior works have utilized this platform to extract epidemic-related information (e.g. infections, preventive measures) to provide early warnings for epidemic prediction. However, these works only focused on English posts, while epidemics can occur anywhere in the world, and early discussions are often in the local, non-English languages. In this work, we introduce the first multilingual Event Extraction (EE) framework SPEED++ for extracting epidemic event information for a wide range of diseases and languages. To this end, we extend a previous epidemic ontology with 20 argument roles; and curate our multilingual EE dataset SPEED++ comprising 5.1K tweets in four languages for four diseases. Annotating data in every language is infeasible; thus we develop zero-shot cross-lingual cross-disease models (i.e., training only on English COVID data) utilizing multilingual pre-training and show their efficacy in extracting epidemic-related events for 65 diverse languages across different diseases. Experiments demonstrate that our framework can provide epidemic warnings for COVID-19 in its earliest stages in Dec 2019 (3 weeks before global discussions) from Chinese Weibo posts without any training in Chinese. Furthermore, we exploit our framework's argument extraction capabilities to aggregate community epidemic discussions like symptoms and cure measures, aiding misinformation detection and public attention monitoring. Overall, we lay a strong foundation for multilingual epidemic preparedness.

From Event Detection to Event Extraction

Event Extraction extends Event Detection (ED) by identifying not only the event triggers but also their corresponding arguments (event-related information) in natural language text. Here's an example of extracting various epidemic-related events like Infect and Control.

Event Extraction Example

Since our work focuses on multilinguality, we also provide an example in Hindi below.

Event Extraction Example
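
The difference between detection and extraction can be seen in the data structure each one produces. In the sketch below, the `Argument` and `Event` classes, the role names (`infected`, `disease`, `place`, `time`), and the example sentence are all hypothetical. The actual SPEED++ argument roles are the ones listed in the ontology table.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    role: str  # argument role name (hypothetical in this sketch)
    text: str  # span of the sentence that fills the role


@dataclass
class Event:
    event_type: str
    trigger: str
    arguments: list[Argument] = field(default_factory=list)

    def by_role(self, role: str) -> list[str]:
        """Return the text of every argument with the given role."""
        return [a.text for a in self.arguments if a.role == role]


# Hand-written example, not model output.
sentence = "My brother was infected with COVID-19 at school last week."
event = Event(
    event_type="Infect",
    trigger="infected",
    arguments=[
        Argument("infected", "My brother"),
        Argument("disease", "COVID-19"),
        Argument("place", "school"),
        Argument("time", "last week"),
    ],
)

# Event Detection would stop at the (trigger, type) pair ...
print((event.trigger, event.event_type))
# ... while Event Extraction also recovers the role-labelled spans.
print(event.by_role("disease"))
```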

Epidemic Event Ontology

We improve the existing SPEED ontology by supplementing each event with corresponding arguments. We provide this enriched ontology below.

SPEED++ Ontology Table

Experimental Benchmarking

We train models in a zero-shot cross-lingual cross-disease setup. To evaluate the models, we annotate some EE data in three other languages. Below, we provide the multilingual statistics of our SPEED++ dataset. Starting from the left, we have: (a) number of sentences per language, (b) average length of each sentence, (c) number of event mentions, and (d) number of supporting arguments.

SPEED++ Data Statistics

To benchmark models in the zero-shot cross-lingual cross-disease setup, we consider the following data splits.

SPEED++ Split for Event Extraction
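
As a simplified view of the zero-shot setup, the sketch below trains only on English COVID-19 records and holds out every other language or disease for testing. The record layout, the `zero_shot_split` helper, and the sample records are assumptions made for illustration. The actual split sizes and composition are those given in the table above.

```python
def zero_shot_split(records, train_language="English", train_disease="COVID-19"):
    """Train on one (language, disease) pair and test on everything else."""
    train, test = [], []
    for rec in records:
        is_source = (
            rec["language"] == train_language and rec["disease"] == train_disease
        )
        (train if is_source else test).append(rec)
    return train, test


# Placeholder records; the "text" values stand in for real posts.
sample = [
    {"language": "English", "disease": "COVID-19", "text": "post 1"},
    {"language": "English", "disease": "Monkeypox", "text": "post 2"},
    {"language": "Hindi", "disease": "COVID-19", "text": "post 3"},
    {"language": "Hindi", "disease": "Dengue", "text": "post 4"},
]

train, test = zero_shot_split(sample)
print(len(train), "training record(s),", len(test), "held-out record(s)")
```

The held-out set then covers cross-disease transfer (same language, new disease), cross-lingual transfer (new language, same disease), and the combination of both.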

We train cross-lingual models on our SPEED++ data using TagPrime, together with synthetic data generation using CLaP. We benchmark our model against various prior works and show the performance below.

SPEED++ Split for Event Extraction

Global Epidemic Tracking

To demonstrate practical utility, we study global epidemic trends by plotting the number of events extracted per language against the number of infections in the corresponding countries below, using posts all written on a single day (May 28, 2020). We show a strong correlation of 0.73 across 65 languages and 117 countries, highlighting the practicality of our work for global epidemic tracking.

Epidemic Event Correlation Table
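
For readers who want to run a similar check on their own data, here is a minimal sketch of a correlation between extracted event counts and infection counts. The page does not state which correlation coefficient was used, so Pearson correlation is an assumption here. The numbers are made up and will not reproduce the reported 0.73.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only: one entry per language, paired with
# the infection count of the countries where that language is spoken.
events_per_language = [120, 45, 300, 10, 80]
infections_per_language = [1500, 700, 4000, 90, 1100]
print(round(pearson(events_per_language, infections_per_language), 2))
```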

We also show the geographical correlation for European countries below. The blue circles indicate the number of events extracted using our framework.

Europe Graph

Zero-shot Multilingual Epidemic Prediction

To further demonstrate our framework's multilingual capabilities, we apply the SPEED++ framework to Chinese Weibo posts in a zero-shot manner (no training on Chinese) to provide epidemic warnings for COVID-19, as shown below.

Epidemic Warning Benchmarking

The epidemic warnings, indicated by sharp increases in aggregated events, highlight the significance of our framework, which could provide warnings as early as Dec 30, 2019 (3 weeks before global infection tracking even began).

We further provide some qualitative posts and extracted events by our model below.

Event detection in Chinese social media text

Epidemic Information Aggregation

Finally, we develop an information aggregation system utilizing the argument extraction capability of our framework. Specifically, we aggregate and cluster extracted arguments across social media for each disease, argument, and language. We demonstrate some of the top relevant ones below.

Epidemic Event Correlation Table

Manual inspection shows the strong argument extraction capability of our framework. Such information aggregation can be utilized for better epidemic preparedness through monitoring shifts in public attention as well as misinformation detection.
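
The aggregation step can be sketched as grouping extracted arguments by disease, language, and role, then ranking the most frequent mentions in each group. The clustering method is not described on this page, so the sketch substitutes simple text normalization (lowercasing and whitespace collapsing) for real clustering. The `aggregate_arguments` helper and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_arguments(records, top_k=3):
    """Rank the most frequent argument mentions per (disease, language, role).

    `records` is an iterable of (disease, language, role, text) tuples.
    Normalization stands in for clustering: mentions that differ only in
    case or spacing are counted together.
    """
    groups = defaultdict(Counter)
    for disease, language, role, text in records:
        normalized = " ".join(text.lower().split())
        groups[(disease, language, role)][normalized] += 1
    return {key: counter.most_common(top_k) for key, counter in groups.items()}


# Made-up extracted arguments for illustration.
records = [
    ("COVID-19", "en", "symptom", "Fever"),
    ("COVID-19", "en", "symptom", "fever "),
    ("COVID-19", "en", "symptom", "dry  cough"),
    ("COVID-19", "en", "cure", "rest"),
]

for key, top in aggregate_arguments(records).items():
    print(key, top)
```

Tracking how the top entries for a role change over time is one way to monitor shifts in public attention. Unusual entries under a cure-related role are natural candidates for misinformation review.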

We further provide some qualitative posts for multilingual arguments extracted by our framework below.

Misinformation Detection

BibTeX


If you find our work inspirational or useful for your research, you can cite our works as below.

SPEED

@misc{parekh2024eventdetectionsocialmedia,
  title={Event Detection from Social Media for Epidemic Prediction},
  author={Tanmay Parekh and Anh Mac and Jiarui Yu and Yuxuan Dong and Syed Shahriar and Bonnie Liu and Eric Yang and Kuan-Hao Huang and Wei Wang and Nanyun Peng and Kai-Wei Chang},
  year={2024},
  eprint={2404.01679},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2404.01679},
}

SPEED++

@misc{parekh2024speedmultilingualeventextraction,
  title={SPEED++: A Multilingual Event Extraction Framework for Epidemic Prediction and Preparedness},
  author={Tanmay Parekh and Jeffrey Kwan and Jiarui Yu and Sparsh Johri and Hyosang Ahn and Sreya Muppalla and Kai-Wei Chang and Wei Wang and Nanyun Peng},
  year={2024},
  eprint={2410.18393},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.18393},
}