Rapid relevance classification of social media posts in disasters and emergencies: A system and evaluation featuring active, incremental and online learning

Kaufhold, Marc-André and Bayer, Markus and Reuter, Christian (2020):
Rapid relevance classification of social media posts in disasters and emergencies: A system and evaluation featuring active, incremental and online learning.
In: Information Processing & Management, 57 (1), Elsevier ScienceDirect, ISSN 0306-4573,
DOI: 10.1016/j.ipm.2019.102132,
[Article]

Abstract

The research field of crisis informatics examines, among other things, the potentials and barriers of social media use during disasters and emergencies. Social media allow emergency services to receive valuable information (e.g., eyewitness reports, pictures, or videos). However, the vast amount of data generated during large-scale incidents can lead to information overload. Research indicates that supervised machine learning techniques are suitable for identifying relevant messages and filtering out irrelevant ones, thus mitigating information overload. Still, they require a considerable amount of labeled data, clear criteria for relevance classification, a usable interface to facilitate the labeling process, and a mechanism to rapidly deploy retrained classifiers. To overcome these issues, we present (1) a system for social media monitoring, analysis and relevance classification, (2) abstract and precise criteria for relevance classification in social media during disasters and emergencies, (3) the evaluation of a well-performing Random Forest algorithm for relevance classification incorporating metadata from social media into a batch learning approach (e.g., 91.28%/89.19% accuracy, 98.3%/89.6% precision and 80.4%/87.5% recall with a fast training time with feature subset selection on the European floods/BASF SE incident datasets), as well as (4) an approach and preliminary evaluation for relevance classification including active, incremental and online learning to reduce the amount of required labeled data and to correct misclassifications of the algorithm by feedback classification. Using the latter approach, we achieved a well-performing classifier based on the European floods dataset while requiring only a quarter of the labeled data compared to the traditional batch learning approach. Despite a lesser effect on the BASF SE incident dataset, a substantial improvement could still be determined.
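As a rough illustration of how active and online learning can reduce labeling effort, the following sketch lets a classifier choose which post to have labeled next and then updates on that single label. It is not the authors' system: the online logistic regression stands in for their classifier, uncertainty sampling is an assumed query strategy that the abstract does not specify, and the vocabulary, posts, and `oracle` function are hypothetical.

```python
import math


def featurize(text, vocab):
    """Binary bag-of-words features over a fixed, illustrative vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]


class OnlineLogReg:
    """Logistic regression updated one labeled example at a time (SGD).

    Stand-in model for illustration only; not the paper's classifier.
    """

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, x, y):
        # One gradient step on the log-loss for a single example.
        err = y - self.predict_proba(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


def most_uncertain(model, pool):
    """Index of the pooled post whose predicted probability is closest to 0.5."""
    return min(
        range(len(pool)),
        key=lambda i: abs(model.predict_proba(pool[i][0]) - 0.5),
    )


def active_learning(posts, oracle, vocab, budget):
    """Query labels for at most `budget` posts, updating the model after each."""
    model = OnlineLogReg(len(vocab))
    pool = [(featurize(post, vocab), post) for post in posts]
    labeled = []
    for _ in range(min(budget, len(pool))):
        x, post = pool.pop(most_uncertain(model, pool))
        y = oracle(post)  # a human annotator in a real deployment
        model.partial_fit(x, y)
        labeled.append((post, y))
    return model, labeled


if __name__ == "__main__":
    # Hypothetical toy data; labels come from a keyword rule, not real annotations.
    vocab = ["flood", "water", "concert", "tickets"]
    posts = [
        "flood water rising near the bridge",
        "concert tickets on sale",
        "water level still high after flood",
        "great concert last night",
    ]
    model, labeled = active_learning(
        posts, lambda p: 1 if "flood" in p else 0, vocab, budget=3
    )
    for post, label in labeled:
        print(label, post)
```

Because the model is updated per example rather than retrained from scratch, a corrected label from a user can be applied immediately by calling `partial_fit` again with the corrected value.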

Item Type: Article
Published: 2020
Creators: Kaufhold, Marc-André and Bayer, Markus and Reuter, Christian
Title: Rapid relevance classification of social media posts in disasters and emergencies: A system and evaluation featuring active, incremental and online learning
Language: English
Journal or Publication Title: Information Processing & Management
Journal volume: 57
Number: 1
Publisher: Elsevier ScienceDirect
Uncontrolled Keywords: A-Paper, CORE-A, Crisis, SecUrban, SocialMedia, WKWI-B, emergenCITY, emergenCITY_INF, emergenCITY_SG
Divisions: 20 Department of Computer Science
20 Department of Computer Science > Science and Technology for Peace and Security (PEASEC)
Profile Areas
Profile Areas > Cybersecurity (CYSEC)
LOEWE
LOEWE > LOEWE Centres
LOEWE > LOEWE Centres > CRISP - Center for Research in Security and Privacy
LOEWE > LOEWE Centres > emergenCITY
Central Facilities
Central Facilities > Interdisziplinäre Arbeitsgruppe Naturwissenschaft, Technik und Sicherheit (IANUS)
Date Deposited: 20 Aug 2020 07:32
DOI: 10.1016/j.ipm.2019.102132