Abstract
In user-generated text, such as posts on social media platforms and online forums, people often use disease or symptom terms in ways other than to describe their health. In data-driven public health surveillance, the health mention classification (HMC) task aims to identify posts where users are discussing health conditions rather than using disease and symptom terms for other reasons. Existing computational research typically studies health mentions only on Twitter, covers a limited set of disease or symptom terms, and ignores both user behavior information and the other ways people use disease or symptom terms. To advance HMC research, we present the Reddit health mention dataset (RHMD), a new dataset of multi-domain Reddit data for HMC. RHMD consists of 10,015 manually labeled Reddit posts that mention 15 common disease or symptom terms and are annotated with four labels: personal health mentions, non-personal health mentions, figurative health mentions, and hyperbolic health mentions. With RHMD, we propose HMCNET, which hierarchically combines target keyword (disease or symptom term) identification and user behavior to improve HMC. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods with an F1-score of 0.75 (an increase of 11% over the state of the art) and show that our new dataset poses a strong challenge to existing HMC methods.
Original language | English |
---|---|
Title of host publication | WWW '22 |
Subtitle of host publication | proceedings of the ACM Web Conference 2022 |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 2573-2581 |
Number of pages | 9 |
ISBN (Electronic) | 9781450390965 |
Publication status | Published - 2022 |
Externally published | Yes |
Event | 31st ACM World Wide Web Conference, WWW 2022 - Virtual, Online, France. Duration: 25 Apr 2022 → 29 Apr 2022 |
Conference
Conference | 31st ACM World Wide Web Conference, WWW 2022 |
---|---|
Country/Territory | France |
City | Virtual, Online |
Period | 25/04/22 → 29/04/22 |
Keywords
- Health Mention Classification
- Public Health Surveillance