MASK-Net: robust health mention classification by masking a disease or symptom terms

Usman Naseem*, Surendrabikram Thapa, Qi Zhang, Junaid Rashid, Liang Hu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Social media users often use disease or symptom terms in ways other than describing their health conditions, which can lead to flawed conclusions in data-driven public health surveillance. The health mention classification (HMC) task aims to identify posts in which users use disease or symptom terms to discuss their own health conditions rather than for other purposes. Existing methods rely on features extracted from external resources and are tested on data from either Twitter or Reddit; therefore, their generalizability and transferability are unproven. In this work, we present MASK-Net, which masks disease or symptom terms and relies on the remaining context of a post. Furthermore, to capture the negative sentiments associated with the experience of having a disease, we incorporate sentiment information to improve HMC. We conduct experiments using publicly available health-mention datasets collected from Twitter and Reddit. Experimental results demonstrate that our method outperforms state-of-the-art methods on both HMC datasets, highlighting the relevance of context words in identifying health mentions. Additionally, we evaluate our method in cross-domain and multidomain settings to analyze the transferability and generalizability of MASK-Net, and we conclude with a discussion of the empirical and ethical considerations of our study.
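
The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the term list, the `[MASK]` placeholder, and the function name `mask_health_terms` are all assumptions made for illustration, since the abstract does not state how terms are identified or which token replaces them.

```python
import re

# Hypothetical term list; the abstract does not say how terms are identified.
HEALTH_TERMS = ["heart attack", "depression", "fever", "cough"]

# Assumed placeholder token; the paper may use a different one.
MASK_TOKEN = "[MASK]"


def mask_health_terms(text, terms=HEALTH_TERMS, mask=MASK_TOKEN):
    """Replace each whole-word occurrence of a listed term with the mask token."""
    # Longest terms first so multi-word terms win over any shorter overlap.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(mask, text)


# A literal health mention and a figurative one, both with the term hidden.
literal = mask_health_terms("I have had a fever and a cough since Monday.")
figurative = mask_health_terms("That plot twist nearly gave me a heart attack!")
print(literal)
print(figurative)
```

Once the term is hidden, a downstream classifier has only the surrounding words ("since Monday" versus "that plot twist") to decide whether the post reports a health condition, which is the intuition the abstract gives for relying on context.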

Original language: English
Number of pages: 10
Journal: IEEE Transactions on Computational Social Systems
Early online date: 2 Dec 2024
Publication status: E-pub ahead of print - 2 Dec 2024

Keywords

  • Health mention classification (HMC)
  • mental health
  • social media
  • word masking
