Abstract
Social media users use disease and symptom words in different ways: to describe their personal health experiences, figuratively, or in other general discussions. The health mention classification (HMC) task aims to distinguish between these uses, which is important in public health applications. Existing HMC studies address this problem using pretrained language models (PLMs). However, gaps remain in the area: a lack of linguistic grounding, the requirement for large volumes of labelled data, and evaluation that is often limited to Twitter or Reddit, which provides limited evidence of the transportability of models. To address these gaps, we propose a novel method that uses a transformer-based PLM to obtain a contextual representation of target (disease or symptom) terms, coupled with a contrastive loss that draws on linguistic theories to establish a larger gap between the literal and figurative uses of target terms. To address the label scarcity challenge, we introduce a simple and effective approach for harvesting candidate instances from a broad corpus and generalise the proposed method using self-training. Our experiments on publicly available health-mention datasets from Twitter (HMC2019) and Reddit (RHMD) demonstrate that our method outperforms state-of-the-art HMC methods on both datasets. We further analyse the transferability and generalisability of our method and conclude with a discussion of the empirical and ethical considerations of our study.
Original language | English |
---|---|
Title of host publication | WSDM '24 |
Subtitle of host publication | proceedings of the 17th ACM International Conference on Web Search and Data Mining |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 529-537 |
Number of pages | 9 |
ISBN (Electronic) | 9798400703713 |
Publication status | Published - 2024 |
Externally published | Yes |
Event | ACM International Conference on Web Search and Data Mining (17th : 2024), Merida, Mexico; Duration: 4 Mar 2024 → 8 Mar 2024; Conference number: 17th |
Conference
Conference | ACM International Conference on Web Search and Data Mining (17th : 2024) |
---|---|
Abbreviated title | WSDM '24 |
Country/Territory | Mexico |
City | Merida |
Period | 4/03/24 → 8/03/24 |