A linguistic grounding-infused contrastive learning approach for health mention classification on social media

Usman Naseem, Jinman Kim, Matloob Khushi, Adam G. Dunn

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Social media users use disease and symptom words in different ways, including describing their personal health experiences, figuratively, or in other general discussions. The health mention classification (HMC) task aims to distinguish these uses, which is important in public health applications. Existing HMC studies address this problem using pretrained language models (PLMs). However, the remaining gaps in the area include the need for linguistic grounding, the requirement for large volumes of labelled data, and the fact that solutions are often only tested on Twitter or Reddit, which provides limited evidence of the transportability of models. To address these gaps, we propose a novel method that uses a transformer-based PLM to obtain a contextual representation of target (disease or symptom) terms, coupled with a contrastive loss informed by linguistic theories that establishes a larger gap between the literal and figurative uses of target terms. We introduce a simple and effective approach for harvesting candidate instances from a broad corpus and generalise the proposed method using self-training to address the label scarcity challenge. Our experiments on publicly available health-mention datasets from Twitter (HMC2019) and Reddit (RHMD) demonstrate that our method outperforms state-of-the-art HMC methods on both datasets. We further analyse the transferability and generalisability of our method and conclude with a discussion of the empirical and ethical considerations of our study.
Original language: English
Title of host publication: WSDM '24
Subtitle of host publication: proceedings of the 17th ACM International Conference on Web Search and Data Mining
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 529-537
Number of pages: 9
ISBN (Electronic): 9798400703713
Publication status: Published - 2024
Externally published: Yes
Event: ACM International Conference on Web Search and Data Mining (17th : 2024) - Merida, Mexico
Duration: 4 Mar 2024 to 8 Mar 2024
Conference number: 17th

Conference

Conference: ACM International Conference on Web Search and Data Mining (17th : 2024)
Abbreviated title: WSDM '24
Country/Territory: Mexico
City: Merida
Period: 4/03/24 to 8/03/24
