Automatically appraising the credibility of vaccine-related web pages shared on social media: a Twitter surveillance study

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Background: Tools used to appraise the credibility of health information are time-consuming to apply and require context-specific expertise, limiting their use for quickly identifying and mitigating the spread of misinformation as it emerges.

Objective: The aim of this study was to estimate the proportion of vaccine-related Twitter posts linked to Web pages of low credibility and measure the potential reach of those posts.

Methods: Sampling from 143,003 unique vaccine-related Web pages shared on Twitter between January 2017 and March 2018, we used a 7-point checklist adapted from validated tools and guidelines to manually appraise the credibility of 474 Web pages. These were used to train several classifiers (random forests, support vector machines, and recurrent neural networks) using the text from a Web page to predict whether the information satisfies each of the 7 criteria. Estimating the credibility of all other Web pages, we used the follower network to estimate potential exposures relative to a credibility score defined by the 7-point checklist.

Results: The best-performing classifiers were able to distinguish between low, medium, and high credibility with an accuracy of 78% and labeled low-credibility Web pages with a precision of over 96%. Across the set of unique Web pages, 11.86% (16,961 of 143,003) were estimated as low credibility and they generated 9.34% (1.64 billion of 17.6 billion) of potential exposures. The 100 most popular links to low-credibility Web pages were each potentially seen by an estimated 2 million to 80 million Twitter users globally.

Conclusions: The results indicate that although a small minority of low-credibility Web pages reach a large audience, low-credibility Web pages tend to reach fewer users than other Web pages overall and are more commonly shared within certain subpopulations. An automatic credibility appraisal tool may be useful for finding communities of users at higher risk of exposure to low-credibility vaccine communications.
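The scoring and exposure steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It takes the per-criterion predictions of whichever classifier is used as given, treats the credibility score as the number of checklist criteria satisfied, and maps that score to a band using hypothetical thresholds (LOW_MAX, MEDIUM_MAX); the abstract does not state the actual cut-offs. It also assumes that a page's potential exposures are the summed follower counts of the accounts that posted a link to it. The names Share, credibility_score, credibility_band, potential_exposures and low_credibility_fractions are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

N_CRITERIA = 7  # number of items on the credibility checklist (from the abstract)

# Hypothetical cut-offs: the abstract does not say how scores map to bands.
LOW_MAX = 2     # scores 0-2 -> "low"
MEDIUM_MAX = 4  # scores 3-4 -> "medium"; scores 5-7 -> "high"


def credibility_score(criteria_met: list[bool]) -> int:
    """Count how many checklist criteria a page is predicted to satisfy."""
    if len(criteria_met) != N_CRITERIA:
        raise ValueError(f"expected {N_CRITERIA} criteria, got {len(criteria_met)}")
    return sum(criteria_met)


def credibility_band(score: int) -> str:
    """Map a 0-7 score to a low/medium/high band (assumed thresholds)."""
    if score <= LOW_MAX:
        return "low"
    if score <= MEDIUM_MAX:
        return "medium"
    return "high"


@dataclass
class Share:
    """One post linking to a page (hypothetical schema)."""
    url: str
    author_followers: int  # follower count of the posting account


def potential_exposures(shares: list[Share]) -> dict[str, int]:
    """Assumed definition: sum the follower counts of every post linking to each URL."""
    totals: dict[str, int] = {}
    for share in shares:
        totals[share.url] = totals.get(share.url, 0) + share.author_followers
    return totals


def low_credibility_fractions(
    bands: dict[str, str], exposures: dict[str, int]
) -> tuple[float, float]:
    """Return (fraction of pages rated low, fraction of exposures those pages generate)."""
    low_urls = [url for url, band in bands.items() if band == "low"]
    page_fraction = len(low_urls) / len(bands)
    total = sum(exposures.get(url, 0) for url in bands)
    low_total = sum(exposures.get(url, 0) for url in low_urls)
    exposure_fraction = low_total / total if total else 0.0
    return page_fraction, exposure_fraction
```

For example, with three toy pages scoring 7, 1 and 3, whose shares total 1,000, 200 and 800 followers respectively, one page in three is rated low and it accounts for 10% of potential exposures. The same page-level ratio applied to the study's counts, 16,961 / 143,003, gives the reported 11.86%.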

Language: English
Article number: e14007
Pages: 1-14
Number of pages: 14
Journal: Journal of Medical Internet Research
Volume: 21
Issue number: 11
DOI: 10.2196/14007
Publication status: Published (4 Nov 2019)


Bibliographical note

Copyright the Author(s) 2019. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • credibility appraisal
  • health misinformation
  • machine learning
  • social media

Cite this

@article{032be028fdf9494c8bcaf2bf943357ee,
title = "Automatically appraising the credibility of vaccine-related web pages shared on social media: a {Twitter} surveillance study",
abstract = "Background: Tools used to appraise the credibility of health information are time-consuming to apply and require context-specific expertise, limiting their use for quickly identifying and mitigating the spread of misinformation as it emerges. Objective: The aim of this study was to estimate the proportion of vaccine-related Twitter posts linked to Web pages of low credibility and measure the potential reach of those posts. Methods: Sampling from 143,003 unique vaccine-related Web pages shared on Twitter between January 2017 and March 2018, we used a 7-point checklist adapted from validated tools and guidelines to manually appraise the credibility of 474 Web pages. These were used to train several classifiers (random forests, support vector machines, and recurrent neural networks) using the text from a Web page to predict whether the information satisfies each of the 7 criteria. Estimating the credibility of all other Web pages, we used the follower network to estimate potential exposures relative to a credibility score defined by the 7-point checklist. Results: The best-performing classifiers were able to distinguish between low, medium, and high credibility with an accuracy of 78{\%} and labeled low-credibility Web pages with a precision of over 96{\%}. Across the set of unique Web pages, 11.86{\%} (16,961 of 143,003) were estimated as low credibility and they generated 9.34{\%} (1.64 billion of 17.6 billion) of potential exposures. The 100 most popular links to low-credibility Web pages were each potentially seen by an estimated 2 million to 80 million Twitter users globally. Conclusions: The results indicate that although a small minority of low-credibility Web pages reach a large audience, low-credibility Web pages tend to reach fewer users than other Web pages overall and are more commonly shared within certain subpopulations.
An automatic credibility appraisal tool may be useful for finding communities of users at higher risk of exposure to low-credibility vaccine communications.",
keywords = "credibility appraisal, health misinformation, machine learning, social media",
author = "Zubair Shah and Didi Surian and Amalie Dyda and Enrico Coiera and Mandl, {Kenneth D.} and Dunn, {Adam G.}",
note = "Copyright the Author(s) 2019. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.",
year = "2019",
month = "11",
day = "4",
doi = "10.2196/14007",
language = "English",
volume = "21",
pages = "1--14",
journal = "Journal of Medical Internet Research",
issn = "1438-8871",
publisher = "JMIR PUBLICATIONS, INC",
number = "11",

}

Automatically appraising the credibility of vaccine-related web pages shared on social media: a Twitter surveillance study. / Shah, Zubair; Surian, Didi; Dyda, Amalie; Coiera, Enrico; Mandl, Kenneth D.; Dunn, Adam G.

In: Journal of Medical Internet Research, Vol. 21, No. 11, e14007, 04.11.2019, p. 1-14.


TY  - JOUR
T1  - Automatically appraising the credibility of vaccine-related web pages shared on social media: a Twitter surveillance study
T2  - Journal of Medical Internet Research
AU  - Shah, Zubair
AU  - Surian, Didi
AU  - Dyda, Amalie
AU  - Coiera, Enrico
AU  - Mandl, Kenneth D.
AU  - Dunn, Adam G.
N1  - Copyright the Author(s) 2019. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY  - 2019/11/4
Y1  - 2019/11/4
AB  - Background: Tools used to appraise the credibility of health information are time-consuming to apply and require context-specific expertise, limiting their use for quickly identifying and mitigating the spread of misinformation as it emerges. Objective: The aim of this study was to estimate the proportion of vaccine-related Twitter posts linked to Web pages of low credibility and measure the potential reach of those posts. Methods: Sampling from 143,003 unique vaccine-related Web pages shared on Twitter between January 2017 and March 2018, we used a 7-point checklist adapted from validated tools and guidelines to manually appraise the credibility of 474 Web pages. These were used to train several classifiers (random forests, support vector machines, and recurrent neural networks) using the text from a Web page to predict whether the information satisfies each of the 7 criteria. Estimating the credibility of all other Web pages, we used the follower network to estimate potential exposures relative to a credibility score defined by the 7-point checklist. Results: The best-performing classifiers were able to distinguish between low, medium, and high credibility with an accuracy of 78% and labeled low-credibility Web pages with a precision of over 96%. Across the set of unique Web pages, 11.86% (16,961 of 143,003) were estimated as low credibility and they generated 9.34% (1.64 billion of 17.6 billion) of potential exposures. The 100 most popular links to low-credibility Web pages were each potentially seen by an estimated 2 million to 80 million Twitter users globally. Conclusions: The results indicate that although a small minority of low-credibility Web pages reach a large audience, low-credibility Web pages tend to reach fewer users than other Web pages overall and are more commonly shared within certain subpopulations. An automatic credibility appraisal tool may be useful for finding communities of users at higher risk of exposure to low-credibility vaccine communications.
KW  - credibility appraisal
KW  - health misinformation
KW  - machine learning
KW  - social media
UR  - http://www.scopus.com/inward/record.url?scp=85074544598&partnerID=8YFLogxK
U2  - 10.2196/14007
DO  - 10.2196/14007
M3  - Article
VL  - 21
SP  - 1
EP  - 14
JO  - Journal of Medical Internet Research
JF  - Journal of Medical Internet Research
SN  - 1438-8871
IS  - 11
M1  - e14007
ER  - 