TY - JOUR
T1 - Using social connection information to improve opinion mining
T2 - 15th World Congress on Health and Biomedical Informatics, MEDINFO 2015
AU - Zhou, Xujuan
AU - Coiera, Enrico
AU - Tsafnat, Guy
AU - Arachi, Diana
AU - Ong, Mei Sing
AU - Dunn, Adam G.
N1 - Copyright the Publisher 2015. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2015
Y1 - 2015
AB - The manner in which people preferentially interact with others like themselves suggests that information about social connections may be useful in the surveillance of opinions for public health purposes. We examined if social connection information from tweets about human papillomavirus (HPV) vaccines could be used to train classifiers that identify anti-vaccine opinions. From 42,533 tweets posted between October 2013 and March 2014, 2,098 were sampled at random and two investigators independently identified anti-vaccine opinions. Machine learning methods were used to train classifiers using the first three months of data, including content (8,261 text fragments) and social connections (10,758 relationships). Connection-based classifiers performed similarly to content-based classifiers on the first three months of training data, and performed more consistently than content-based classifiers on test data from the subsequent three months. The most accurate classifier achieved an accuracy of 88.6% on the test data set, and used only social connection features. Information about how people are connected, rather than what they write, may be useful for improving public health surveillance methods on Twitter.
UR - http://www.scopus.com/inward/record.url?scp=84952034317&partnerID=8YFLogxK
U2 - 10.3233/978-1-61499-564-7-761
DO - 10.3233/978-1-61499-564-7-761
M3 - Conference paper
C2 - 26262154
AN - SCOPUS:84952034317
SN - 0926-9630
VL - 216
SP - 761
EP - 765
JO - Studies in Health Technology and Informatics
JF - Studies in Health Technology and Informatics
Y2 - 19 August 2015 through 23 August 2015
ER -