TY - JOUR
T1 - A corpus for mining drug-related knowledge from Twitter chatter
T2 - language models and their utilities
AU - Sarker, Abeed
AU - Gonzalez, Graciela
N1 - Copyright the Author(s) 2016. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2017/2/1
Y1 - 2017/2/1
N2 - In this data article, we present to the data science, natural language processing and public health communities an unlabeled corpus and a set of language models. We collected the data from Twitter using drug names as keywords, including their common misspelled forms. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representations and (ii) probabilities of n-gram sequences. The data set we are releasing consists of 267,215 Twitter posts made during the four-month period from November 2014 to February 2015. The posts mention over 250 drug-related keywords. The language models encapsulate semantic and sequential properties of the texts.
AB - In this data article, we present to the data science, natural language processing and public health communities an unlabeled corpus and a set of language models. We collected the data from Twitter using drug names as keywords, including their common misspelled forms. Using this data, which is rich in drug-related chatter, we developed language models to aid the development of data mining tools and methods in this domain. We generated several models that capture (i) distributed word representations and (ii) probabilities of n-gram sequences. The data set we are releasing consists of 267,215 Twitter posts made during the four-month period from November 2014 to February 2015. The posts mention over 250 drug-related keywords. The language models encapsulate semantic and sequential properties of the texts.
UR - http://www.scopus.com/inward/record.url?scp=85017252409&partnerID=8YFLogxK
U2 - 10.1016/j.dib.2016.11.056
DO - 10.1016/j.dib.2016.11.056
M3 - Article
C2 - 27981203
AN - SCOPUS:85017252409
VL - 10
SP - 122
EP - 131
JO - Data in Brief
JF - Data in Brief
SN - 2352-3409
ER -