TY - GEN
T1 - Measuring and analyzing search engine poisoning of linguistic collisions
AU - Joslin, Matthew
AU - Li, Neng
AU - Hao, Shuang
AU - Xue, Minhui
AU - Zhu, Haojin
PY - 2019
Y1 - 2019
N2 - Misspelled keywords have become an appealing target in search poisoning, since they are less competitive to promote than the correct queries and account for a considerable amount of search traffic. Search engines have adopted several countermeasure strategies, e.g., Google applies automated corrections on queried keywords and returns search results of the corrected versions directly. However, a sophisticated class of attack, which we term as linguistic-collision misspelling, can evade auto-correction and poison search results. Cybercriminals target special queries where the misspelled terms are existent words, even in other languages (e.g., 'idobe', a misspelling of the English word 'adobe', is a legitimate word in the Nigerian language). In this paper, we perform the first large-scale analysis on linguistic-collision search poisoning attacks. In particular, we check 1.77 million misspelled search terms on Google and Baidu and analyze both English and Chinese languages, which are the top two languages used by Internet users. We leverage edit distance operations and linguistic properties to generate misspelling candidates. To more efficiently identify linguistic-collision search terms, we design a deep learning model that can improve collection rate by 2.84x compared to random sampling. Our results show that the abuse is prevalent: around 1.19% of linguistic-collision search terms on Google and Baidu have results on the first page directing to malicious websites. We also find that cybercriminals mainly target categories of gambling, drugs, and adult content. Mobile-device users disproportionately search for misspelled keywords, presumably due to small screen for input. Our work highlights this new class of search engine poisoning and provides insights to help mitigate the threat.
AB - Misspelled keywords have become an appealing target in search poisoning, since they are less competitive to promote than the correct queries and account for a considerable amount of search traffic. Search engines have adopted several countermeasure strategies, e.g., Google applies automated corrections on queried keywords and returns search results of the corrected versions directly. However, a sophisticated class of attack, which we term as linguistic-collision misspelling, can evade auto-correction and poison search results. Cybercriminals target special queries where the misspelled terms are existent words, even in other languages (e.g., 'idobe', a misspelling of the English word 'adobe', is a legitimate word in the Nigerian language). In this paper, we perform the first large-scale analysis on linguistic-collision search poisoning attacks. In particular, we check 1.77 million misspelled search terms on Google and Baidu and analyze both English and Chinese languages, which are the top two languages used by Internet users. We leverage edit distance operations and linguistic properties to generate misspelling candidates. To more efficiently identify linguistic-collision search terms, we design a deep learning model that can improve collection rate by 2.84x compared to random sampling. Our results show that the abuse is prevalent: around 1.19% of linguistic-collision search terms on Google and Baidu have results on the first page directing to malicious websites. We also find that cybercriminals mainly target categories of gambling, drugs, and adult content. Mobile-device users disproportionately search for misspelled keywords, presumably due to small screen for input. Our work highlights this new class of search engine poisoning and provides insights to help mitigate the threat.
KW - Linguistic-collision-misspelling
KW - Pinyin
KW - Recurrent-neural-network
KW - Search-engine-poisoning
UR - http://www.scopus.com/inward/record.url?scp=85072923160&partnerID=8YFLogxK
U2 - 10.1109/SP.2019.00025
DO - 10.1109/SP.2019.00025
M3 - Conference proceeding contribution
AN - SCOPUS:85072923160
T3 - Proceedings - IEEE Symposium on Security and Privacy
SP - 1311
EP - 1325
BT - Proceedings - 2019 IEEE Symposium on Security and Privacy, SP 2019
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
T2 - 40th IEEE Symposium on Security and Privacy, SP 2019
Y2 - 19 May 2019 through 23 May 2019
ER -