Abstract
In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character n-grams (n = 1–6) and word unigrams and bigrams. This system outperformed all other entries in the closed training track of the DSL 2015 shared task, achieving the best accuracy of 95.54%.
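The abstract describes the architecture only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It trains one linear SVM per feature type (character 1–6 grams, word unigrams, word bigrams) and combines the members by plurality vote. The fusion rule, the from-scratch Pegasos solver (standing in for whatever SVM implementation the original system used), the L2 normalisation of feature vectors, and the `lam`/`epochs` defaults are all assumptions. The names `extract`, `LinearSVM` and `Ensemble` are hypothetical.

```python
import math
import random
from collections import Counter

# Feature types named in the abstract: character 1-6 grams, word unigrams and bigrams.
# Each base classifier in the ensemble is trained on exactly one of them.
FEATURE_TYPES = [("char", n) for n in range(1, 7)] + [("word", n) for n in (1, 2)]


def extract(text, ftype):
    """Hypothetical helper: L2-normalised n-gram counts (as a dict) for one feature type."""
    kind, n = ftype
    units = text if kind == "char" else text.split()
    sep = "" if kind == "char" else " "
    counts = Counter(sep.join(units[i:i + n]) for i in range(len(units) - n + 1))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {f: v / norm for f, v in counts.items()} if norm else {}


class LinearSVM:
    """One-vs-rest linear SVM (hinge loss, L2 penalty, no bias) fitted with Pegasos.

    Assumption: the solver and the defaults for lam/epochs are illustrative only.
    """

    def __init__(self, lam=0.01, epochs=20, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def score(self, x, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, X, y):
        self.labels = sorted(set(y))
        self.weights = {c: {} for c in self.labels}
        order, rng, t = list(range(len(X))), random.Random(self.seed), 0
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                for c in self.labels:
                    target = 1.0 if y[i] == c else -1.0
                    margin = target * self.score(X[i], c)  # uses weights before this step
                    w = self.weights[c]
                    for f in w:  # shrink: w <- (1 - eta * lam) * w
                        w[f] *= 1.0 - eta * self.lam
                    if margin < 1.0:  # hinge loss is active: step towards the target
                        for f, v in X[i].items():
                            w[f] = w.get(f, 0.0) + eta * target * v
        return self

    def predict(self, x):
        # An empty feature vector scores 0 for every label, so max() returns the first label.
        return max(self.labels, key=lambda c: self.score(x, c))


class Ensemble:
    """One LinearSVM per feature type, fused by plurality vote (assumed fusion rule)."""

    def fit(self, texts, labels):
        self.members = [
            (ftype, LinearSVM().fit([extract(t, ftype) for t in texts], labels))
            for ftype in FEATURE_TYPES
        ]
        return self

    def predict(self, text):
        votes = Counter(clf.predict(extract(text, ftype)) for ftype, clf in self.members)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up toy sentences for illustration; not DSL 2015 data.
    texts = ["o menino está em casa", "a menina não quer comer",
             "el niño está en casa", "la niña no quiere comer"]
    labels = ["pt", "pt", "es", "es"]
    model = Ensemble().fit(texts, labels)
    print(model.predict("a menina está em casa"))
```

Restricting each member to a single feature type is intended to keep the base classifiers diverse, so that the vote can outweigh an individual member's mistake. A weighted vote or averaging of per-class scores would be an equally plausible fusion choice in this sketch.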
Original language | English |
---|---|
Title of host publication | Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects |
Subtitle of host publication | proceedings of the workshop |
Place of Publication | Melbourne, Australia |
Publisher | Association for Computational Linguistics |
Pages | 35-43 |
Number of pages | 9 |
ISBN (Print) | 9789544520311 |
Publication status | Published - 2015 |
Event | Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, Hissar, Bulgaria (10 Sept 2015) |