Language identification using classifier ensembles

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

In this paper we describe the language identification system we developed for the Discriminating Similar Languages (DSL) 2015 shared task. We constructed a classifier ensemble composed of several Support Vector Machine (SVM) base classifiers, each trained on a single feature type. Our feature types include character 1–6 grams and word unigrams and bigrams. Using this system we were able to outperform the other entries in the closed training track of the DSL 2015 shared task, achieving the best accuracy of 95.54%.
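The ensemble layout described in the abstract (one base classifier per feature type, with the feature types being character 1–6 grams and word unigrams and bigrams) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base classifier here is a simple cosine-similarity profile matcher standing in for an SVM so that the example needs only the standard library, the plurality-vote combination rule is an assumption (the abstract does not say how base classifier outputs are combined), and the training sentences are toy data invented for the example. All class and function names are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text, n):
    """Return all character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all word n-grams of `text`, using whitespace tokenization."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# One extractor per feature type named in the abstract:
# character 1-6 grams, word unigrams, word bigrams (8 in total).
FEATURE_TYPES = {f"char{n}": (lambda t, n=n: char_ngrams(t, n)) for n in range(1, 7)}
FEATURE_TYPES["word1"] = lambda t: word_ngrams(t, 1)
FEATURE_TYPES["word2"] = lambda t: word_ngrams(t, 2)


class ProfileClassifier:
    """Stand-in base classifier (NOT an SVM): picks the label whose
    accumulated feature-count profile is most cosine-similar to the input."""

    def __init__(self, extract):
        self.extract = extract
        self.profiles = {}

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.profiles.setdefault(label, Counter()).update(self.extract(text))
        return self

    def predict(self, text):
        feats = Counter(self.extract(text))

        def cosine(profile):
            dot = sum(count * profile[f] for f, count in feats.items())
            norm = math.sqrt(sum(c * c for c in feats.values())) * math.sqrt(
                sum(c * c for c in profile.values())
            )
            return dot / norm if norm else 0.0

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.profiles), key=lambda lab: cosine(self.profiles[lab]))


class VotingEnsemble:
    """Trains one base classifier per feature type and combines their
    predictions by plurality vote (an assumed combination rule)."""

    def __init__(self, feature_types=FEATURE_TYPES):
        self.members = {name: ProfileClassifier(fn) for name, fn in feature_types.items()}

    def fit(self, texts, labels):
        for member in self.members.values():
            member.fit(texts, labels)
        return self

    def predict(self, text):
        votes = Counter(member.predict(text) for member in self.members.values())
        return votes.most_common(1)[0][0]


# Toy training data, invented purely for illustration.
texts = [
    "the cat sits on the mat",
    "where is the train station",
    "el gato está en la casa",
    "dónde está la estación de tren",
]
labels = ["en", "en", "es", "es"]

ensemble = VotingEnsemble().fit(texts, labels)
print(ensemble.predict("the dog is on the mat"))
```

Keeping each base classifier tied to a single feature type means any member can be replaced (for example by a linear SVM over the same n-gram counts) without touching the voting logic.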
Original language: English
Title of host publication: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
Subtitle of host publication: proceedings of the workshop
Place of Publication: Melbourne, Australia
Publisher: Association for Computational Linguistics
Pages: 35-43
Number of pages: 9
ISBN (Print): 9789544520311
Publication status: Published - 2015
Event: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects - Hissar, Bulgaria
Duration: 10 Sept 2015 - 10 Sept 2015

Workshop

Workshop: Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
City: Hissar, Bulgaria
Period: 10/09/15 - 10/09/15
