Discriminating similar languages: evaluations and explorations

Cyril Goutte, Serge Léger, Shervin Malmasi, Marcos Zampieri

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review



We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using ensemble and oracle combination, and provide learning curves to help us understand which languages are more challenging. A number of difficult sentences are identified and investigated further with human annotation.

Original language: English
Title of host publication: Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016
Editors: Nicoletta Calzolari, Khalid Choukri, Helene Mazo, Asuncion Moreno, Thierry Declerck, Sara Goggi, Marko Grobelnik, Jan Odijk, Stelios Piperidis, Bente Maegaard, Joseph Mariani
Publisher: European Language Resources Association (ELRA)
Number of pages: 8
ISBN (Electronic): 9782951740891
Publication status: Published - 1 Jan 2016
Event: 10th International Conference on Language Resources and Evaluation, LREC 2016 - Portoroz, Slovenia
Duration: 23 May 2016 to 28 May 2016


Conference: 10th International Conference on Language Resources and Evaluation, LREC 2016

Bibliographical note

Copyright the European Language Resources Association. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.


  • Evaluation
  • Language identification
  • Language varieties

