The Jinan Chinese Learner Corpus

Maolin Wang, Shervin Malmasi, Mingxuan Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts produced by learners that can be used for educational tasks. The present work introduces the data and provides a detailed description. Currently, the corpus contains approximately 6 million Chinese characters written by students from over 50 different L1 backgrounds. This large-scale corpus of learner Chinese is freely available to researchers, either through a web interface or as a set of raw texts. The data can be used in NLP tasks including automatic essay grading, language transfer analysis, and error detection and correction. It can also be used in applied and corpus linguistics to support Second Language Acquisition (SLA) research and the development of pedagogical resources. Practical applications of the data and future directions are discussed.
Original language: English
Title of host publication: The tenth workshop on innovative use of NLP for building educational applications
Subtitle of host publication: proceedings of the workshop
Place of publication: United States
Publisher: Association for Computational Linguistics
Number of pages: 6
ISBN (Print): 9781941643358
Publication status: Published - 2015
Event: Workshop on Innovative Use of NLP for Building Educational Applications (10th : 2015) - Denver, CO
Duration: 4 Jun 2015


Workshop: Workshop on Innovative Use of NLP for Building Educational Applications (10th : 2015)
City: Denver, CO
