Abstract
We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts produced by learners that can be used for educational tasks. The present work introduces the data and provides a detailed description. Currently, the corpus contains approximately 6 million Chinese characters written by students from over 50 different L1 backgrounds. This is a large-scale corpus of learner Chinese texts that is freely available to researchers, either through a web interface or as a set of raw texts. The data can be used in NLP tasks including automatic essay grading, language transfer analysis, and error detection and correction. It can also be used in applied and corpus linguistics to support Second Language Acquisition (SLA) research and the development of pedagogical resources. Practical applications of the data and future directions are discussed.
| Original language | English |
|---|---|
| Title of host publication | The tenth workshop on innovative use of NLP for building educational applications |
| Subtitle of host publication | proceedings of the workshop |
| Place of Publication | United States |
| Publisher | Association for Computational Linguistics |
| Pages | 118-123 |
| Number of pages | 6 |
| ISBN (Print) | 9781941643358 |
| Publication status | Published - 2015 |
| Event | Workshop on Innovative Use of NLP for Building Educational Applications (10th : 2015), Denver, CO, 4 Jun 2015 |