Attempts to profile authors by characteristics such as native language have drawn attention in recent years, with several approaches applying machine learning to simple features. In this paper we investigate the potential usefulness to this task of contrastive analysis from second language acquisition research, which postulates that the (syntactic) errors in a text are influenced by an author’s native language. We explore this, first, by analysing three syntactic error types through hypothesis testing and machine learning; and second, by adding these errors as features to a replication of a previous machine learning approach. This preliminary study provides some support for the use of this kind of syntactic error as a clue to identifying the native language of an author.
|Number of pages||9|
|Journal||Australasian Language Technology Association Workshop : Proceedings of the Workshop|
|Publication status||Published - 2009|
|Event||Australasian Language Technology Association Workshop - Sydney|
Duration: 3 Dec 2009 → 4 Dec 2009