From archive to corpus: Transcription and annotation in the creation of signed language corpora

Trevor Johnston*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    121 Citations (Scopus)

    Abstract

    Annotations are an important resource in corpus-based linguistic research. In fact, the most important feature of a modern signed language corpus should be that it has been annotated rather than simply transcribed. Digital multi-media annotation software can now transform language recordings into machine-readable texts using gloss-based annotations without it first being necessary to transcribe these utterances, provided that sign tokens are identified and discriminated according to type. Further annotations can subsequently be appended to these units. However, unique identifiers of sign types (or 'ID-glosses') can only be used if a comprehensive reference lexical database of the language already exists. In order to create a basic multi-purpose reference signed language corpus, therefore, linguists should prioritize annotation using ID-glosses above transcription. The effort expended in creating a transcription that does not facilitate the unique identification of sign types will not result in a machine-readable corpus in any meaningful sense, contrary to expectations.

    Original language: English
    Pages (from-to): 106-131
    Number of pages: 26
    Journal: International Journal of Corpus Linguistics
    Volume: 15
    Issue number: 1
    Publication status: Published - 22 Mar 2010
