TY - JOUR
T1 - From archive to corpus
T2 - Transcription and annotation in the creation of signed language corpora
AU - Johnston, Trevor
PY - 2010/3/22
Y1 - 2010/3/22
N2 - Annotations are an important resource in corpus-based linguistic research. In fact, the most important feature of a modern signed language corpus should be that it has been annotated rather than simply transcribed. Digital multi-media annotation software can now transform language recordings into machine-readable texts using gloss-based annotations without it first being necessary to transcribe these utterances, provided that sign tokens are identified and discriminated according to type. Further annotations can subsequently be appended to these units. However, unique identifiers of sign types (or 'ID-glosses') can only be used if a comprehensive reference lexical database of the language already exists. In order to create a basic multi-purpose reference signed language corpus, therefore, linguists should prioritize annotation using ID-glosses above transcription. The effort expended in creating a transcription that does not facilitate the unique identification of sign types will not result in a machine-readable corpus in any meaningful sense, contrary to expectations.
AB - Annotations are an important resource in corpus-based linguistic research. In fact, the most important feature of a modern signed language corpus should be that it has been annotated rather than simply transcribed. Digital multi-media annotation software can now transform language recordings into machine-readable texts using gloss-based annotations without it first being necessary to transcribe these utterances, provided that sign tokens are identified and discriminated according to type. Further annotations can subsequently be appended to these units. However, unique identifiers of sign types (or 'ID-glosses') can only be used if a comprehensive reference lexical database of the language already exists. In order to create a basic multi-purpose reference signed language corpus, therefore, linguists should prioritize annotation using ID-glosses above transcription. The effort expended in creating a transcription that does not facilitate the unique identification of sign types will not result in a machine-readable corpus in any meaningful sense, contrary to expectations.
UR - http://www.scopus.com/inward/record.url?scp=77950171960&partnerID=8YFLogxK
U2 - 10.1075/ijcl.15.1.05joh
DO - 10.1075/ijcl.15.1.05joh
M3 - Article
AN - SCOPUS:77950171960
SN - 1384-6655
VL - 15
SP - 106
EP - 131
JO - International Journal of Corpus Linguistics
JF - International Journal of Corpus Linguistics
IS - 1
ER -