Adding value to, and extracting of value from, a signed language corpus through secondary processing: implications for annotation schemas and corpus creation

Trevor Johnston

    Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

    Abstract

    A basic signed language (SL) corpus is created through primary processing of video recordings using multimedia annotation software. Primary processing entails the tokenization and identification of SL units. For the purposes of linguistic research a corpus also needs secondary processing. Secondary processing entails appending tags for specific linguistic features to primary annotations. I draw on the experience from the Auslan corpus project to describe how primary and secondary processing can be used in corpus-based SL research. In particular, I show how the tier structure of ELAN can be used to tag SL units in a variety of ways, and how this information can be used to glean new information from the corpus which can then be added as new annotations to the corpus. Value-adding by principled and systematic primary and secondary processing of digital recordings is thus not only essential for corpus creation (‘machine-readability’), it also enables further enriching of the corpus so that even more value can be extracted. I conclude by discussing the implications for annotation software and standardized annotation schemas used in the creation of SL corpora.
    Original language: English
    Title of host publication: Seventh International Conference on Language Resources and Evaluation (LREC 2010)
    Subtitle of host publication: proceedings
    Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
    Place of Publication: Luxemburg
    Publisher: European Language Resources Association
    Pages: 137-142
    Number of pages: 6
    ISBN (Print): 9782951740860
    Publication status: Published - 2010
    Event: International Conference on Language Resources and Evaluation (7th : 2010) - Valletta, Malta
    Duration: 17 May 2010 to 23 May 2010

