Case study: the AusTalk corpus

Stephen Cassidy, Dominique Estival, Felicity Cox

Research output: Chapter in Book/Report/Conference proceeding; Chapter; peer-reviewed

Abstract

This chapter presents the details of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation, and give an overview of the results so far. When complete, AusTalk will consist of 3 h of audio-visual recording from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated, but a challenge of the project was to produce transcriptions for the unscripted (spontaneous) speech data. We report on several avenues that have been explored for automating this task. We describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.
Original language: English
Title of host publication: Handbook of linguistic annotation
Editors: Nancy Ide, James Pustejovsky
Place of Publication: Dordrecht
Publisher: Springer, Springer Nature
Chapter: 49
Pages: 1287-1301
Number of pages: 15
ISBN (Electronic): 9789402408812
ISBN (Print): 9789402408799
Publication status: Published, 2017

Keywords

  • speech corpus
  • Australian English
  • large corpora
  • spontaneous speech
