Abstract
This chapter presents details of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation, and give an overview of the results so far. When complete, AusTalk will consist of 3 h of audio-visual recordings from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated, but producing transcriptions for the unscripted (spontaneous) speech data was a challenge for the project. We report on several avenues that have been explored for automating this task, and describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.
Original language | English |
---|---|
Title of host publication | Handbook of linguistic annotation |
Editors | Nancy Ide, James Pustejovsky |
Place of Publication | Dordrecht |
Publisher | Springer, Springer Nature |
Chapter | 49 |
Pages | 1287-1301 |
Number of pages | 15 |
ISBN (Electronic) | 9789402408812 |
ISBN (Print) | 9789402408799 |
Publication status | Published - 2017 |
Keywords
- speech corpus
- Australian English
- large corpora
- spontaneous speech