Case study: the AusTalk corpus

Stephen Cassidy, Dominique Estival, Felicity Cox

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Research › peer-review

Abstract

This chapter presents details of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation and give an overview of the results so far. When complete, AusTalk will consist of 3 h of audio-visual recording from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated, but a challenge of the project was to produce transcriptions for the unscripted (spontaneous) speech data. We report on several avenues that have been explored for the automation of this task. We describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.
Language: English
Title of host publication: Handbook of linguistic annotation
Editors: Nancy Ide, James Pustejovsky
Place of Publication: Dordrecht
Publisher: Springer, Springer Nature
Chapter: 49
Pages: 1287-1301
Number of pages: 15
ISBN (Electronic): 9789402408812
ISBN (Print): 9789402408799
DOI: https://doi.org/10.1007/978-94-024-0881-2_49
Publication status: Published - 2017

Fingerprint

Transcription
Automation

Keywords

  • speech corpus
  • Australian English
  • large corpora
  • spontaneous speech

Cite this

Cassidy, S., Estival, D., & Cox, F. (2017). Case study: the AusTalk corpus. In N. Ide, & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 1287-1301). Dordrecht: Springer, Springer Nature. https://doi.org/10.1007/978-94-024-0881-2_49
Cassidy, Stephen ; Estival, Dominique ; Cox, Felicity. / Case study : the AusTalk corpus. Handbook of linguistic annotation. editor / Nancy Ide ; James Pustejovsky. Dordrecht : Springer, Springer Nature, 2017. pp. 1287-1301
@inbook{c49eba8dee574bedb784af4d37396f9a,
title = "Case study: the AusTalk corpus",
abstract = "This chapter presents detail of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation and give an overview of the results so far. When complete, AusTalk will consist of 3 h of audio-visual recording from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated but a challenge of the project was to produce transcriptions for the unscripted (spontaneous) speech data. We report on several avenues that have been explored for the automation of this task. We describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.",
keywords = "speech corpus, Australian English, large corpora, spontaneous speech",
author = "Stephen Cassidy and Dominique Estival and Felicity Cox",
year = "2017",
doi = "10.1007/978-94-024-0881-2_49",
language = "English",
isbn = "9789402408799",
pages = "1287--1301",
editor = "Nancy Ide and James Pustejovsky",
booktitle = "Handbook of linguistic annotation",
publisher = "Springer, Springer Nature",
address = "Dordrecht",
}

Cassidy, S, Estival, D & Cox, F 2017, Case study: the AusTalk corpus. in N Ide & J Pustejovsky (eds), Handbook of linguistic annotation. Springer, Springer Nature, Dordrecht, pp. 1287-1301. https://doi.org/10.1007/978-94-024-0881-2_49

Case study : the AusTalk corpus. / Cassidy, Stephen; Estival, Dominique; Cox, Felicity.

Handbook of linguistic annotation. ed. / Nancy Ide; James Pustejovsky. Dordrecht : Springer, Springer Nature, 2017. p. 1287-1301.


TY - CHAP

T1 - Case study: the AusTalk corpus

AU - Cassidy,Stephen

AU - Estival,Dominique

AU - Cox,Felicity

PY - 2017

Y1 - 2017

AB - This chapter presents detail of the Annotation Task of the Big Australian Speech Corpus (Big ASC) project, in which AusTalk, a large audio-visual corpus of Australian English, was collected. We describe the scope of the task and its implementation and give an overview of the results so far. When complete, AusTalk will consist of 3 h of audio-visual recording from each of 1000 speakers of Australian English, across a wide range of tasks including scripted (read) speech, spontaneous speech and dialogue. The read speech of 100 participants has now been manually annotated but a challenge of the project was to produce transcriptions for the unscripted (spontaneous) speech data. We report on several avenues that have been explored for the automation of this task. We describe the annotation challenges, the processes that were adopted and the limitations of automated transcription.

KW - speech corpus

KW - Australian English

KW - large corpora

KW - spontaneous speech

U2 - 10.1007/978-94-024-0881-2_49

DO - 10.1007/978-94-024-0881-2_49

M3 - Chapter

SN - 9789402408799

SP - 1287

EP - 1301

BT - Handbook of linguistic annotation

PB - Springer, Springer Nature

CY - Dordrecht

ER -

Cassidy S, Estival D, Cox F. Case study: the AusTalk corpus. In Ide N, Pustejovsky J, editors, Handbook of linguistic annotation. Dordrecht: Springer, Springer Nature. 2017. p. 1287-1301 https://doi.org/10.1007/978-94-024-0881-2_49