AusTalk: an audio-visual corpus of Australian English

Dominique Estival, Steve Cassidy, Felicity Cox, Denis Burnham

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; Research; peer-review

Abstract

This paper describes the AusTalk corpus, which was designed and created through the Big ASC, a collaborative project with the two main goals of providing a standardised infrastructure for audio-visual recordings in Australia and of producing a large audio-visual corpus of Australian English, with 3 hours of AV recordings for 1000 speakers. We first present the overall project, then describe the corpus itself and its components, the strict data collection protocol with high levels of standardisation and automation, and the processes put in place for quality control. We also discuss the annotation phase of the project, along with its goals and challenges; a major contribution of the project has been to explore procedures for automating annotations and we present our solutions. We conclude with the current status of the corpus and with some examples of research already conducted with this new resource. AusTalk is one of the corpora included in the Alveo Virtual Lab, which is briefly sketched in the conclusion.
Language: English
Title of host publication: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Place of Publication: Reykjavik, Iceland
Publisher: European Language Resources Association
Pages: 3105-3109
Number of pages: 5
ISBN (Print): 9782951740884
Publication status: Published - 2014
Event: International Conference on Language Resources and Evaluation (9th : 2014) - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Conference

Conference: International Conference on Language Resources and Evaluation (9th : 2014)
City: Reykjavik, Iceland
Period: 26/05/14 to 31/05/14

Fingerprint

recording
quality control
automation
infrastructure
resources

Bibliographical note

Copyright the Author(s) 2014. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Cite this

Estival, D., Cassidy, S., Cox, F., & Burnham, D. (2014). AusTalk: an audio-visual corpus of Australian English. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, ... S. Piperidis (Eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014) (pp. 3105-3109). Reykjavik, Iceland: European Language Resources Association.
@inproceedings{b67c0ec8d68a42dab1f1184b75cc92ac,
title = "AusTalk: an audio-visual corpus of Australian English",
abstract = "This paper describes the AusTalk corpus, which was designed and created through the Big ASC, a collaborative project with the two main goals of providing a standardised infrastructure for audio-visual recordings in Australia and of producing a large audio-visual corpus of Australian English, with 3 hours of AV recordings for 1000 speakers. We first present the overall project, then describe the corpus itself and its components, the strict data collection protocol with high levels of standardisation and automation, and the processes put in place for quality control. We also discuss the annotation phase of the project, along with its goals and challenges; a major contribution of the project has been to explore procedures for automating annotations and we present our solutions. We conclude with the current status of the corpus and with some examples of research already conducted with this new resource. AusTalk is one of the corpora included in the Alveo Virtual Lab, which is briefly sketched in the conclusion.",
author = "Dominique Estival and Steve Cassidy and Felicity Cox and Denis Burnham",
note = "Copyright the Author(s) 2014. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.",
year = "2014",
language = "English",
isbn = "9782951740884",
pages = "3105--3109",
editor = "Nicoletta Calzolari and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis",
booktitle = "Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)",
publisher = "European Language Resources Association",

}

TY - GEN

T1 - AusTalk

T2 - an audio-visual corpus of Australian English

AU - Estival, Dominique

AU - Cassidy, Steve

AU - Cox, Felicity

AU - Burnham, Denis

N1 - Copyright the Author(s) 2014. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

PY - 2014

Y1 - 2014

N2 - This paper describes the AusTalk corpus, which was designed and created through the Big ASC, a collaborative project with the two main goals of providing a standardised infrastructure for audio-visual recordings in Australia and of producing a large audio-visual corpus of Australian English, with 3 hours of AV recordings for 1000 speakers. We first present the overall project, then describe the corpus itself and its components, the strict data collection protocol with high levels of standardisation and automation, and the processes put in place for quality control. We also discuss the annotation phase of the project, along with its goals and challenges; a major contribution of the project has been to explore procedures for automating annotations and we present our solutions. We conclude with the current status of the corpus and with some examples of research already conducted with this new resource. AusTalk is one of the corpora included in the Alveo Virtual Lab, which is briefly sketched in the conclusion.

UR - http://purl.org/au-research/grants/arc/LE100100211

M3 - Conference proceeding contribution

SN - 9782951740884

SP - 3105

EP - 3109

BT - Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)

A2 - Calzolari, Nicoletta

A2 - Choukri, Khalid

A2 - Declerck, Thierry

A2 - Loftsson, Hrafn

A2 - Maegaard, Bente

A2 - Mariani, Joseph

A2 - Moreno, Asuncion

A2 - Odijk, Jan

A2 - Piperidis, Stelios

PB - European Language Resources Association

CY - Reykjavik, Iceland

ER -
