The Australian National Corpus: national infrastructure for language resources

Steve Cassidy, Michael Haugh, Pam Peters, Mark Fallu

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › Research › peer-review

Abstract

The Australian National Corpus has been established in an effort to make currently scattered and relatively inaccessible data available to researchers through an online portal. In contrast to other national corpora, it is conceptualised as a linked collection of many existing and future language resources representing language use in Australia, unified through common technical standards. This approach allows us to bootstrap a significant collection and add value to existing resources by providing a unified, online tool-set to support research in a number of disciplines. This paper provides an outline of the technical platform being developed to support the corpus and a brief overview of some of the collections that form part of the initial version of the Australian National Corpus.
Language: English
Title of host publication: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis
Publisher: European Language Resources Association (ELRA)
Pages: 3295-3299
Number of pages: 5
ISBN (Print): 9782951740877
Publication status: Published - 2012
Event: International Conference on Language Resources and Evaluation (8th : 2012) - Istanbul, Turkey
Duration: 23 May 2012 to 25 May 2012

Conference

Conference: International Conference on Language Resources and Evaluation (8th : 2012)
City: Istanbul, Turkey
Period: 23/05/12 to 25/05/12

Keywords

  • national corpus
  • annotation
  • meta-data

Cite this

Cassidy, S., Haugh, M., Peters, P., & Fallu, M. (2012). The Australian National Corpus: national infrastructure for language resources. In N. Calzolari, K. Choukri, T. Declerck, M. Uğur Doğan, B. Maegaard, J. Mariani, J. Odijk, ... S. Piperidis (Eds.), Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (pp. 3295-3299). European Language Resources Association (ELRA).