Spoken language data resources for Australian speech technology

J. B. Millar, J. Harrington, J. Vonwiller

    Research output: Contribution to journal › Article › peer-review



    Current speech technology is dominated by techniques that use data-driven statistical models of spoken language. The paper first introduces the critical role that corpora of speech data play in developing these models. It then reviews the factors that influence the design of data resources for these models. These lead to a description of the first stage of the Australian National Database of Spoken Language, covering the selection of speakers and linguistic materials and the methods used to elicit and record the speech. The critical area of data annotation, using both human and machine methods, is treated in some depth. So is the structure of the machine-readable, non-linguistic descriptive data provided for every speech signal file in the corpus. Finally, the paper describes how the data are packaged for dissemination.

    Original language: English
    Pages (from-to): 13-22
    Number of pages: 10
    Journal: Journal of Electrical and Electronics Engineering, Australia
    Issue number: 1
    Publication status: Published - Mar 1997

