Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson, Emmanuel Dupoux

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or raw speech has attracted increasing interest in recent years, from both a theoretical and a practical standpoint. Yet there is no commonly accepted evaluation method for systems performing term discovery. Here, we propose such an evaluation toolbox, drawing on ideas from both speech technology and natural language processing. We first transform the speech-based output into a symbolic representation and then compute five types of evaluation metrics on this representation: the quality of acoustic matching, the quality of the clusters found, and the quality of the alignment with real words (type, token, and boundary scores). We tested our approach on two term discovery systems that take speech as input and one that uses symbolic input. The latter was run on both the gold transcription and a transcription obtained from an automatic speech recognizer, to simulate the case where only imperfect symbolic information is available. The results are analysed using the proposed evaluation metrics, and the implications of these metrics are discussed.
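
To make the token and boundary scores mentioned above more concrete, the following is a minimal sketch, not the toolbox's own code. It assumes the conventional word-segmentation definitions: precision, recall, and F-score computed over word-boundary positions and over word spans in a symbolic transcription. The function names and the toy example are hypothetical, and the paper's exact definitions may differ.

```python
from typing import List, Set, Tuple


def boundaries(words: List[str]) -> Set[int]:
    """Character offsets of the internal word boundaries of a segmentation."""
    positions = set()
    offset = 0
    for word in words[:-1]:  # the end of the utterance is not counted
        offset += len(word)
        positions.add(offset)
    return positions


def token_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """(start, end) character spans of every word token."""
    spans = set()
    offset = 0
    for word in words:
        spans.add((offset, offset + len(word)))
        offset += len(word)
    return spans


def precision_recall_f(found: set, gold: set) -> Tuple[float, float, float]:
    """Precision, recall and F-score of a found set against a gold set."""
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def boundary_scores(found_words: List[str], gold_words: List[str]):
    return precision_recall_f(boundaries(found_words), boundaries(gold_words))


def token_scores(found_words: List[str], gold_words: List[str]):
    return precision_recall_f(token_spans(found_words), token_spans(gold_words))


# Hypothetical toy utterance: gold segmentation versus a system's output.
gold = ["the", "dog", "barks"]
found = ["thedog", "bar", "ks"]
print(boundary_scores(found, gold))
print(token_scores(found, gold))
```

In this toy case one of the two proposed boundaries is correct, while no proposed token matches a gold token exactly, which illustrates why boundary scores are typically more lenient than token scores.
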
Original language: English
Title of host publication: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014)
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Place of Publication: Reykjavik, Iceland
Publisher: Association for Computational Linguistics
Pages: 560-567
Number of pages: 8
ISBN (Print): 9782951740884
Publication status: Published - 2014
Event: International Conference on Language Resources and Evaluation (9th : 2014) - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Conference

Conference: International Conference on Language Resources and Evaluation (9th : 2014)
City: Reykjavik, Iceland
Period: 26/05/14 to 31/05/14

Bibliographical note

Copyright the Author(s) 2014. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • evaluation
  • spoken term discovery
  • word segmentation
