Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition

Roy D. Patterson, Thomas C. Walters, Jessica Monaghan, Christian Feldbauer, Toshio Irino

Research output: Chapter in Book/Report/Conference proceedingConference proceeding contributionpeer-review

1 Citation (Scopus)

Abstract

The syllables of speech contain information about the vocal tract length (VTL) of the speaker as well as the phonetic message. Ideally, the pre-processor used for automatic speech recognition (ASR) should segregate the phonetic message from the VTL information. This paper describes a method to calculate VTL-invariant auditory feature vectors from speech, using a method in which the message and the VTL are segregated. Spectra produced by an auditory filterbank are summarized by a Gaussian mixture model (GMM) to produce a low-dimensional feature vector. These features are evaluated for robustness in comparison with conventional mel-frequency cepstral coefficients (MFCCs) using a hidden-Markov-model (HMM) recognizer. A dynamic, compressive gammachirp (dcGC) auditory filterbank is also introduced. The dcGC provides a leveldependent spectral analysis, with near instantaneous compression, and two-tone suppression.

Original languageEnglish
Title of host publicationISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems
Place of PublicationPiscataway, NJ
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages3813-3816
Number of pages4
ISBN (Print)9781424453085
DOIs
Publication statusPublished - 2010
Externally publishedYes
Event2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, ISCAS 2010 - Paris, France
Duration: 30 May 20102 Jun 2010

Other

Other2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, ISCAS 2010
Country/TerritoryFrance
CityParis
Period30/05/102/06/10

Fingerprint

Dive into the research topics of 'Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition'. Together they form a unique fingerprint.

Cite this