Abstract
The syllables of speech carry information about the speaker's vocal tract length (VTL) as well as the phonetic message. Ideally, the pre-processor used for automatic speech recognition (ASR) should segregate the phonetic message from the VTL information. This paper describes a method for calculating VTL-invariant auditory feature vectors from speech by segregating the message from the VTL information. Spectra produced by an auditory filterbank are summarized by a Gaussian mixture model (GMM) to produce a low-dimensional feature vector. The robustness of these features is compared with that of conventional mel-frequency cepstral coefficients (MFCCs) using a hidden-Markov-model (HMM) recognizer. A dynamic, compressive gammachirp (dcGC) auditory filterbank is also introduced. The dcGC provides a level-dependent spectral analysis with near-instantaneous compression and two-tone suppression.
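As a rough illustration of why segregating VTL from the message is tractable, the sketch below shows that scaling all formant frequencies by a common factor (a crude stand-in for a change in VTL) turns into a plain shift of the spectral profile on a logarithmically spaced frequency axis. It is not the paper's method: the Gaussian-bump envelope, the semitone channel spacing, the formant values, and all function names are assumptions chosen for the example.

```python
import math

def log_spaced_centres(f_min, ratio, n):
    """Channel centre frequencies with a constant ratio between neighbours."""
    return [f_min * ratio ** k for k in range(n)]

def envelope(freq_hz, formants_hz, log_width=0.1):
    """Toy spectral envelope (assumption): Gaussian bumps on a log-frequency axis."""
    return sum(
        math.exp(-0.5 * (math.log(freq_hz / f) / log_width) ** 2)
        for f in formants_hz
    )

def profile(centres, formants_hz):
    """Envelope sampled at every channel centre frequency."""
    return [envelope(c, formants_hz) for c in centres]

RATIO = 2 ** (1 / 12)                 # one semitone between channels (assumption)
centres = log_spaced_centres(100.0, RATIO, 72)

formants = [500.0, 1500.0, 2500.0]    # hypothetical formant frequencies in Hz
SHIFT = 3                             # scale by exactly three channel steps
alpha = RATIO ** SHIFT                # common scale factor applied to all formants

reference = profile(centres, formants)
scaled = profile(centres, [alpha * f for f in formants])

# Scaling the formants by alpha moves the whole profile up by SHIFT channels.
max_diff = max(
    abs(scaled[k + SHIFT] - reference[k])
    for k in range(len(centres) - SHIFT)
)
print("largest mismatch after undoing the shift:", max_diff)
```

Because the scale change only translates the profile, a feature extractor that describes the shape of the profile separately from its position can, in principle, separate the message from the speaker-size information.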
| Original language | English |
|---|---|
| Title of host publication | ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems |
| Place of Publication | Piscataway, NJ |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Pages | 3813-3816 |
| Number of pages | 4 |
| ISBN (Print) | 9781424453085 |
| Publication status | Published - 2010 |
| Externally published | Yes |
| Event | 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, ISCAS 2010 - Paris, France. Duration: 30 May 2010 → 2 Jun 2010 |
Other
| Other | 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems, ISCAS 2010 |
|---|---|
| Country/Territory | France |
| City | Paris |
| Period | 30/05/10 → 2/06/10 |
Fingerprint
Dive into the research topics of 'Auditory speech processing for scale-shift covariance and its evaluation in automatic speech recognition'. Together they form a unique fingerprint.