Abstract
Spoken digit recognition has numerous applications in digital technologies, and various feature engineering and classification strategies have been proposed for it. This work explores the use of a convolutional neural network (CNN) for spoken digit recognition. Because CNNs were originally developed as image classifiers, a time-frequency representation of the spoken digit is used to obtain an image-like input. In particular, the wavelet transform is used to form the time-frequency representation, as it provides better frequency localization for low-frequency signals such as speech. The time-frequency representation is resized to a common dimension using bicubic interpolation, and the resulting image-like representation, referred to as a scalogram, is used to recognize spoken digits with the CNN. In addition, late fusion is employed to combine the learning from the scalogram representation with that from conventional time-frequency representations. The proposed approach is evaluated on a dataset containing 56,290 segments covering ten spoken digits, non-digits comprising various other spoken words, and background noise. The proposed method achieves overall validation and test errors of 2.85% and 2.84%, respectively, outperforming various conventional methods.
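The two signal-level ideas in the abstract, a wavelet-based scalogram and late fusion of classifier outputs, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the paper's configuration: the abstract does not name the wavelet (a Morlet wavelet with `w0 = 6.0` is used for illustration), does not give the scales, and does not state the fusion rule (a weighted average of class probabilities is one common choice). The function names `morlet_scalogram` and `late_fusion` are hypothetical, and the bicubic resizing step and the CNN itself are left out.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0=6.0):
    """Magnitude of a Morlet continuous wavelet transform, one row per scale.

    Assumption: the Morlet wavelet and w0 are illustrative choices only.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    rows = []
    for s in scales:
        # Sample the wavelet over roughly +/- 4 scale widths, capped so the
        # wavelet is never longer than the signal.
        half = int(min(4 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        # Correlation with the wavelet, written as a convolution with its
        # time-reversed complex conjugate.
        kernel = np.conj(wavelet)[::-1]
        rows.append(np.abs(np.convolve(signal, kernel, mode="same")))
    return np.vstack(rows)  # shape: (len(scales), len(signal))


def late_fusion(prob_sets, weights=None):
    """Weighted average of per-model class probabilities.

    Assumption: averaging is one common late-fusion rule, not necessarily
    the rule used in the paper.
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_sets])
    if weights is None:
        weights = np.ones(len(probs))
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights / weights.sum(), probs, axes=1)
    return fused, fused.argmax(axis=-1)
```

In a pipeline like the one described, the output of `morlet_scalogram` would be resized to a fixed image size before being passed to a CNN, and `late_fusion` would be applied to the class probabilities produced by the scalogram-based model and by models trained on other time-frequency representations.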
Original language | English |
---|---|
Title of host publication | 2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS) |
Place of Publication | India |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 101-105 |
Number of pages | 5 |
ISBN (Electronic) | 9781728190525 |
ISBN (Print) | 9781728190532 |
Publication status | Published - 3 Dec 2020 |
Event | 2020 IEEE Recent Advances in Intelligent Computational Systems, RAICS 2020 - Thiruvananthapuram, India. Duration: 3 Dec 2020 → 5 Dec 2020 |
Conference
Conference | 2020 IEEE Recent Advances in Intelligent Computational Systems, RAICS 2020 |
---|---|
Country/Territory | India |
City | Thiruvananthapuram |
Period | 3/12/20 → 5/12/20 |
Keywords
- bicubic interpolation
- convolutional neural networks
- late fusion
- scalogram
- spoken digit recognition
- wavelet transform