Spoken digit recognition using wavelet scalogram and convolutional neural networks

Roneel V. Sharan*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

8 Citations (Scopus)


Spoken digit recognition has numerous applications in digital technologies, and various feature engineering and classification strategies have been proposed for this purpose. This work explores the use of convolutional neural networks (CNNs) for spoken digit recognition. Because CNNs were originally designed as image classifiers, a time-frequency representation of each spoken digit is used to obtain an image-like input. In particular, the wavelet transform is used to form the time-frequency representation, as it provides better frequency localization for low-frequency signals such as speech. The time-frequency representation is resized to a common dimension using bicubic interpolation, and the resulting image-like representation, referred to as a scalogram, is used to recognize spoken digits with a CNN. In addition, late fusion is employed to combine what is learned from the scalogram representation with conventional time-frequency representations. The proposed approach is evaluated on a dataset of 56,290 segments belonging to ten spoken digits, non-digits comprising various other spoken words, and background noise. The proposed method achieves overall validation and test errors of 2.85% and 2.84%, respectively, outperforming various conventional methods.
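The abstract describes three processing steps (wavelet scalogram, bicubic resizing to a common size, and late fusion of classifiers) without giving their parameters. The NumPy-only sketch below illustrates those steps under stated assumptions; it is not the author's implementation. Assumed, not taken from the paper: a complex Morlet wavelet with w0 = 6, a log-spaced scale grid, a 64 x 64 target size, the Keys cubic-convolution kernel (a = -0.5) for bicubic interpolation, and late fusion by averaging softmax probabilities. The function names `morlet_scalogram`, `bicubic_resize`, `softmax`, and `late_fusion` are hypothetical, and the CNN itself is omitted.

```python
import numpy as np


def morlet_scalogram(signal, scales, w0=6.0):
    """|CWT| with a complex Morlet wavelet (assumed wavelet), by direct convolution.

    Returns an array of shape (len(scales), len(signal)). Simple but slow;
    library CWT routines use faster FFT-based methods.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))  # truncate the Gaussian envelope at +/-4 scale units
        t = np.arange(-half, half + 1) / s
        psi = np.pi ** -0.25 * np.exp(1j * w0 * t - 0.5 * t ** 2) / np.sqrt(s)
        # Correlating with psi equals convolving with its reversed conjugate;
        # slicing the full result keeps the output aligned with the input.
        full = np.convolve(signal, np.conj(psi)[::-1], mode="full")
        out[i] = np.abs(full[half:half + n])
    return out


def _keys_cubic(x, a=-0.5):
    """Keys cubic-convolution kernel, a common choice for bicubic interpolation."""
    x = np.abs(x)
    near = (a + 2) * x**3 - (a + 3) * x**2 + 1
    far = a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return np.where(x <= 1, near, np.where(x < 2, far, 0.0))


def _resize_matrix(in_len, out_len):
    """Matrix that resamples an axis of length in_len to out_len samples."""
    pos = (np.arange(out_len) + 0.5) * in_len / out_len - 0.5      # output centres in input coords
    taps = np.floor(pos).astype(int)[:, None] - 1 + np.arange(4)   # 4 neighbours per output sample
    w = _keys_cubic(pos[:, None] - taps)
    w /= w.sum(axis=1, keepdims=True)
    taps = np.clip(taps, 0, in_len - 1)                            # replicate border samples
    m = np.zeros((out_len, in_len))
    np.add.at(m, (np.repeat(np.arange(out_len), 4), taps.ravel()), w.ravel())
    return m


def bicubic_resize(img, out_shape):
    """Separable bicubic resize of a 2-D array (no antialiasing when shrinking)."""
    rows = _resize_matrix(img.shape[0], out_shape[0])
    cols = _resize_matrix(img.shape[1], out_shape[1])
    return rows @ img @ cols.T


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def late_fusion(logits_per_model, weights=None):
    """Fuse classifiers by a weighted average of their softmax probabilities (assumed rule)."""
    probs = np.stack([softmax(np.asarray(l, dtype=float)) for l in logits_per_model])
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))
    fused = np.tensordot(np.asarray(weights, dtype=float), probs, axes=1)
    return fused.argmax(axis=-1), fused


if __name__ == "__main__":
    fs = 8000                                  # hypothetical sample rate (Hz)
    n = np.arange(fs // 4)                     # 0.25 s of samples
    clip = np.sin(2 * np.pi * 440 * n / fs)    # stand-in for a spoken-digit segment
    scales = np.geomspace(2, 64, 32)           # hypothetical log-spaced scale grid
    image = bicubic_resize(morlet_scalogram(clip, scales), (64, 64))
    print(image.shape)                         # (64, 64)

    # Hypothetical logits from two CNNs (scalogram and spectrogram inputs): 2 clips x 3 classes.
    scalogram_logits = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    spectrogram_logits = np.array([[0.0, 0.0, 3.0], [0.0, 2.0, 0.0]])
    labels, _ = late_fusion([scalogram_logits, spectrogram_logits])
    print(labels)                              # [2 1]
```

Averaging probabilities is only one late-fusion rule; product and weighted-sum rules are common alternatives, and the abstract does not state which rule the paper uses. In practice the direct-convolution CWT would typically be replaced by a library routine such as PyWavelets `pywt.cwt`.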

Original language: English
Title of host publication: 2020 IEEE Recent Advances in Intelligent Computational Systems (RAICS)
Place of Publication: India
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 5
ISBN (Electronic): 9781728190525
ISBN (Print): 9781728190532
Publication status: Published - 3 Dec 2020
Event: 2020 IEEE Recent Advances in Intelligent Computational Systems, RAICS 2020 - Thiruvananthapuram, India
Duration: 3 Dec 2020 to 5 Dec 2020


Conference: 2020 IEEE Recent Advances in Intelligent Computational Systems, RAICS 2020


Keywords
  • bicubic interpolation
  • convolutional neural networks
  • late fusion
  • scalogram
  • spoken digit recognition
  • wavelet transform

