Acoustic event recognition using cochleagram image and convolutional neural networks

Roneel V. Sharan*, Tom J. Moir

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)


Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply a moving average to the spectrogram along the frequency axis to obtain what we refer to as the smoothed spectrogram. Thirdly, we use the mel-spectrogram, which utilizes the mel filterbank, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image, whose frequency components are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with CNN.
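The four representations described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, hop, filter counts, and smoothing width are assumed, and the cochleagram is simplified to triangular filters at ERB-rate-spaced centre frequencies (standing in for a true gammatone filterbank).

```python
# Hedged sketch of the four time-frequency images; all parameter
# choices here are assumptions, not values from the paper.
import numpy as np

def stft_power(x, n_fft=512, hop=256):
    """Power spectrogram via a Hann-windowed STFT."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return spec.T  # shape: (n_fft // 2 + 1, n_frames)

def smooth_spectrogram(spec, width=5):
    """Moving average along the frequency axis ('smoothed spectrogram')."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, spec)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(centers_hz, n_bins, fs):
    """Triangular filters centred at the given frequencies (Hz)."""
    bin_freqs = np.linspace(0.0, fs / 2.0, n_bins)
    fb = np.zeros((len(centers_hz) - 2, n_bins))
    for i in range(1, len(centers_hz) - 1):
        lo, c, hi = centers_hz[i - 1], centers_hz[i], centers_hz[i + 1]
        up = (bin_freqs - lo) / (c - lo)
        down = (hi - bin_freqs) / (hi - c)
        fb[i - 1] = np.maximum(0.0, np.minimum(up, down))
    return fb

def mel_spectrogram(spec, fs, n_mels=40):
    """Mel-spectrogram: mel-spaced triangular filters applied to the STFT."""
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0),
                                  n_mels + 2))
    return triangular_filterbank(edges, spec.shape[0], fs) @ spec

def cochleagram(spec, fs, n_chan=40):
    """Simplified cochleagram: ERB-rate-spaced channels (Glasberg & Moore),
    with triangular filters standing in for gammatone responses."""
    def hz_to_erb(f):
        return 21.4 * np.log10(1.0 + 0.00437 * f)
    def erb_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) / 0.00437
    edges = erb_to_hz(np.linspace(hz_to_erb(50.0), hz_to_erb(fs / 2.0),
                                  n_chan + 2))
    return triangular_filterbank(edges, spec.shape[0], fs) @ spec
```

In practice each of these 2-D arrays would be log-compressed, normalized, and fed to the CNN as a single-channel image; only the frequency warping differs between the four inputs.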

Original language: English
Pages (from-to): 62-66
Number of pages: 5
Journal: Applied Acoustics
Publication status: Published - 1 May 2019
Externally published: Yes


Keywords:

  • Acoustic event recognition
  • Cochleagram
  • Convolutional neural network
  • Mel-spectrogram
  • Spectrogram

