Acoustic event recognition using cochleagram image and convolutional neural networks

Roneel V. Sharan, Tom J. Moir

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply a moving average to the spectrogram along the frequency axis to obtain what we refer to as the smoothed spectrogram. Thirdly, we use the mel-spectrogram, which utilizes the mel filterbank, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image, the frequency components of which are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with CNN.
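The four representations can be sketched with standard Python audio tooling. The snippet below is a minimal sketch, assuming the numpy, scipy and librosa packages: it builds a log-power spectrogram, a frequency-smoothed spectrogram, a mel-spectrogram, and a cochleagram derived from an ERB-spaced gammatone filterbank. The frame size, hop length, channel counts and smoothing window are illustrative choices, not the exact settings reported in the paper.

```python
# A minimal sketch, assuming numpy, scipy and librosa are installed.
# Frame size, hop length, channel counts and the smoothing window are
# illustrative choices, not the authors' exact settings.
import numpy as np
import librosa
import scipy.signal
import scipy.ndimage


def spectrogram_image(y, n_fft=1024, hop=512):
    """Conventional log-power spectrogram."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    return librosa.power_to_db(S)


def smoothed_spectrogram_image(y, n_fft=1024, hop=512, win=5):
    """Spectrogram smoothed with a moving average along the frequency axis."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    S_smooth = scipy.ndimage.uniform_filter1d(S, size=win, axis=0)
    return librosa.power_to_db(S_smooth)


def mel_spectrogram_image(y, sr, n_fft=1024, hop=512, n_mels=64):
    """Mel-spectrogram, using the same mel filterbank as MFCC front-ends."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                       hop_length=hop, n_mels=n_mels)
    return librosa.power_to_db(S)


def cochleagram_image(y, sr, n_channels=64, frame=1024, hop=512,
                      fmin=50.0, fmax=7000.0):
    """Cochleagram: frame energies of an ERB-spaced gammatone filterbank."""
    # Centre frequencies spaced uniformly on the ERB-rate scale,
    # mimicking the frequency selectivity of the human cochlea.
    erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erb_inv = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    cfs = erb_inv(np.linspace(erb(fmin), erb(fmax), n_channels))

    n_frames = 1 + (len(y) - frame) // hop
    C = np.zeros((n_channels, n_frames))
    for i, cf in enumerate(cfs):
        # 4th-order digital gammatone filter centred at cf
        b, a = scipy.signal.gammatone(cf, 'iir', fs=sr)
        band = scipy.signal.lfilter(b, a, y)
        for t in range(n_frames):
            seg = band[t * hop: t * hop + frame]
            C[i, t] = np.sum(seg ** 2)  # energy of this channel in this frame
    return librosa.power_to_db(C)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(2 * sr) / sr
    y = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)  # 2 s test tone
    for img in (spectrogram_image(y),
                smoothed_spectrogram_image(y),
                mel_spectrogram_image(y, sr),
                cochleagram_image(y, sr)):
        print(img.shape)  # each is a (frequency, time) image
```

Each function returns a two-dimensional (frequency, time) array that can be rescaled to a fixed image size and used as input to a CNN, which is how such representations are typically fed to the network.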

Language: English
Pages: 62-66
Number of pages: 5
Journal: Applied Acoustics
Volume: 148
DOIs: 10.1016/j.apacoust.2018.12.006
Publication status: Published - 1 May 2019
Externally published: Yes

Fingerprint

  • spectrograms
  • acoustics
  • image classification
  • cochlea
  • audio signals
  • selectivity
  • filters
  • coefficients

Keywords

  • Acoustic event recognition
  • Cochleagram
  • Convolutional neural network
  • Mel-spectrogram
  • Spectrogram

Cite this

@article{dc8babd8a6784065854b2147f4bdbba5,
title = "Acoustic event recognition using cochleagram image and convolutional neural networks",
abstract = "Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply moving average to the spectrogram along the frequency domain to obtain what we refer as the smoothed spectrogram. Thirdly, we use the mel-spectrogram which utilizes the mel-filter, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image the frequency components of which are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with CNN.",
keywords = "Acoustic event recognition, Cochleagram, Convolutional neural network, Mel-spectrogram, Spectrogram",
author = "Sharan, {Roneel V.} and Moir, {Tom J.}",
year = "2019",
month = "5",
day = "1",
doi = "10.1016/j.apacoust.2018.12.006",
language = "English",
volume = "148",
pages = "62--66",
journal = "Applied Acoustics",
issn = "0003-682X",
publisher = "Elsevier",

}

Acoustic event recognition using cochleagram image and convolutional neural networks. / Sharan, Roneel V.; Moir, Tom J.

In: Applied Acoustics, Vol. 148, 01.05.2019, p. 62-66.

Research output: Contribution to journal › Article › Research › peer-review

TY - JOUR

T1 - Acoustic event recognition using cochleagram image and convolutional neural networks

AU - Sharan, Roneel V.

AU - Moir, Tom J.

PY - 2019/5/1

Y1 - 2019/5/1

N2 - Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply moving average to the spectrogram along the frequency domain to obtain what we refer as the smoothed spectrogram. Thirdly, we use the mel-spectrogram which utilizes the mel-filter, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image the frequency components of which are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with CNN.

AB - Convolutional neural networks (CNN) have produced encouraging results in image classification tasks and have been increasingly adopted in audio classification applications. However, in using CNN for acoustic event recognition, the first hurdle is finding the best image representation of an audio signal. In this work, we evaluate the performance of four time-frequency representations for use with CNN. Firstly, we consider the conventional spectrogram image. Secondly, we apply moving average to the spectrogram along the frequency domain to obtain what we refer as the smoothed spectrogram. Thirdly, we use the mel-spectrogram which utilizes the mel-filter, as in mel-frequency cepstral coefficients. Finally, we propose the use of a cochleagram image the frequency components of which are based on the frequency selectivity property of the human cochlea. We test the proposed techniques on an acoustic event database containing 50 sound classes. The results show that the proposed cochleagram time-frequency image representation gives the best classification performance when used with CNN.

KW - Acoustic event recognition

KW - Cochleagram

KW - Convolutional neural network

KW - Mel-spectrogram

KW - Spectrogram

UR - http://www.scopus.com/inward/record.url?scp=85058474671&partnerID=8YFLogxK

U2 - 10.1016/j.apacoust.2018.12.006

DO - 10.1016/j.apacoust.2018.12.006

M3 - Article

VL - 148

SP - 62

EP - 66

JO - Applied Acoustics

T2 - Applied Acoustics

JF - Applied Acoustics

SN - 0003-682X

ER -