Benchmarking audio signal representation techniques for classification with convolutional neural networks

Roneel V. Sharan*, Hao Xiong, Shlomo Berkovsky

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review

Abstract

Audio signal classification has various applications in healthcare, including the detection and monitoring of health conditions. Convolutional neural networks (CNN) have produced state‐of‐the‐art results in image classification and are increasingly being used in other tasks, including signal classification. However, audio signal classification using CNN presents various challenges. In image classification tasks, raw images of equal dimensions can be used as a direct input to CNN. Raw time‐domain signals, on the other hand, can vary in length. In addition, the time‐domain signal often has to be transformed to the frequency domain to reveal its distinctive spectral characteristics. In this work, we review and benchmark various audio signal representation techniques for classification using CNN, including approaches that deal with signals of different lengths and approaches that combine multiple representations to improve classification accuracy. This work thus provides important empirical evidence that may guide future work deploying CNN for audio signal classification.
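The abstract names three practical steps: transforming a signal into a time-frequency image, bringing signals of different lengths to a common input size, and combining several representations. The code below is a minimal sketch of those steps under stated assumptions and is not the authors' pipeline. It assumes NumPy only. The helper names (`log_spectrogram`, `resize_bilinear`, `fused_input`) and the parameters (frame lengths 256 and 1024, a 64 x 64 target size) are hypothetical choices for illustration. It computes log-magnitude STFT spectrograms, resizes each one to a fixed shape by separable linear interpolation, and fuses two representations by stacking them as CNN input channels. Channel stacking is only one generic form of early fusion.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude STFT spectrogram with shape (freq_bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short signals so at least one full frame exists.
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (frames, frame_len // 2 + 1)
    return np.log(mag + 1e-10).T  # small offset avoids log(0)


def resize_bilinear(image, out_shape):
    """Resize a 2-D array to out_shape using linear interpolation per axis."""
    rows_out, cols_out = out_shape
    rows_in, cols_in = image.shape
    # First interpolate along the time axis (columns) for every row.
    x_in, x_out = np.linspace(0, 1, cols_in), np.linspace(0, 1, cols_out)
    tmp = np.stack([np.interp(x_out, x_in, row) for row in image])
    # Then interpolate along the frequency axis (rows) for every column.
    y_in, y_out = np.linspace(0, 1, rows_in), np.linspace(0, 1, rows_out)
    return np.stack([np.interp(y_out, y_in, col) for col in tmp.T], axis=1)


def fused_input(signal, out_shape=(64, 64)):
    """Stack two fixed-size, standardized spectrograms as channels (C, H, W)."""
    # Short and long frames trade time resolution against frequency resolution.
    reps = [log_spectrogram(signal, 256, 128), log_spectrogram(signal, 1024, 256)]
    channels = []
    for rep in reps:
        img = resize_bilinear(rep, out_shape)
        channels.append((img - img.mean()) / (img.std() + 1e-8))
    return np.stack(channels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two signals of different lengths map to the same CNN input shape.
    for n in (8000, 12345):
        print(n, fused_input(rng.standard_normal(n)).shape)
```

Resizing by interpolation keeps the whole signal in view at the cost of stretching or compressing the time axis. Cropping or zero-padding to a fixed length are common alternatives for handling variable-length input.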

Original language: English
Article number: 3434
Number of pages: 13
Journal: Sensors
Volume: 21
Issue number: 10
Publication status: Published, 14 May 2021

Bibliographical note

Copyright the Author(s) 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Convolutional neural networks
  • Fusion
  • Interpolation
  • Machine learning
  • Spectrogram
  • Time‐frequency image
