Abstract
Convolutional neural networks (CNNs) are increasingly used for audio signal classification, including acoustic event recognition. A CNN is an image classifier, so acoustic event signals are often represented as time-frequency images for this purpose. However, the duration of sound event signals can vary greatly, and an important consideration is how to form equally sized time-frequency images for classification with a CNN. In this paper, we use techniques from digital image processing to address this problem. In particular, we apply interpolation-based image resizing techniques to form equally sized time-frequency representations. We consider nearest-neighbor, bilinear, bicubic, and Lanczos kernel interpolation methods for this purpose. A database containing 50 sound event classes with sound events of varying duration is used to evaluate the classification performance of these resized time-frequency images. The results show that time-frequency images resized using the bicubic and Lanczos kernel interpolation methods give much better classification performance than the conventional time-frequency image representation.
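The resizing step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`resize_1d`, `resize_tf_image`), the center-aligned sampling grid, edge clamping, the Keys cubic parameter a = -0.5, and the Lanczos window a = 3 are all assumptions chosen for illustration, and no anti-aliasing prefilter is applied when shrinking.

```python
import math

# Interpolation kernels (illustrative choices; the paper's exact parameters are not given here).
def _nearest(x):
    return 1.0 if -0.5 <= x < 0.5 else 0.0

def _linear(x):
    return max(0.0, 1.0 - abs(x))

def _cubic(x, a=-0.5):  # Keys cubic convolution kernel, a = -0.5 assumed
    x = abs(x)
    if x < 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * (x**3 - 5.0 * x**2 + 8.0 * x - 4.0)
    return 0.0

def _sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def _lanczos(x, a=3):  # Lanczos window size a = 3 assumed
    return _sinc(x) * _sinc(x / a) if abs(x) < a else 0.0

# method name -> (kernel function, half-width of support in samples)
KERNELS = {
    "nearest": (_nearest, 1),
    "bilinear": (_linear, 1),
    "bicubic": (_cubic, 2),
    "lanczos": (_lanczos, 3),
}

def resize_1d(samples, new_len, method):
    """Resample a 1-D sequence to new_len points with the chosen kernel."""
    kernel, support = KERNELS[method]
    n = len(samples)
    scale = n / new_len
    out = []
    for i in range(new_len):
        src = (i + 0.5) * scale - 0.5          # center-aligned source coordinate
        base = math.floor(src)
        total = 0.0
        weight_sum = 0.0
        for j in range(base - support + 1, base + support + 1):
            w = kernel(src - j)
            total += w * samples[min(max(j, 0), n - 1)]  # clamp at the edges
            weight_sum += w
        out.append(total / weight_sum)          # normalize so weights sum to 1
    return out

def resize_tf_image(image, new_rows, new_cols, method="bicubic"):
    """Resize a time-frequency image (list of rows) separably: columns axis, then rows axis."""
    rows = [resize_1d(row, new_cols, method) for row in image]
    cols = [resize_1d(list(col), new_rows, method) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# Example: two "spectrograms" of different durations mapped to one common size.
short_event = [[float(f + t) for t in range(5)] for f in range(8)]
long_event = [[float(f + t) for t in range(40)] for f in range(8)]
fixed = [resize_tf_image(s, 8, 16, "lanczos") for s in (short_event, long_event)]
print([(len(img), len(img[0])) for img in fixed])
```

In practice an image library such as Pillow or OpenCV would normally perform this step; the sketch only shows how the four kernels differ in shape and support while sharing the same resampling loop.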
Original language | English |
---|---|
Title of host publication | 2019 IEEE International Conference on Signals and Systems |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 8-11 |
Number of pages | 4 |
ISBN (Electronic) | 9781728121772 |
ISBN (Print) | 9781728121789 |
Publication status | Published - 1 Jul 2019 |
Externally published | Yes |
Event | 2019 IEEE International Conference on Signals and Systems, ICSigSys 2019 - Bandung, Indonesia Duration: 16 Jul 2019 → 18 Jul 2019 |
Conference
Conference | 2019 IEEE International Conference on Signals and Systems, ICSigSys 2019 |
---|---|
Country/Territory | Indonesia |
City | Bandung |
Period | 16/07/19 → 18/07/19 |
Keywords
- acoustic event recognition
- convolutional neural network
- image resize
- interpolation
- spectrogram
- time-frequency image