Abstract
In this paper, we apply a novel method that uses features extracted from the time-frequency image representation of a sound signal in an audio surveillance application. In particular, we investigate two image representations: linear grayscale and log grayscale. We first divide a sound signal into smaller frames and apply a windowing function to each frame. The absolute value of the Discrete Fourier Transform of each frame is then computed and normalized to obtain the intensity values for the linear grayscale image. The log grayscale image is generated in a similar way, except that the log power of the values is taken before normalization. Each image is then divided into blocks, and central moments are computed in each block. We carry out experiments under different noise conditions and at varying signal-to-noise ratios, using support vector machines for classification. Based on classification accuracy, the linear grayscale image approach is found to be more noise robust than the log grayscale image approach. It also performs better than mel-frequency cepstral coefficients, a common baseline feature in most sound recognition applications.
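The feature extraction steps described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `spectrogram_image` and `block_central_moments`, the Hamming window, the frame length and hop size, min-max normalization, the 4 x 4 block grid, and the choice of second- and third-order central moments are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def spectrogram_image(signal, frame_len=512, hop=256, log_scale=False, eps=1e-10):
    """Build a grayscale time-frequency image with intensities in [0, 1].

    Frame length, hop size, Hamming window, and min-max normalization
    are assumptions made for this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Split the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the DFT of each frame (non-negative frequencies only).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Linear image: magnitudes as-is. Log image: log power before normalization.
    values = np.log(magnitude ** 2 + eps) if log_scale else magnitude
    lo, hi = values.min(), values.max()
    image = (values - lo) / (hi - lo + eps)
    return image.T  # rows = frequency bins, columns = time frames

def block_central_moments(image, n_rows=4, n_cols=4, orders=(2, 3)):
    """Divide the image into blocks and compute central moments per block.

    The grid size and moment orders are hypothetical choices.
    """
    features = []
    for band in np.array_split(image, n_rows, axis=0):
        for block in np.array_split(band, n_cols, axis=1):
            mean = block.mean()
            for k in orders:
                features.append(np.mean((block - mean) ** k))
    return np.array(features)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sound = rng.standard_normal(8000)  # stand-in for a recorded sound clip
    linear_features = block_central_moments(spectrogram_image(sound))
    log_features = block_central_moments(spectrogram_image(sound, log_scale=True))
    print(linear_features.shape, log_features.shape)
```

The resulting feature vectors would then be passed to a classifier such as a support vector machine, which is the classification stage named in the abstract and is not shown here.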
Original language | English |
---|---|
Title of host publication | 2014 19th International Conference on Digital Signal Processing, DSP 2014 |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 130-135 |
Number of pages | 6 |
ISBN (Electronic) | 9781479946129 |
Publication status | Published - 1 Jan 2014 |
Externally published | Yes |
Event | 2014 19th International Conference on Digital Signal Processing, DSP 2014 - Hong Kong, Hong Kong. Duration: 20 Aug 2014 → 23 Aug 2014 |
Conference
Conference | 2014 19th International Conference on Digital Signal Processing, DSP 2014 |
---|---|
Country/Territory | Hong Kong |
City | Hong Kong |
Period | 20/08/14 → 23/08/14 |
Keywords
- Audio surveillance
- Central moments
- Linear grayscale
- Log grayscale
- Signal-to-noise ratio
- Sound recognition
- Spectrogram
- Time-frequency image