Abstract
Voice command is an important interface between humans and technology in healthcare, for example in hands-free control of surgical robots and in patient care technology. Voice command recognition can be cast as a speech classification task, where convolutional neural networks (CNNs) have demonstrated strong performance. CNNs were originally developed for image classification, and a time-frequency representation of the speech signal is the image-like input most commonly used with them. Several types of time-frequency representation are in use for this purpose. This work investigates the cochleagram, computed with a gammatone filter that models the frequency selectivity of the human cochlea, as the time-frequency representation of voice commands and the input to the CNN classifier. We also explore a multi-view CNN as a technique for combining learning from different time-frequency representations. The proposed method is evaluated on a large dataset and shown to achieve high classification accuracy.
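As an illustration of the kind of representation the abstract refers to, the following is a minimal NumPy sketch of a gammatone-based cochleagram. It is not the authors' implementation. The function names, the 4th-order filter, the Glasberg and Moore ERB formulas, the 32 channels, and the 25 ms frames with a 10 ms hop are all assumptions chosen for the example; the abstract specifies none of them.

```python
import numpy as np


def erb_space(f_low, f_high, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale (assumed spacing)."""
    # ERB-rate scale (Glasberg & Moore): E(f) = 21.4 * log10(0.00437 * f + 1)
    e_low = 21.4 * np.log10(0.00437 * f_low + 1.0)
    e_high = 21.4 * np.log10(0.00437 * f_high + 1.0)
    e = np.linspace(e_low, e_high, n_channels)
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Truncated gammatone impulse response, normalised to unit energy."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth in Hz
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * b * erb * t)
    ir = envelope * np.cos(2.0 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))


def cochleagram(signal, fs, n_channels=32, f_low=50.0, f_high=None,
                frame_len=0.025, hop=0.010):
    """Log frame energy per gammatone channel; returns (channels x frames, centre freqs).

    Assumes the signal is at least one frame long.
    """
    signal = np.asarray(signal, dtype=float)
    if f_high is None:
        f_high = 0.45 * fs  # stay below the Nyquist frequency
    cfs = erb_space(f_low, f_high, n_channels)
    win = int(round(frame_len * fs))
    step = int(round(hop * fs))
    n_frames = 1 + (len(signal) - win) // step
    out = np.empty((n_channels, n_frames))
    for i, fc in enumerate(cfs):
        # Causal FIR filtering with the truncated impulse response
        y = np.convolve(signal, gammatone_ir(fc, fs))[: len(signal)]
        power = y ** 2
        for j in range(n_frames):
            out[i, j] = power[j * step: j * step + win].sum()
    return np.log(out + 1e-10), cfs
```

Calling `cochleagram(x, fs)` on a mono waveform `x` sampled at `fs` Hz returns a two-dimensional array that can be treated as an image-like input, with rows ordered from low to high centre frequency. In a multi-view setting, an array like this would be one view alongside other time-frequency representations of the same utterance.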
Original language | English |
---|---|
Title of host publication | 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society |
Subtitle of host publication | Enabling Innovative Technologies for Global Healthcare, EMBC 2020 |
Place of Publication | Piscataway, NJ |
Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
Pages | 998-1001 |
Number of pages | 4 |
ISBN (Electronic) | 9781728119908 |
ISBN (Print) | 9781728119915 |
Publication status | Published - 1 Jul 2020 |
Event | 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020 - Montreal, Canada. Duration: 20 Jul 2020 → 24 Jul 2020 |
Conference
Conference | 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 20/07/20 → 24/07/20 |