Voice command recognition using biologically inspired time-frequency representation and convolutional neural networks

Roneel V. Sharan, Shlomo Berkovsky, Sidong Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

9 Citations (Scopus)

Abstract

Voice command is an important interface between humans and technology in healthcare, for example for hands-free control of surgical robots and in patient care technology. Voice command recognition can be cast as a speech classification task, where convolutional neural networks (CNNs) have demonstrated strong performance. The CNN was originally developed for image classification, and a time-frequency representation of the speech signal is the most commonly used image-like input for CNNs. Various types of time-frequency representations are used for this purpose. This work investigates the use of the cochleagram, computed with a gammatone filter that models the frequency selectivity of the human cochlea, as the time-frequency representation of voice commands and as the input to the CNN classifier. We also explore the multi-view CNN as a technique for combining learning from different time-frequency representations. The proposed method is evaluated on a large dataset and shown to achieve high classification accuracy.
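As an illustration of the cochleagram mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 4th-order gammatone impulse response with Glasberg and Moore ERB bandwidths, centre frequencies equally spaced on the ERB-rate scale, and log-compressed frame energies. The function names and all parameter values (64 channels, 25 ms frames, 10 ms hop, 50 Hz lower edge) are hypothetical choices made for illustration only.

```python
import numpy as np

def erb_space(f_low, f_high, n_filters):
    """Centre frequencies equally spaced on the ERB-rate scale (assumed spacing)."""
    to_erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erb_rate(np.linspace(to_erb_rate(f_low), to_erb_rate(f_high), n_filters))

def gammatone_ir(fc, fs, duration=0.064, order=4, b=1.019):
    """Gammatone impulse response t^(n-1) * exp(-2*pi*b*ERB*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)  # equivalent rectangular bandwidth in Hz
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))  # normalise each filter to unit energy

def cochleagram(signal, fs, n_filters=64, f_low=50.0, f_high=None,
                frame_len=0.025, hop=0.010):
    """Return (log-energy cochleagram of shape [n_filters, n_frames], centre freqs)."""
    signal = np.asarray(signal, dtype=float)
    f_high = 0.9 * fs / 2 if f_high is None else f_high  # stay below Nyquist
    centre_freqs = erb_space(f_low, f_high, n_filters)
    frame_size = int(round(frame_len * fs))
    hop_size = int(round(hop * fs))

    rows = []
    for fc in centre_freqs:
        ir = gammatone_ir(fc, fs)
        n = len(signal) + len(ir) - 1
        # Linear convolution via FFT, truncated to the input length.
        y = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(ir, n), n)[:len(signal)]
        # Overlapping frames, then the energy in each frame.
        frames = np.lib.stride_tricks.sliding_window_view(y, frame_size)[::hop_size]
        rows.append(np.sum(frames ** 2, axis=1))
    # Log compression; the small constant avoids log(0) on silent frames.
    return np.log(np.array(rows) + 1e-10), centre_freqs
```

With these assumed settings, a one-second signal sampled at 16 kHz yields a 64 x 98 array, which could be treated as a single-channel image when used as CNN input.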

Original language: English
Title of host publication: 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Subtitle of host publication: Enabling Innovative Technologies for Global Healthcare, EMBC 2020
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 998-1001
Number of pages: 4
ISBN (Electronic): 9781728119908
ISBN (Print): 9781728119915
Publication status: Published - 1 Jul 2020
Event: 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020 - Montreal, Canada
Duration: 20 Jul 2020 to 24 Jul 2020

Conference

Conference: 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020
Country/Territory: Canada
City: Montreal
Period: 20/07/20 to 24/07/20
