Interpretable binaural ratio for visually guided binaural audio generation

Tao Zheng, Sunny Verma, Wei Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

Video and audio streams are essential and mutually complementary in immersive multimedia applications. Recent studies have explored applying deep neural networks to multimedia production, e.g., visually guided generation of binaural audio, where the Difference Mask (DM) is the predominant strategy in state-of-the-art (SOTA) work. However, this strategy is not interpretable and requires supplying the ground-truth output as the input, which limits its applicability. In addition, the generated audio conveys relatively little spatial sensation. This paper aims to develop an interpretable and robust approach to visually guided binaural audio generation. Specifically, we generalize the Difference Mask into a new concept and strategy, named Binaural Ratio, and interpret its binaural property in terms of the Inter-aural Time Difference (ITD) and Inter-aural Level Difference (ILD). Under the new strategy, the model input can be natural, arbitrary mono audio instead of the direct sum of the left and right audio, i.e., the ground-truth output. Moreover, we identify a bias toward mono as one reason for the low spatial sensation, and we tackle it by designing new network variants that learn the Binaural Ratio robustly. Experiments show that the proposed approach significantly outperforms the SOTA methods on both objective and subjective evaluation metrics.
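For readers unfamiliar with the two cues named in the abstract, the following is a minimal sketch of how ILD and ITD can be estimated from a stereo pair. It uses textbook broadband definitions (an RMS level ratio in dB and the peak of the cross-correlation) and is not the paper's Binaural Ratio or network. The function names `ild_db` and `itd_samples` and the synthetic test signal are assumptions made for illustration only.

```python
import numpy as np


def ild_db(left, right, eps=1e-12):
    """Broadband inter-aural level difference in dB (positive when left is louder)."""
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    return 20.0 * np.log10((rms_left + eps) / (rms_right + eps))


def itd_samples(left, right):
    """Lag in samples at the cross-correlation peak (positive when left is delayed)."""
    corr = np.correlate(left, right, mode="full")
    # Output index i of the "full" correlation corresponds to lag i - (len(right) - 1).
    return int(np.argmax(corr)) - (len(right) - 1)


# Hypothetical stereo pair: the left channel is the right channel,
# delayed by 8 samples and attenuated to half amplitude.
rng = np.random.default_rng(0)
right = rng.standard_normal(2048)
delay = 8
left = 0.5 * np.concatenate([np.zeros(delay), right[:-delay]])

level_difference = ild_db(left, right)
time_difference = itd_samples(left, right)
```

Dividing `time_difference` by the sampling rate converts the lag to seconds. A source heard as identical in both ears would give values near zero for both cues, which corresponds to the "bias toward mono" the abstract describes.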

Original language: English
Title of host publication: 2022 International Joint Conference on Neural Networks (IJCNN)
Subtitle of host publication: 2022 conference proceedings
Place of Publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Number of pages: 8
ISBN (Electronic): 9781728186719
ISBN (Print): 9781665495264
DOIs
Publication status: Published - 2022
Event: 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Padua, Italy
Duration: 18 Jul 2022 to 23 Jul 2022

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks
ISSN (Print): 2161-4393
ISSN (Electronic): 2161-4407

Conference

Conference: 2022 International Joint Conference on Neural Networks, IJCNN 2022
Country/Territory: Italy
City: Padua
Period: 18/07/22 to 23/07/22
