TwinsTNet: broad-view twins transformer network for bi-modal salient object detection

Pengfei Lyu, Xiaosheng Yu*, Jianning Chi, Hao Wu, Chengdong Wu, Jagath C. Rajapakse*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Exploring complementary information between RGB and thermal/depth modalities is crucial for bi-modal salient object detection (BSOD). However, the distinct characteristics of the two modalities often lead to large differences in their information distributions, and existing models, which rely on convolutional operations or plug-and-play attention mechanisms, struggle to bridge this gap. To overcome this challenge, we rethink the relationship between information complementarity and long-range relevance, and propose a uniform broad-view Twins Transformer Network (TwinsTNet) for accurate BSOD. Specifically, to efficiently fuse bi-modal information, we first design the Cross-Modal Federated Attention (CMFA), which mines complementary cues across modalities through element-wise global dependency. Second, to ensure accurate modality fusion, we propose the Semantic Consistency Attention Loss, which supervises the co-attention feature in CMFA using an attention map generated from the ground truth. Additionally, existing BSOD models leave inter-layer interactions largely unexplored, so we propose the Cross-Scale Retracing Attention (CSRA), which retrieves query-relevant information from the stacked features of all previous layers, enabling flexible cross-layer interactions. The cooperation between CMFA and CSRA mitigates inductive bias in both the modality and layer dimensions, enhancing TwinsTNet's representational capability. Extensive experiments demonstrate that TwinsTNet outperforms twenty-two existing state-of-the-art models on ten BSOD benchmark datasets. The code is available at: https://github.com/JoshuaLPF/TwinsTNet.
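The abstract describes CMFA as letting each modality mine complementary cues from the other via global (long-range) dependency. The details of CMFA are in the paper and its repository, not here; the sketch below is only a generic cross-modal attention fusion in PyTorch that illustrates the general idea of two modality streams querying each other. All class and variable names (`CrossModalAttentionFusion`, `rgb_to_aux`, `aux_to_rgb`) are hypothetical and are not taken from TwinsTNet.

```python
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Generic cross-attention fusion of RGB and thermal/depth features.

    Illustrative sketch only -- NOT the paper's CMFA implementation.
    Each modality uses the other as key/value, so every spatial position
    can attend to complementary cues at any position in the other modality.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # RGB queries attend over auxiliary (thermal/depth) features, and vice versa
        self.rgb_to_aux = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.aux_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # rgb, aux: (B, C, H, W) feature maps from two modality-specific encoders
        b, c, h, w = rgb.shape
        rgb_seq = rgb.flatten(2).transpose(1, 2)  # (B, H*W, C)
        aux_seq = aux.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # each modality queries the other for complementary information
        rgb_enh, _ = self.rgb_to_aux(rgb_seq, aux_seq, aux_seq)
        aux_enh, _ = self.aux_to_rgb(aux_seq, rgb_seq, rgb_seq)
        # concatenate the two enhanced streams and project back to `dim` channels
        fused = self.proj(torch.cat([rgb_enh, aux_enh], dim=-1))
        return fused.transpose(1, 2).reshape(b, c, h, w)
```

As a rough usage example, `CrossModalAttentionFusion(32)` applied to two `(2, 32, 8, 8)` feature maps returns a fused `(2, 32, 8, 8)` map. The actual CMFA additionally uses element-wise global dependency and is supervised by the Semantic Consistency Attention Loss, neither of which this sketch attempts to reproduce.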

Original language: English
Pages (from-to): 2796-2810
Number of pages: 15
Journal: IEEE Transactions on Image Processing
Volume: 34
DOIs
Publication status: Published - 2025
