Abstract
Exploring complementary information between RGB and thermal/depth modalities is crucial for bi-modal salient object detection (BSOD). However, the distinct characteristics of the modalities often lead to large differences in their information distributions. Existing models, which rely on convolutional operations or plug-and-play attention mechanisms, struggle to address this issue. To overcome this challenge, we rethink the relationship between information complementarity and long-range relevance, and propose a uniform broad-view Twins Transformer Network (TwinsTNet) for accurate BSOD. Specifically, to fuse bi-modal information efficiently, we first design the Cross-Modal Federated Attention (CMFA), which mines complementary cues across modalities through element-wise global dependency. Second, to ensure accurate modality fusion, we propose the Semantic Consistency Attention Loss, which supervises the co-attention feature in CMFA using an attention map generated from the ground truth. Additionally, existing BSOD models rarely explore inter-layer interactions; to address this, we propose the Cross-Scale Retracing Attention (CSRA), which retrieves query-relevant information from the stacked features of all previous layers, enabling flexible cross-layer interactions. The cooperation between CMFA and CSRA mitigates inductive bias in both the modality and layer dimensions, enhancing TwinsTNet's representational capability. Extensive experiments demonstrate that TwinsTNet outperforms twenty-two existing state-of-the-art models on ten BSOD benchmark datasets. The code is available at: https://github.com/JoshuaLPF/TwinsTNet.
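The abstract describes attention in which queries from one source retrieve information from another (across modalities in CMFA, across stacked earlier layers in CSRA). The sketch below is not the paper's CMFA or CSRA. It is only generic scaled dot-product cross-attention in plain Python, included to illustrate the general idea. The function names `softmax` and `cross_attention` and the toy token values are assumptions made for this illustration; the authors' actual implementation is in the repository linked above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    Each query token attends over ALL key tokens, so every output
    token is a weighted mix of every value token (a global dependency).
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value tokens.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy tokens: three "RGB" tokens query two "thermal" tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
thermal_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_attention(rgb_tokens, thermal_tokens, thermal_tokens)
```

In this toy setup each output row is the attention-weighted average of the thermal tokens, so an RGB token that resembles a particular thermal token draws more of its content from that token. The same function could be pointed at a stack of features from earlier layers instead of a second modality, which is the general shape of cross-layer retrieval, again under the assumption that plain dot-product attention stands in for the paper's design.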
| Original language | English |
|---|---|
| Pages (from-to) | 2796-2810 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 34 |
| Publication status | Published - 2025 |
| Title | TwinsTNet: broad-view twins transformer network for bi-modal salient object detection |