Skip to main navigation Skip to search Skip to main content

Where to focus: investigating hierarchical attention relationship for fine-grained visual classification

Yang Liu, Lei Zhou, Pengcheng Zhang, Xiao Bai*, Lin Gu, Xiaohan Yu, Jun Zhou, Edwin R. Hancock

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference proceeding contributionpeer-review

Abstract

Object categories are often grouped into a multi-granularity taxonomic hierarchy. Classifying objects at coarser-grained hierarchy requires global and common characteristics, while finer-grained hierarchy classification relies on local and discriminative features. Therefore, humans should also subconsciously focus on different object regions when classifying different hierarchies. This granularity-wise attention is confirmed by our collected human real-time gaze data on different hierarchy classifications. To leverage this mechanism, we propose a Cross-Hierarchical Region Feature (CHRF) learning framework. Specifically, we first design a region feature mining module that imitates humans to learn different granularity-wise attention regions with multi-grained classification tasks. To explore how human attention shifts from one hierarchy to another, we further present a cross-hierarchical orthogonal fusion module to enhance the region feature representation by blending the original feature and an orthogonal component extracted from adjacent hierarchies. Experiments on five hierarchical fine-grained datasets demonstrate the effectiveness of CHRF compared with the state-of-the-art methods. Ablation study and visualization results also consistently verify the advantages of our human attention-oriented modules. The code and dataset are available at https://github.com/visiondom/CHRF.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2022
Subtitle of host publication17th European Conference, Tel Aviv, Israel, October 23–27, 2022, proceedings, part XXIV
EditorsShai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Place of PublicationCham
PublisherSpringer, Springer Nature
Pages57-73
Number of pages17
ISBN (Electronic)9783031200533
ISBN (Print)9783031200526
DOIs
Publication statusPublished - 2022
Externally publishedYes
Event17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration: 23 Oct 202227 Oct 2022

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume13684 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference17th European Conference on Computer Vision, ECCV 2022
Country/TerritoryIsrael
CityTel Aviv
Period23/10/2227/10/22

Keywords

  • Fine-grained visual classification
  • Multi-granularity
  • Human attention
  • Orthogonal fusion

Fingerprint

Dive into the research topics of 'Where to focus: investigating hierarchical attention relationship for fine-grained visual classification'. Together they form a unique fingerprint.

Cite this