TY - JOUR
T1 - MaskCOV
T2 - a random mask covariance network for ultra-fine-grained visual categorization
AU - Yu, Xiaohan
AU - Zhao, Yang
AU - Gao, Yongsheng
AU - Xiong, Shengwu
PY - 2021/11
Y1 - 2021/11
N2 - Ultra-fine-grained visual categorization (ultra-FGVC) categorizes objects with more similar patterns between classes than those in fine-grained visual categorization (FGVC), e.g., where the spectrum of granularity significantly moves down from classifying species to classifying cultivars within the same species. It is considered as an open research problem mainly due to the following challenges. First, the inter-class differences among images are much smaller by level of orders (e.g., cultivars in the same species) than those in current FGVC tasks (e.g., species). Second, there is only a few samples per category, which is beyond the ability of most large training data favored convolutional neural network methods. To address these problems, we propose a novel random mask covariance network (MaskCOV), which integrates an auxiliary self-supervised learning module with a powerful in-image data augmentation scheme for the ultra-FGVC. Specifically, we first uniformly partition input images into patches and then augment data by randomly shuffling and masking these patches. On top of that, we introduce an auxiliary self-supervised learning module of predicting the spatial covariance context of these patches to increase discriminability of our network for classification. Very encouraging experimental results of the proposed method in comparison with the state-of-the-art benchmarks demonstrate its superiority and potential of MaskCOV concept, which pushes research boundary forward from the fine-grained to the ultra-fine-grained visual categorization.
AB - Ultra-fine-grained visual categorization (ultra-FGVC) categorizes objects with more similar patterns between classes than those in fine-grained visual categorization (FGVC), e.g., where the spectrum of granularity significantly moves down from classifying species to classifying cultivars within the same species. It is considered as an open research problem mainly due to the following challenges. First, the inter-class differences among images are much smaller by level of orders (e.g., cultivars in the same species) than those in current FGVC tasks (e.g., species). Second, there is only a few samples per category, which is beyond the ability of most large training data favored convolutional neural network methods. To address these problems, we propose a novel random mask covariance network (MaskCOV), which integrates an auxiliary self-supervised learning module with a powerful in-image data augmentation scheme for the ultra-FGVC. Specifically, we first uniformly partition input images into patches and then augment data by randomly shuffling and masking these patches. On top of that, we introduce an auxiliary self-supervised learning module of predicting the spatial covariance context of these patches to increase discriminability of our network for classification. Very encouraging experimental results of the proposed method in comparison with the state-of-the-art benchmarks demonstrate its superiority and potential of MaskCOV concept, which pushes research boundary forward from the fine-grained to the ultra-fine-grained visual categorization.
KW - Ultra-fine-grained visual categorization
KW - Fine-grained visual categorization
KW - Covariance matrix
KW - Self-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85107923260&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/arc/DP180100958
UR - http://purl.org/au-research/grants/arc/IH180100002
U2 - 10.1016/j.patcog.2021.108067
DO - 10.1016/j.patcog.2021.108067
M3 - Article
SN - 0031-3203
VL - 119
SP - 1
EP - 12
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 108067
ER -