Abstract
Visual attention improves object detection by guiding neural networks to
focus on object representations. While existing methods incorporate empirical
modules to strengthen network attention, in this work we rethink attentive
object detection from the network learning perspective. We propose
a NEural Attention Learning approach (NEAL) which consists of two
parts. During the back-propagation of each training iteration, we first
calculate the partial derivatives (a.k.a. the accumulated gradients)
of the classification output with respect
to the input features. We refine these partial derivatives to obtain
attention response maps whose elements reflect the contributions to the
final network predictions. Then, we formulate the attention response
maps as extra objective functions, which are combined together with the
original detection loss to train detectors in an end-to-end manner. In
this way, we succeed in learning an attentive CNN model without
introducing additional network structures. We apply NEAL to the
two-stage object detection frameworks, which are usually composed of a
CNN feature backbone, a region proposal network (RPN), and a classifier.
We show that the proposed NEAL not only helps the RPN attend to objects
but also enables the classifier to pay more attention to high-quality
positive samples. As a result, localization (proposal generation)
and classification mutually benefit from each other in our proposed
method. Extensive experiments on large-scale benchmark datasets,
including MS COCO 2017 and Pascal VOC 2012, demonstrate that the
proposed NEAL algorithm improves two-stage object detectors over
state-of-the-art approaches.
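The core idea described above can be sketched in a few lines. The refinement below (channel-averaged gradient weighting, ReLU, normalization) is a Grad-CAM-style assumption, and the binary cross-entropy attention objective is an illustrative choice; neither is necessarily the paper's exact formulation.

```python
import numpy as np

def attention_map(grads, feats):
    """Refine accumulated gradients into an attention response map.

    grads, feats: arrays of shape (C, H, W) — the gradients of the
    classification output w.r.t. the input features, and the features.
    The Grad-CAM-style refinement here is an assumption: weight feature
    channels by their spatially averaged gradients, keep only positive
    contributions, and normalize to [0, 1].
    """
    weights = grads.mean(axis=(1, 2), keepdims=True)      # (C, 1, 1)
    resp = np.maximum((weights * feats).sum(axis=0), 0.0)  # (H, W)
    return resp / (resp.max() + 1e-8)

def attention_loss(resp, target):
    """Extra objective on the response map (illustrative BCE choice):
    push the map toward a target object mask, to be summed with the
    original detection loss during end-to-end training."""
    resp = np.clip(resp, 1e-6, 1.0 - 1e-6)
    bce = -(target * np.log(resp) + (1.0 - target) * np.log(1.0 - resp))
    return float(bce.mean())
```

In training, `resp` would come from the back-propagated gradients of each iteration, so the attention objective adds no extra network structure, only an extra loss term.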
| Original language | English |
| --- | --- |
| Pages (from-to) | 1726-1739 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 33 |
| Early online date | 18 Jul 2023 |
| DOIs | |
| Publication status | Published - 2024 |
| Externally published | Yes |