Joint discriminative representation learning for end-to-end person search

Pengcheng Zhang, Xiaohan Yu, Xiao Bai*, Chen Wang, Jin Zheng, Xin Ning

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

23 Citations (Scopus)

Abstract

Person search simultaneously detects and retrieves a query person from uncropped scene images. Existing methods are either two-step or end-to-end. The former employ two standalone models for the two sub-tasks, while the latter conduct person search with a unified model. Despite encouraging progress, most existing end-to-end methods focus on balancing the model between the detection and retrieval sub-tasks while neglecting to enhance the learned representation for retrieval, which leaves their accuracy below that of two-step approaches. To that end, we propose a novel hierarchical framework that jointly optimizes instance-aware and part-aware embeddings to enable discriminative representation learning. Specifically, we develop a region-of-interest cosegment (ROICoseg) module that captures part-aware information without requiring extra annotations, enabling fine-grained discriminative representation. On top of that, a Contextual Instance Batch Sampling (CIBS) method is introduced that uses contextual information to construct training batches, thus facilitating effective instance-aware representation learning. We further introduce the first cross-door person search dataset (CDPS), in which a target person is retrieved from outdoor cameras using an indoor-captured image, or vice versa. Extensive experiments show that our proposed model achieves competitive performance on CUHK-SYSU and outperforms state-of-the-art end-to-end methods on the more challenging PRW and CDPS.
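The retrieval half of the task described in the abstract can be illustrated with a minimal sketch. This is a generic illustration of ranking detected persons against a query by cosine similarity, not the ROICoseg or CIBS method from the paper. The function names, the detection tuple format, and the toy embeddings are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_gallery(query_emb, detections):
    """Rank detected boxes from all scene images by similarity to the query.

    detections: list of (image_id, box, embedding) tuples. This format is
    hypothetical; it stands in for the output of a detector with an
    embedding head.
    """
    scored = [
        (cosine(query_emb, emb), image_id, box)
        for image_id, box, emb in detections
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy data (assumed): one query embedding and three detections in two scenes.
query = [1.0, 0.0]
detections = [
    ("scene_a", (10, 20, 50, 120), [0.9, 0.1]),
    ("scene_a", (200, 30, 240, 130), [0.0, 1.0]),
    ("scene_b", (5, 5, 45, 110), [0.5, 0.5]),
]
ranking = rank_gallery(query, detections)
```

In an end-to-end model the boxes and embeddings would come from a single network, whereas a two-step pipeline would produce them with a separate detector and re-identification model; the ranking step is the same in both cases.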

Original language: English
Article number: 110053
Pages (from-to): 1-11
Number of pages: 11
Journal: Pattern Recognition
Volume: 147
Publication status: Published - Mar 2024

Keywords

  • Person search
  • Person re-identification
  • Part segmentation
  • Batch sampling
