TY - GEN
T1 - PGTRNet
T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
AU - Wang, Jun
AU - Zhou, Hefeng
AU - Yu, Xiaohan
PY - 2022
Y1 - 2022
AB - Current state-of-the-art weakly supervised object detection (WSOD) studies mainly follow a two-phase training strategy that integrates a fully supervised detector (FSD) with a pure WSOD model. Two main problems hinder the performance of two-phase WSOD approaches: the insufficient learning problem, and the strict reliance of the FSD on the pseudo ground truth (PGT) generated by the WSOD model. This paper proposes the pseudo ground truth refinement network (PGTRNet), a simple yet effective method that introduces no extra learnable parameters, to address these problems. PGTRNet uses multiple bounding boxes to establish the PGT, mitigating the insufficient learning problem. We also propose a novel online PGT refinement approach that steadily improves the quality of the PGT by exploiting the FSD during second-phase training, decoupling the first-phase and second-phase models. Extensive experiments on the PASCAL VOC 2007 benchmark verify the effectiveness of our methods. Experimental results demonstrate that PGTRNet boosts the backbone model by 2.1% mAP and achieves state-of-the-art performance.
UR - http://www.scopus.com/inward/record.url?scp=85131247600&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9746625
DO - 10.1109/ICASSP43922.2022.9746625
M3 - Conference proceeding contribution
SN - 9781665405416
SP - 2245
EP - 2249
BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing
PB - Institute of Electrical and Electronics Engineers (IEEE)
CY - Piscataway, NJ
Y2 - 22 May 2022 through 27 May 2022
ER -