YOLO-SA: an efficient object detection model based on self-attention mechanism

Ang Li, Xiangyu Song, ShiJie Sun, Zhaoyang Zhang*, Taotao Cai, Huansheng Song

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

Object detectors based on CNN structures have been widely used for object detection, object classification, and other tasks. Traditional CNN modules usually adopt a complex multi-branch design, which reduces inference speed and memory utilization. Moreover, in many works an attention mechanism is added to the object detector to extract rich spatial features, but such mechanisms are usually attached as extra modules alongside the convolutions and do not fundamentally address the limitations of the convolution operation. Finally, traditional object detectors often use coupled detection heads, which can compromise model performance. To address these problems, we propose a new object detection model, YOLO-SA, based on the popular object detector YOLOv5. We introduce the reparameterized module RepVGG to replace the original DarkNet53 backbone of YOLOv5, which greatly reduces model complexity and improves detection accuracy. We introduce a self-attention module in the feature-fusion part of the model; it is independent of the other convolutional layers and outperforms other mainstream attention modules. We replace the coupled detection head of YOLOv5 with an anchor-based decoupled detection head, which greatly improves convergence speed during training. Experiments show that the proposed YOLO-SA model reaches detection accuracies of 71.2% and 75.8% on the COCO2014 and VOC2012 datasets respectively, outperforming the baseline YOLOv5s model and other mainstream object detection models.
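The structural re-parameterization idea behind the RepVGG module mentioned above (folding the training-time multi-branch block of a 3×3 convolution, a 1×1 convolution, and an identity shortcut into a single 3×3 convolution at inference) can be illustrated with a minimal numpy sketch. This is an assumption-laden toy version, not the paper's code: batch normalization is omitted (the real RepVGG block also folds BN statistics into the fused kernel), and all names are illustrative.

```python
import numpy as np

def conv2d(x, w):
    """Naive 'same'-padded 2D convolution: x is (C_in, H, W), w is (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for o in range(c_out):
        for i in range(H):
            for j in range(W):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

rng = np.random.default_rng(0)
C = 4                                   # identity branch requires C_in == C_out
x = rng.standard_normal((C, 8, 8))
w3 = rng.standard_normal((C, C, 3, 3))  # 3x3 branch
w1 = rng.standard_normal((C, C, 1, 1))  # 1x1 branch

# Training-time multi-branch output: 3x3 conv + 1x1 conv + identity shortcut.
y_branches = conv2d(x, w3) + conv2d(x, w1) + x

# Re-parameterization: fold all three branches into one 3x3 kernel.
# Convolution is linear in the kernel, so the branches sum in kernel space.
w_fused = w3.copy()
w_fused[:, :, 1, 1] += w1[:, :, 0, 0]           # 1x1 kernel sits at the centre
w_fused[np.arange(C), np.arange(C), 1, 1] += 1  # identity = centred delta kernel
y_fused = conv2d(x, w_fused)

print(np.allclose(y_branches, y_fused))  # True: one branch at inference, same output
```

The fused single-branch form is what gives the inference-speed and memory benefits the abstract attributes to replacing DarkNet53 with RepVGG.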
Original language: English
Title of host publication: Web and big data
Subtitle of host publication: 7th International Joint Conference, APWeb-WAIM 2023: proceedings, part IV
Editors: Xiangyu Song, Ruyi Feng, Yunliang Chen, Jianxin Li, Geyong Min
Place of Publication: Singapore
Publisher: Springer Nature
Chapter: 1
Pages: 1-15
Number of pages: 15
ISBN (Electronic): 9789819724215
ISBN (Print): 9789819724208
DOIs
Publication status: Published - 2024
Event: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) International Joint Conference on Web and Big Data (7th : 2023) - Wuhan, China
Duration: 6 Oct 2023 – 8 Oct 2023

Publication series

Name: Lecture Notes in Computer Science
Number: 14334
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) International Joint Conference on Web and Big Data (7th : 2023)
Abbreviated title: APWeb-WAIM 2023
Country/Territory: China
City: Wuhan
Period: 6/10/23 – 8/10/23

Keywords

  • attention mechanism
  • CNN architecture
  • decoupled detection head
  • object detection
