TY - JOUR
T1 - Siamese local and global networks for robust face tracking
AU - Qi, Yuankai
AU - Zhang, Shengping
AU - Jiang, Feng
AU - Zhou, Huiyu
AU - Tao, Dacheng
AU - Li, Xuelong
PY - 2020
Y1 - 2020
N2 - Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in various applications, such as video surveillance, human emotion detection and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking. In this article, we propose a face tracking method based on Siamese CNNs, which takes advantage of the powerful representations of hierarchical CNN features learned from massive face images. The proposed method captures discriminative face information at both local and global levels. At the local level, representations for attribute patches (i.e., eyes, nose and mouth) are learned to distinguish one face from another; these representations are robust to pose changes and occlusions. At the global level, representations for each whole face are learned, which take into account the spatial relationships among local patches and facial characteristics, such as skin color and nevi. In addition, we build a new large-scale, challenging face tracking dataset to evaluate face tracking methods and to advance research in this field. Extensive experiments on the collected dataset demonstrate the effectiveness of our method in comparison with several state-of-the-art visual tracking methods.
AB - Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in various applications, such as video surveillance, human emotion detection and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking. In this article, we propose a face tracking method based on Siamese CNNs, which takes advantage of the powerful representations of hierarchical CNN features learned from massive face images. The proposed method captures discriminative face information at both local and global levels. At the local level, representations for attribute patches (i.e., eyes, nose and mouth) are learned to distinguish one face from another; these representations are robust to pose changes and occlusions. At the global level, representations for each whole face are learned, which take into account the spatial relationships among local patches and facial characteristics, such as skin color and nevi. In addition, we build a new large-scale, challenging face tracking dataset to evaluate face tracking methods and to advance research in this field. Extensive experiments on the collected dataset demonstrate the effectiveness of our method in comparison with several state-of-the-art visual tracking methods.
UR - http://www.scopus.com/inward/record.url?scp=85092564129&partnerID=8YFLogxK
U2 - 10.1109/TIP.2020.3023621
DO - 10.1109/TIP.2020.3023621
M3 - Article
C2 - 32941139
AN - SCOPUS:85092564129
SN - 1057-7149
VL - 29
SP - 9152
EP - 9164
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -