TY - JOUR
T1 - Fine-grained recognition of manipulation activities on objects via multi-modal sensing
AU - Liu, Xiulong
AU - Zhang, Bojun
AU - Wang, Lizhang
AU - Chen, Sheng
AU - Xie, Xin
AU - Tong, Xinyu
AU - Gu, Tao
AU - Li, Keqiu
PY - 2024/10
Y1 - 2024/10
N2 - Fine-grained recognition of human manipulation activities on objects is crucial in the era of human-computer-object integration. However, there is a lack of solutions for simultaneous recognition of human identity, manipulation activities (including drawing and rotation), and manipulated objects. Therefore, we propose an RF-Camera system that combines RFID and computer vision techniques to address this challenge in multi-person and multi-object scenarios. In RF-Camera, we employ a skeleton-assisted method to extract facial images of target individuals, enabling precise recognition of their identities. To identify manipulation activities, we analyze the 3D hand trajectory and fingertip vector angle, differentiating drawing and rotation manipulation activities. Additionally, we model target person's hand movements to predict phase data of the target tag, enabling the determination of person-object relationships. Implementing RF-Camera using COTS RFID and Kinect devices involves overcoming challenges such as extracting effective data from noisy streams, predicting virtual phase data considering hand-tag offset, and ensuring high tag reading rates in tag-dense scenarios. We conducted experiments involving six participants performing object manipulation activities, including drawing letters/symbols and rotating movements. Extensive experimental results show that RF-Camera achieves over 90% accuracy in recognizing person identity, manipulation activities, and person-object matching in most conditions.
AB - Fine-grained recognition of human manipulation activities on objects is crucial in the era of human-computer-object integration. However, there is a lack of solutions for simultaneous recognition of human identity, manipulation activities (including drawing and rotation), and manipulated objects. Therefore, we propose an RF-Camera system that combines RFID and computer vision techniques to address this challenge in multi-person and multi-object scenarios. In RF-Camera, we employ a skeleton-assisted method to extract facial images of target individuals, enabling precise recognition of their identities. To identify manipulation activities, we analyze the 3D hand trajectory and fingertip vector angle, differentiating drawing and rotation manipulation activities. Additionally, we model target person's hand movements to predict phase data of the target tag, enabling the determination of person-object relationships. Implementing RF-Camera using COTS RFID and Kinect devices involves overcoming challenges such as extracting effective data from noisy streams, predicting virtual phase data considering hand-tag offset, and ensuring high tag reading rates in tag-dense scenarios. We conducted experiments involving six participants performing object manipulation activities, including drawing letters/symbols and rotating movements. Extensive experimental results show that RF-Camera achieves over 90% accuracy in recognizing person identity, manipulation activities, and person-object matching in most conditions.
KW - computer vision
KW - human sensing
KW - multi-modal fusion
KW - object manipulation activities
KW - RFID
UR - http://www.scopus.com/inward/record.url?scp=85187241533&partnerID=8YFLogxK
U2 - 10.1109/TMC.2024.3364522
DO - 10.1109/TMC.2024.3364522
M3 - Article
AN - SCOPUS:85187241533
SN - 1536-1233
VL - 23
SP - 9614
EP - 9628
JO - IEEE Transactions on Mobile Computing
JF - IEEE Transactions on Mobile Computing
IS - 10
ER -