TY - GEN
T1 - Multi-modal fusion for flasher detection in a mobile video chat application
AU - Tian, Lei
AU - Rafiq, Rahat
AU - Li, Shaosong
AU - Chu, David
AU - Han, Richard
AU - Lv, Qin
AU - Mishra, Shivakant
PY - 2014
Y1 - 2014
AB - This paper investigates the development of accurate and efficient classifiers to identify misbehaving users (i.e., "flashers") in a mobile video chat application. Our analysis is based on video session data collected from a mobile client we built that connects to a popular random video chat service. We show that prior image-based classifiers designed for identifying normal and misbehaving users in online video chat systems perform poorly on mobile video chat data. We present an enhanced image-based classifier that improves classification performance on mobile data. More importantly, we demonstrate that incorporating multi-modal mobile sensor data from the accelerometer and the camera state (front/back), along with audio, can significantly improve overall image-based classification accuracy. Our work also shows that leveraging multiple image-based predictions within a session (i.e., the temporal modality) has the potential to further improve classification performance. Finally, we show that the running-time cost of classification can be significantly reduced by employing a multilevel cascaded classifier in which high-complexity features and additional image-based predictions are generated only when needed.
UR - http://www.scopus.com/inward/record.url?scp=84924336233&partnerID=8YFLogxK
DO - 10.4108/icst.mobiquitous.2014.257973
M3 - Conference proceeding contribution
T3 - MobiQuitous 2014 - 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services
SP - 267
EP - 276
BT - MobiQuitous 2014 - 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services
PB - ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering)
CY - Gent, Belgium
T2 - 11th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, MobiQuitous 2014
Y2 - 2 December 2014 through 5 December 2014
ER -