TY - GEN
T1 - Automating DBSCAN via Deep Reinforcement Learning
AU - Zhang, Ruitong
AU - Peng, Hao
AU - Dou, Yingtong
AU - Wu, Jia
AU - Sun, Qingyun
AU - Li, Yangyang
AU - Zhang, Jingyi
AU - Yu, Philip S.
PY - 2022/10
Y1 - 2022/10
N2 - DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, due to its high sensitivity parameters, the accuracy of the clustering result depends heavily on practical experience. In this paper, we first propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process, which aims to find the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal clustering parameter search policy for different feature distributions via interacting with the clusters, using a weakly-supervised reward training policy network. In addition, we also present a recursive search mechanism driven by the scale of the data to efficiently and controllably process large parameter spaces. Extensive experiments are conducted on five artificial and real-world datasets based on the proposed four working modes. The results of offline and online tasks show that the DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively, but also can stably find the dominant parameters with high computational efficiency. The code is available at https://github.com/RingBDStack/DRL-DBSCAN.
AB - DBSCAN is widely used in many scientific and engineering fields because of its simplicity and practicality. However, due to its high sensitivity parameters, the accuracy of the clustering result depends heavily on practical experience. In this paper, we first propose a novel Deep Reinforcement Learning guided automatic DBSCAN parameters search framework, namely DRL-DBSCAN. The framework models the process of adjusting the parameter search direction by perceiving the clustering environment as a Markov decision process, which aims to find the best clustering parameters without manual assistance. DRL-DBSCAN learns the optimal clustering parameter search policy for different feature distributions via interacting with the clusters, using a weakly-supervised reward training policy network. In addition, we also present a recursive search mechanism driven by the scale of the data to efficiently and controllably process large parameter spaces. Extensive experiments are conducted on five artificial and real-world datasets based on the proposed four working modes. The results of offline and online tasks show that the DRL-DBSCAN not only consistently improves DBSCAN clustering accuracy by up to 26% and 25% respectively, but also can stably find the dominant parameters with high computational efficiency. The code is available at https://github.com/RingBDStack/DRL-DBSCAN.
KW - deep reinforcement learning
KW - density-based clustering
KW - hyperparameter search
KW - recursive mechanism
UR - http://www.scopus.com/inward/record.url?scp=85140826480&partnerID=8YFLogxK
U2 - 10.1145/3511808.3557245
DO - 10.1145/3511808.3557245
M3 - Conference proceeding contribution
AN - SCOPUS:85140826480
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2620
EP - 2630
BT - CIKM 2022 - Proceedings of the 31st ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
CY - New York, NY
T2 - 31st ACM International Conference on Information and Knowledge Management, CIKM 2022
Y2 - 17 October 2022 through 21 October 2022
ER -