Abstract
Sequential pattern mining is a growing topic in data mining and has most recently found a home in the Internet of Things (IoT), where large volumes of data arrive every second for analysis and knowledge extraction. One key topic within sequential pattern mining is high-utility sequential pattern mining (HUSPM). HUSPM combines utility and sequence factors to identify high-utility sequential patterns in databases and other data sources. However, almost all existing literature focuses on improving mining performance on a single machine. In this work, we present a four-stage MapReduce framework for HUSPM that is based solely on the well-known Spark platform. The framework is shown to deliver faster and more efficient mining on large datasets. It consists of four phases (initialization, mining, updating, and generation) that handle big datasets using MapReduce running on Spark. Experiments indicate that the designed model can handle very large datasets, whereas state-of-the-art algorithms achieve good performance only on small datasets.
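To make the notion of pattern utility concrete, the following is a minimal sketch in plain Python, not the four-phase framework described above and not Spark code. It assumes a simplified data model in which each sequence is a list of `(item, utility)` pairs with one item per element, and it takes the utility of a pattern in a sequence to be the maximum total utility over all of its occurrences, a definition commonly used in HUSPM. The function names, the toy data, and the threshold are all hypothetical.

```python
from functools import reduce

NEG_INF = float("-inf")


def sequence_utility(sequence, pattern):
    """Maximum utility over all occurrences of `pattern` as a subsequence.

    `sequence` is a list of (item, utility) pairs; `pattern` is a list of items.
    Returns 0 when the pattern does not occur in the sequence.
    """
    # best[j] = highest utility found so far for matching the first j pattern items
    best = [0] + [NEG_INF] * len(pattern)
    for item, utility in sequence:
        # Iterate backwards so one element fills at most one pattern position.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != NEG_INF:
                best[j] = max(best[j], best[j - 1] + utility)
    return best[-1] if best[-1] != NEG_INF else 0


def map_partition(partition, pattern):
    """Map step (assumed): partial utility of the pattern within one partition."""
    return sum(sequence_utility(seq, pattern) for seq in partition)


def pattern_utility(partitions, pattern):
    """Reduce step (assumed): add the partial utilities from all partitions."""
    partials = map(lambda part: map_partition(part, pattern), partitions)
    return reduce(lambda a, b: a + b, partials, 0)


# Toy database split into two partitions, standing in for distributed workers.
partitions = [
    [
        [("a", 2), ("b", 3), ("a", 5), ("c", 1)],
        [("b", 4), ("c", 2)],
    ],
    [
        [("a", 1), ("c", 6), ("b", 2), ("c", 3)],
    ],
]

MIN_UTILITY = 14  # hypothetical threshold

for candidate in (["a", "c"], ["b", "c"]):
    total = pattern_utility(partitions, candidate)
    print(candidate, total, total >= MIN_UTILITY)
# ['a', 'c'] 13 False
# ['b', 'c'] 15 True
```

Because each partition's partial utility is computed independently and the partial results are combined by addition, the same split could in principle be expressed as map and reduce operations on a distributed platform. How the paper organizes its initialization, mining, updating, and generation phases is not specified in this abstract.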
Original language | English |
---|---|
Pages (from-to) | 12669-12678 |
Number of pages | 10 |
Journal | IEEE Internet of Things Journal |
Volume | 8 |
Issue number | 16 |
Early online date | 25 Sept 2020 |
DOIs | |
Publication status | Published - 15 Aug 2021 |
Keywords
- Analytics
- Internet of Things (IoT)
- big data
- data mining
- edge computing
- efficient computation
- sequential patterns