A recommendation engine for predicting movie ratings using a big data approach

Mazhar Javed Awan*, Rafia Asad Khan, Haitham Nobanee*, Awais Yasin, Syed Muhammad Anwar, Usman Naseem, Vishwa Pratab Singh

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

53 Citations (Scopus)


In this era of big data, the amount of video content has increased dramatically as video streaming services have expanded. As a result, it has become difficult for end-users to find the videos they want. To obtain accurate and robust clustering of information, a hybrid algorithm was used to build a recommender engine with collaborative filtering using Apache Spark and its machine learning (ML) libraries. In this study, we implemented a movie recommendation system based on collaborative filtering, using the alternating least squares (ALS) model to predict the best-rated movies. The proposed system takes a user's most recent search data on movie category and uses it to guide the recommender engine, which produces a list of predicted top ratings. The study used a model-based matrix factorization approach, the ALS algorithm, together with collaborative filtering, which solved the cold start, sparsity, and scalability problems. In our experimental analysis, we obtained minimum root mean squared errors (oRMSEs) ranging from approximately 0.8959 to 0.97613. Moreover, the proposed movie recommendation system showed an accuracy of 97% and predicted the top 1000 ratings for movies.
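The abstract describes matrix factorization fitted by ALS and evaluated with RMSE. The following is a minimal sketch of that idea, not the authors' implementation: it uses plain NumPy instead of Apache Spark, and the toy rating matrix, the function names `als_factorize` and `rmse`, and the values chosen for `rank`, `reg`, and `iters` are all assumptions made for illustration. A zero in the toy matrix is treated as "not rated".

```python
import numpy as np


def als_factorize(ratings, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Approximate ratings ~= U @ V.T using only the observed entries.

    ratings: (n_users, n_items) array; mask: boolean array, True where rated.
    rank, reg, and iters are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)

    for _ in range(iters):
        # Hold item factors fixed; each user row is a small ridge regression.
        for u in range(n_users):
            seen = mask[u]
            if seen.any():
                Vs = V[seen]
                U[u] = np.linalg.solve(Vs.T @ Vs + ridge, Vs.T @ ratings[u, seen])
        # Hold user factors fixed; solve the same kind of problem per item.
        for i in range(n_items):
            seen = mask[:, i]
            if seen.any():
                Us = U[seen]
                V[i] = np.linalg.solve(Us.T @ Us + ridge, Us.T @ ratings[seen, i])
    return U, V


def rmse(ratings, mask, U, V):
    """Root mean squared error over the observed entries only."""
    err = (ratings - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


# Hypothetical 5-user x 4-movie rating matrix; 0 marks a missing rating.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
M = R > 0

U, V = als_factorize(R, M)
train_rmse = rmse(R, M, U, V)
predicted = U @ V.T  # includes estimates for the unrated cells
print(f"training RMSE: {train_rmse:.4f}")
```

Each alternating step has a closed-form solution because fixing one factor matrix turns the problem into independent least-squares fits, which is also what makes the method easy to parallelize across users and items. The sketch reports error on the training entries only; a figure comparable to the RMSE values in the abstract would need a held-out test split.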

Original language: English
Article number: 1215
Pages (from-to): 1-17
Number of pages: 17
Issue number: 10
Publication status: Published - 2 May 2021
Externally published: Yes

Bibliographical note

Copyright the Author(s) 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.


  • recommendation engine
  • Spark machine learning
  • filtering
  • collaborative filtering
  • RMSE
  • Pyspark
  • matrix factorization
  • oRMSE
  • ALS (alternating least squares)
  • Apache Spark
  • Spark ML Movielens dataset
  • Spark MLlib

