Are we evaluating rigorously? Benchmarking recommendation for reproducible evaluation and fair comparison

Zhu Sun, Di Yu, Hui Fang*, Jie Yang, Xinghua Qu, Jie Zhang, Cong Geng

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution

Abstract

With a tremendous number of recommendation algorithms proposed every year, one critical issue has attracted considerable attention: there are no effective benchmarks for evaluation, which leads to two major concerns, namely unreproducible evaluation and unfair comparison. This paper aims to conduct rigorous (i.e., reproducible and fair) evaluation of implicit-feedback-based top-N recommendation algorithms. We first systematically review 85 recommendation papers published at eight top-tier conferences (e.g., RecSys, SIGIR) to summarize important evaluation factors, such as data splitting and parameter tuning strategies. Through a holistic empirical study, we then analyze in depth the impact of these factors on recommendation performance. Following that, we create benchmarks with standardized procedures and report the performance of seven well-tuned state-of-the-art algorithms across six metrics on six widely used datasets as a reference for later studies. Additionally, we release a user-friendly Python toolkit, which differs from existing ones in addressing the broad scope of rigorous evaluation for recommendation. Overall, our work sheds light on the issues in recommendation evaluation and lays the foundation for further investigation. Our code and datasets are available on GitHub (https://github.com/AmazingDD/daisyRec).
Original language: English
Title of host publication: 14th ACM Conference on Recommender Systems (RecSys)
Place of Publication: New York, NY
Publisher: Association for Computing Machinery (ACM)
Pages: 23-32
Number of pages: 10
ISBN (Electronic): 9781450375832
Publication status: Published - 2020
Event: ACM Conference on Recommender Systems (14th : 2020) - Virtual, Brazil
Duration: 22 Sep 2020 - 25 Sep 2020
https://recsys.acm.org/recsys20/

Conference

Conference: ACM Conference on Recommender Systems (14th : 2020)
Abbreviated title: RecSys
Country: Brazil
Period: 22/09/20 - 25/09/20

Keywords

  • Recommender Systems
  • Reproducible Evaluation
  • Benchmarks
