ForeXGBoost: passenger car sales prediction based on XGBoost

Zhenchang Xia, Shan Xue, Libing Wu*, Jiaxin Sun, Yanjiao Chen, Rui Zhang

*Corresponding author for this work

Research output: Contribution to journal / Article / peer-review



The rapid development of machine learning has spurred wide application across industries, where prediction models are built to forecast sales and help enterprises and governments plan better. In 2018, Alibaba Cloud and the Yancheng Municipal Government held a competition calling for global efforts to build machine learning models that accurately forecast vehicle sales from large-scale datasets. This paper presents the design, implementation and evaluation of ForeXGBoost, our proposed model, which won first place in the competition. ForeXGBoost uses carefully designed algorithms for filling missing values to improve data quality. By using a sliding window to extract features from historical sales and production data, ForeXGBoost can improve prediction accuracy. We conduct an extensive study of how different attributes influence vehicle sales, using information gain and data correlation, and on this basis select the most indicative features from the feature set for prediction. Furthermore, we leverage the XGBoost prediction algorithm to achieve high prediction accuracy with a short running time for vehicle sales prediction. Extensive experiments confirm that ForeXGBoost can achieve high prediction accuracy with low overhead.
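The abstract does not specify the missing-value filling algorithm or the window length used by ForeXGBoost. As a hedged illustration only, the sketch below shows one common way to prepare a monthly sales series before training a gradient-boosted model: linear interpolation of gaps (a generic stand-in, not the paper's filling algorithm) followed by sliding-window feature extraction, where the previous `window` months and their mean become the features for the next month. The function names and the `window` parameter are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps: linear interpolation between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observed value.
    Hypothetical stand-in, not the paper's filling algorithm."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled: List[float] = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:  # gap before the first observation
            filled.append(float(series[right]))
        elif right is None:  # gap after the last observation
            filled.append(float(series[left]))
        else:  # interior gap: interpolate linearly
            frac = (i - left) / (right - left)
            filled.append(series[left] + frac * (series[right] - series[left]))
    return filled


def sliding_window_features(
    series: List[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """For each month t >= window, use the previous `window` values plus
    their mean as features and the value at month t as the target."""
    if window < 1:
        raise ValueError("window must be >= 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(window, len(series)):
        lags = series[t - window:t]
        X.append(lags + [mean(lags)])
        y.append(series[t])
    return X, y


if __name__ == "__main__":
    sales = [120, None, 140, 150, None, 170]  # toy monthly sales, not real data
    filled = fill_missing(sales)
    X, y = sliding_window_features(filled, window=3)
    print(filled)
    print(X)
    print(y)
```

With the toy series above, `fill_missing` yields `[120.0, 130.0, 140.0, 150.0, 160.0, 170.0]`, and a window of 3 produces three training rows whose targets are `150.0`, `160.0` and `170.0`. Rows of this form could then be passed to any regressor, such as an XGBoost model; in practice the window would also be applied to other attributes, for example production data.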

Original language: English
Pages (from-to): 713-738
Number of pages: 26
Journal: Distributed and Parallel Databases
Issue number: 3
Publication status: Published - Sep 2020


Keywords

  • Vehicle sales prediction
  • Feature selection
  • XGBoost model
