Evaluating the predictive power of different machine learning algorithms for groundwater salinity prediction of multi-layer coastal aquifers in the Mekong Delta, Vietnam

Dang An Tran*, Maki Tsujimura, Nam Thang Ha, Van Tam Nguyen, Doan Van Binh, Thanh Duc Dang, Quang-Van Doan, Dieu Tien Bui, Trieu Anh Ngoc, Le Vo Phu, Pham Thi Bich Thuc, Tien Dat Pham

*Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)



Groundwater salinization is considered a major environmental problem in coastal areas worldwide, affecting ecosystems and human health. However, accurately predicting salinity concentrations in groundwater remains a challenge because of the complexity of groundwater salinization processes and the factors that influence them. In this study, we evaluate state-of-the-art machine learning (ML) algorithms for predicting groundwater salinity and identify its influencing factors. We conducted the study in the coastal multi-layer aquifers of the Mekong River Delta (Vietnam), using a geodatabase of 216 groundwater samples and 14 conditioning factors. We compared the predictive performance of four ML techniques: Random Forest Regression (RFR), Extreme Gradient Boosting Regression (XGBR), CatBoost Regression (CBR), and Light Gradient Boosting Regression (LGBR). Model performance was assessed using the root-mean-square error (RMSE), the coefficient of determination (R²), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). The results show that the CBR model performs best on both the training dataset (R² = 0.999, RMSE = 29.90) and the testing dataset (R² = 0.84, RMSE = 205.96, AIC = 720.60, BIC = 751.04). Ten of the 14 factors were identified as the most important for groundwater salinity prediction: the distance to saline sources, the well screen depth, the groundwater level, the vertical hydraulic conductivity, the operation time, the well density, the extraction capacity, the aquitard thickness, the distance to faults, and the horizontal hydraulic conductivity. The results provide insights for policymakers in proposing remediation and management strategies for groundwater salinity in the context of excessive groundwater exploitation in coastal lowland regions.
Because human-induced factors have contributed significantly to groundwater salinization, urgent action is needed to ensure sustainable groundwater management in the coastal areas of the Mekong River Delta.
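The four evaluation metrics named in the abstract can be computed as in the minimal Python sketch below. It is not the authors' code. The AIC and BIC expressions use a common least-squares form, n·ln(RSS/n) + 2k and n·ln(RSS/n) + k·ln(n), which is an assumption here, and the function names `rmse`, `r_squared` and `aic_bic` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aic_bic(rmse_value: float, n: int, k: int) -> Tuple[float, float]:
    """AIC and BIC in the least-squares form n*ln(RSS/n) + penalty.

    ASSUMPTION: this Gaussian-likelihood form (constant terms dropped) and
    the meaning of k (number of model parameters) are not confirmed by the
    source; other AIC/BIC variants give different absolute values.
    """
    log_mse = math.log(rmse_value ** 2)  # RSS / n equals MSE, i.e. RMSE squared
    aic = n * log_mse + 2 * k
    bic = n * log_mse + k * math.log(n)
    return aic, bic


if __name__ == "__main__":
    # Toy observed/predicted values, for illustration only.
    obs = [1.0, 2.0, 3.0]
    pred = [1.0, 2.0, 4.0]
    print(round(rmse(obs, pred), 4), r_squared(obs, pred))

    # Testing-set RMSE from the abstract; n and k are inferred, not stated.
    aic, bic = aic_bic(205.96, n=65, k=14)
    print(round(aic, 2), round(bic, 2))
```

Under this assumed form, the abstract's testing RMSE of 205.96 with n = 65 and k = 14 gives AIC ≈ 720.60 and BIC ≈ 751.04, matching the reported values. The counts n = 65 (about 30% of the 216 samples) and k = 14 (the number of conditioning factors) are inferred from that arithmetic and are not stated in the abstract.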

Original language: English
Article number: 107790
Pages (from-to): 1-14
Number of pages: 14
Journal: Ecological Indicators
Publication status: Published (Aug 2021)
Externally published: Yes

Bibliographical note

Copyright the Author(s) 2021. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.


  • CatBoost Regression
  • Influencing factors
  • Groundwater salinization
  • Multi-layer coastal aquifers
  • Mekong Delta

