A machine learning predictive model to detect water quality and pollution

Xiaoting Xu, Tin Lai, Sayka Jahan, Farnaz Farid*, Abubakar Bello

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    13 Citations (Scopus)
    268 Downloads (Pure)

    Abstract

    The increasing prevalence of marine pollution over the past few decades has motivated recent research to help ease the situation. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations, with labour-intensive laboratory tests to determine the degree of pollution. We propose an automated water quality assessment framework in which we formalise a predictive model using machine learning to infer water quality and the level of pollution from collected water and sediment samples. First, because sample collection locations are sparse, the number of water and sediment samples is limited and the dataset is incomplete. Therefore, after an extensive investigation of the performance of various data imputation methods on water and sediment datasets with different missing-data rates, we choose the best imputation method to process the missing data. Each water and sediment sample is then tagged as one of four levels of pollution based on guidelines, and a machine learning classification model learns the relationship between the data and the assigned level. The predictions are then compared with the real results to check whether the model is sound and the predictions are accurate. Finally, we give improvement advice based on the results obtained from model building. Empirically, we show that our best model achieves an accuracy of 75% after accounting for 57% of missing data. Experimentally, we show that our model would assist in automated water quality screening based on possibly incomplete real-world data.
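
The workflow described in the abstract (impute missing values, classify each sample into one of four pollution levels, compare predictions with the real labels) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: column-mean imputation and a 1-nearest-neighbour classifier are stand-ins for the unspecified "best" imputation method and model, and the two features and all sample values are made up.

```python
from statistics import mean


def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(mean(observed) if observed else 0.0)
    return means


def fill_missing(rows, means):
    """Replace each None with the mean of its column (stand-in imputation)."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def predict_1nn(train_x, train_y, query):
    """Label of the training sample closest to `query` (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[nearest]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the real labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical samples with two made-up features; None marks a missing value.
train_x_raw = [
    [0.1, 0.2], [0.2, None],   # pollution level 0
    [1.0, 1.1], [None, 0.9],   # pollution level 1
    [2.0, 2.1], [2.1, None],   # pollution level 2
    [3.0, None], [3.1, 3.0],   # pollution level 3
]
train_y = [0, 0, 1, 1, 2, 2, 3, 3]
test_x_raw = [[0.15, 0.1], [1.1, 1.0], [2.05, 2.0], [2.9, 3.1]]
test_y = [0, 1, 2, 3]

# Step 1: impute, using statistics learned from the training samples only.
means = column_means(train_x_raw)
train_x = fill_missing(train_x_raw, means)
test_x = fill_missing(test_x_raw, means)

# Step 2: classify each held-out sample into one of the four levels.
predictions = [predict_1nn(train_x, train_y, q) for q in test_x]

# Step 3: compare predictions with the real labels.
print("accuracy:", accuracy(test_y, predictions))
```

On real data, the imputation method and classifier would be chosen by comparing candidates at different missing-data rates, as the abstract describes; the toy data here are far too small to say anything about model quality.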

    Original language: English
    Article number: 324
    Pages (from-to): 1-14
    Number of pages: 14
    Journal: Future Internet
    Volume: 14
    Issue number: 11
    Publication status: Published - Nov 2022

    Bibliographical note

    Copyright the Author(s) 2022. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

    Keywords

    • artificial intelligence
    • data imputation
    • deep learning model
    • machine learning model
    • marine pollution
    • water pollution
