An evaluation of methods for imputation of missing trace element data in groundwaters

Bruce L. Dickson*, Angela M. Giblin

*Corresponding author for this work

    Research output: Contribution to journal (Article)

    20 Citations (Scopus)

    Abstract

    Groundwater data-sets with pH and major cation-anion chemistry are widely available, but data-sets that include trace metals are much rarer. This paper examines two methods of data imputation for predicting U concentrations from pH, major cations, anions and F in a data-set where some of the U concentrations are missing. The methods evaluated were self-organizing maps (SOM) and expectation maximization (EM). Evaluations were made using a groundwater data-set of 187 samples from NSW and Victoria, which contained a wide range of U concentrations, up to 225 μg/L. Tests in which 25% and 50% of the U concentrations were set to missing showed that, at 25% missing, SOM gave reasonable estimates and identified all the samples with higher U, whereas EM did not clearly identify the higher-U samples. At 50% missing, neither method could accurately identify the higher U concentrations. Thus, imputation that includes samples with missing data in the training data-set does not appear to be practical. However, a SOM pre-trained on a data-set with no missing U concentrations may be used to impute U concentrations for samples in which U was not measured at all. Training on the original data-set and then imputing concentrations for a second set of 360 samples showed that the samples with higher measured U concentrations could generally be identified, but that other samples were also estimated to be U-rich. This method could substantially reduce the number of samples in a large data-set requiring further investigation. The performance of imputation for U reflects the complex interaction of water chemistry, geology and mineralogy that actually determines U concentrations. Imputation is a useful method for improving estimates of data statistics, and SOM, through its model-free approach, is a useful addition to the geochemist's numerical analysis toolbox.
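    The pre-trained SOM approach summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the map size, learning-rate and neighbourhood schedules, the standardization step, the function names (`train_som`, `som_impute`) and the synthetic data are all assumptions made for illustration, and the EM method is not shown. A map is trained only on samples with complete data; each sample with a missing target is then matched to its best-matching unit (BMU) using only its observed variables, and the missing value is copied from that unit's codebook vector.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM on complete, standardized rows (no NaNs).

    Grid size and decay schedules are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise codebook vectors from randomly chosen training samples.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood radius
        x = data[rng.integers(0, n)]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood on the map grid around the best-matching unit.
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def som_impute(weights, data):
    """Fill NaNs from each row's BMU, matched on that row's observed columns only."""
    out = data.copy()
    for i, row in enumerate(data):
        observed = ~np.isnan(row)
        if observed.all() or not observed.any():
            continue  # nothing to fill, or nothing to match on
        d2 = ((weights[:, observed] - row[observed]) ** 2).sum(axis=1)
        bmu = np.argmin(d2)
        out[i, ~observed] = weights[bmu, ~observed]
    return out

# Hypothetical synthetic data (not the paper's): 4 "chemistry" predictors plus
# a target column standing in for U that depends on two of them.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
u = 0.8 * X[:, 0] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)
full = np.column_stack([X, u])
mean, std = full.mean(axis=0), full.std(axis=0)
z = (full - mean) / std  # standardize so no variable dominates the distance

train, test = z[:150], z[150:].copy()
test[:, -1] = np.nan  # target entirely missing in the second set
weights = train_som(train)
filled = som_impute(weights, test)
u_hat = filled[:, -1] * std[-1] + mean[-1]  # back-transform to original units
r = np.corrcoef(u_hat, full[150:, -1])[0, 1]
print(f"correlation between imputed and true target: {r:.2f}")
```

    Because imputed values are taken from codebook vectors, they are smoothed towards local averages on the map; this suits flagging samples likely to be high in the target rather than reproducing exact concentrations.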

    Original language: English
    Pages (from-to): 173-178
    Number of pages: 6
    Journal: Geochemistry: Exploration, Environment, Analysis
    Volume: 7
    Issue number: 2
    Publication status: Published (May 2007)

    Keywords

    • Evaporation ponds
    • Expectation maximization
    • Groundwater
    • Imputation
    • Murray Basin
    • Self-organizing map
    • Uranium

