TY - JOUR
T1 - Detecting technical anomalies in high-frequency water-quality data using Artificial Neural Networks
AU - Rodriguez-Perez, Javier
AU - Leigh, Catherine
AU - Liquet, Benoit
AU - Kermorvant, Claire
AU - Peterson, Erin
AU - Sous, Damien
AU - Mengersen, Kerrie
PY - 2020/11/3
Y1 - 2020/11/3
N2 - Anomaly detection (AD) in high-volume environmental data requires one to tackle a series of challenges associated with the typical low frequency of anomalous events, the broad range of possible anomaly types and local non-stationary environmental conditions, suggesting the need for flexible statistical methods that are able to cope with unbalanced high-volume data problems. Here, we aimed to detect anomalies caused by technical errors in water-quality (turbidity and conductivity) data collected by automated in-situ sensors deployed in contrasting riverine and estuarine environments. We first applied a range of Artificial Neural Networks (ANN) that differed in both learning method and hyper-parameter values, then calibrated models using a Bayesian multi-objective optimisation procedure, and selected and evaluated the "best" model for each water-quality variable, environment and anomaly type. We found that semi-supervised classification was better able to detect sudden spikes, sudden shifts and small sudden spikes, whereas supervised classification had higher accuracy for predicting long-term anomalies associated with drifts and periods of otherwise unexplained high variability.
AB - Anomaly detection (AD) in high-volume environmental data requires one to tackle a series of challenges associated with the typical low frequency of anomalous events, the broad range of possible anomaly types and local non-stationary environmental conditions, suggesting the need for flexible statistical methods that are able to cope with unbalanced high-volume data problems. Here, we aimed to detect anomalies caused by technical errors in water-quality (turbidity and conductivity) data collected by automated in-situ sensors deployed in contrasting riverine and estuarine environments. We first applied a range of Artificial Neural Networks (ANN) that differed in both learning method and hyper-parameter values, then calibrated models using a Bayesian multi-objective optimisation procedure, and selected and evaluated the "best" model for each water-quality variable, environment and anomaly type. We found that semi-supervised classification was better able to detect sudden spikes, sudden shifts and small sudden spikes, whereas supervised classification had higher accuracy for predicting long-term anomalies associated with drifts and periods of otherwise unexplained high variability.
UR - http://www.scopus.com/inward/record.url?scp=85095461630&partnerID=8YFLogxK
U2 - 10.1021/acs.est.0c04069
DO - 10.1021/acs.est.0c04069
M3 - Article
C2 - 32856893
SN - 1520-5851
VL - 54
SP - 13719
EP - 13730
JO - Environmental Science and Technology
JF - Environmental Science and Technology
IS - 21
ER -