TY - JOUR
T1 - Application of feature selection methods and machine learning algorithms for saltmarsh biomass estimation using Worldview-2 imagery
AU - Rasel, Sikdar M. M.
AU - Chang, Hsing-Chung
AU - Ralph, Timothy J.
AU - Saintilan, Neil
AU - Diti, Israt Jahan
PY - 2021
Y1 - 2021
N2 - Assessing large-scale plant productivity of coastal marshes is essential to understanding the resilience of these systems to climate change. Two machine learning approaches, random forest (RF) and support vector machine (SVM) regression, were tested to estimate the biomass of a common saltmarsh species, salt couch grass (Sporobolus virginicus). Reflectance and vegetation indices derived from the 8 bands of Worldview-2 multispectral data were used in four experiments to develop the biomass model: Experiment-1, the 8 bands of the Worldview-2 image; Experiment-2, all possible band combinations of Worldview-2 for Normalized Difference Vegetation Index (NDVI)-type vegetation indices; Experiment-3, a combination of bands and vegetation indices; and Experiment-4, variables selected from Experiment-3 using variable selection methods. The main objectives of this study are (i) to recommend an affordable, low-cost data source for predicting the biomass of a common saltmarsh species, (ii) to suggest a variable selection method suitable for multispectral data, and (iii) to assess the performance of RF and SVM for the biomass prediction model. Cross-validation of parameter optimization for SVM showed that the optimized parameter of ɛ-SVR failed to provide a reliable prediction; hence, ν-SVR was used for the SVM model. Among the variable selection methods, recursive feature elimination (RFE) selected the smallest number of variables (only 4), with an RMSE of 0.211 kg/m2. Experiment-4 (only selected bands) provided the best results for both machine learning regression methods, RF (R2 = 0.72, RMSE = 0.166 kg/m2) and SVR (R2 = 0.66, RMSE = 0.200 kg/m2). When 10-fold cross-validation of the RF model was compared with 10-fold cross-validation of SVR, a significant difference (p < 0.0001) was observed for RMSE. One-to-one comparisons of actual and predicted biomass showed that RF underestimates the high biomass values, whereas SVR overestimates them; this suggests a need for further investigation and refinement.
KW - Worldview-2
KW - salt couch
KW - spectral band
KW - vegetation indices
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85067594457&partnerID=8YFLogxK
U2 - 10.1080/10106049.2019.1624988
DO - 10.1080/10106049.2019.1624988
M3 - Article
AN - SCOPUS:85067594457
SN - 1010-6049
VL - 36
SP - 1075
EP - 1099
JO - Geocarto International
JF - Geocarto International
IS - 10
ER -