The Naïve Overfitting Index Selection (NOIS): a new method to optimize model complexity for hyperspectral data

Alby D. Rocha*, Thomas A. Groen, Andrew K. Skidmore, Roshanak Darvishzadeh, Louise Willemen

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    8 Citations (Scopus)

    Abstract

    The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. However, it also makes it harder to fit empirical models to such high-dimensional data, which often contain correlated and noisy predictors. Because the sample sizes available to train and validate empirical models do not seem to be increasing at the same rate, overfitting has become a serious concern. Overly complex models overfit by capturing more than the underlying relationship and by fitting random noise in the data. Many regression techniques claim to overcome these problems by constraining complexity in different ways, such as limiting the number of terms in the model, creating latent variables, or shrinking parameter coefficients. This paper proposes a new method, named Naïve Overfitting Index Selection (NOIS), which uses artificially generated spectra to quantify relative model overfitting and to select an optimal model complexity supported by the data. The robustness of the new method is assessed by comparing it with traditional model selection based on cross-validation. The optimal model complexity is determined for seven regression techniques, including partial least squares regression, support vector machines, artificial neural networks and tree-based regressions, using five hyperspectral datasets. The NOIS method selects less complex models, which achieve accuracies similar to those of the models selected by cross-validation. The NOIS method reduces the chance of overfitting, thereby avoiding models whose accurate predictions are valid only for the data used and which are too complex to support inferences about the underlying process.
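
The general idea of using artificial spectra to gauge overfitting can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the index, so everything here is an assumption made for illustration, including the random-walk spectra, the k-nearest-neighbour model (smaller `k` means higher complexity), the use of apparent R² as the index, the `tolerance` threshold, and all function names. The premise is that artificial spectra carry no information about the response, so any apparent fit to them reflects overfitting alone.

```python
import random

def make_artificial_spectra(n_samples, n_bands, seed=0):
    """Random-walk 'spectra' (an assumption): neighbouring bands are
    correlated, but the values carry no information about the response."""
    rng = random.Random(seed)
    spectra = []
    for _ in range(n_samples):
        value, row = 0.0, []
        for _ in range(n_bands):
            value += rng.gauss(0.0, 1.0)
            row.append(value)
        spectra.append(row)
    return spectra

def knn_fit_predict(X, y, k):
    """Predict each training sample from its k nearest training samples
    (itself included), so a smaller k gives a more complex model."""
    preds = []
    for xi in X:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X)
        )
        neighbours = [j for _, j in dists[:k]]
        preds.append(sum(y[j] for j in neighbours) / k)
    return preds

def r_squared(y, preds):
    """Apparent (training-set) coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, preds))
    return 1.0 - ss_res / ss_tot

def naive_overfitting_index(y, k, n_bands=50, seed=0):
    """Hypothetical index: how well a model of complexity k appears to
    explain y when the predictors are uninformative artificial spectra."""
    X = make_artificial_spectra(len(y), n_bands, seed)
    return r_squared(y, knn_fit_predict(X, y, k))

def select_complexity(index_by_k, tolerance):
    """Pick the most complex model (smallest k) whose index stays at or
    below the tolerance; fall back to the simplest model otherwise."""
    for k in sorted(index_by_k):
        if index_by_k[k] <= tolerance:
            return k
    return max(index_by_k)

# Toy response values; real use would take the measured response instead.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(30)]
index_by_k = {k: naive_overfitting_index(y, k) for k in (1, 5, 15, 30)}
chosen_k = select_complexity(index_by_k, tolerance=0.1)
```

In this sketch, `k = 1` reproduces the training responses exactly, so its index is 1 (pure overfitting), while `k` equal to the sample size predicts the mean everywhere and gives an index of about 0. The tolerance of 0.1 is an arbitrary illustrative choice.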

    Original language: English
    Pages (from-to): 61-74
    Number of pages: 14
    Journal: ISPRS Journal of Photogrammetry and Remote Sensing
    Volume: 133
    Publication status: Published - Nov 2017

    Keywords

    • Remote sensing
    • Model tuning
    • Cross-validation
    • Prediction accuracy
    • Dimensionality
    • Multicollinearity
