Abstract
Cellwise outliers are widespread in real-world data analysis. Traditional robust methods may fail when applied to datasets under such contamination. We introduce a variable selection procedure that uses the Gaussian rank estimator to obtain an initial empirical covariance matrix of the response and the potential predictors. We re-parameterize the design matrix and the response vector of the classical linear regression model so that we can take advantage of these robustly estimated components, and then apply the adaptive Lasso to obtain consistent variable selection results. The procedure is robust to cellwise outliers in both low- and high-dimensional settings. Empirical results show good performance compared with recently proposed robust techniques, particularly in the challenging setting where contamination rates are high but the magnitude of the outliers is moderate.
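The pipeline summarised above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: scaling the Gaussian rank correlation by MAD scales to get a covariance, re-parameterizing through a symmetric matrix square root, a plain coordinate-descent adaptive Lasso with a near-least-squares initial estimate, the penalty value `lam=0.5`, and all function names.

```python
import numpy as np
from statistics import NormalDist

def gaussian_rank_corr(Z):
    """Pearson correlation of the normal scores of the column-wise ranks."""
    n = Z.shape[0]
    ranks = Z.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n, ties broken arbitrarily
    scores = np.vectorize(NormalDist().inv_cdf)(ranks / (n + 1))
    return np.corrcoef(scores, rowvar=False)

def robust_cov(Z):
    """Assumption: rescale the rank correlation with MAD scales to get a covariance."""
    mad = 1.4826 * np.median(np.abs(Z - np.median(Z, axis=0)), axis=0)
    return gaussian_rank_corr(Z) * np.outer(mad, mad)

def reparameterize(S):
    """Return (X_t, y_t) with X_t.T @ X_t = S_xx and X_t.T @ y_t = S_xy.
    The response is assumed to be the last variable of S."""
    S_xx, S_xy = S[:-1, :-1], S[:-1, -1]
    vals, vecs = np.linalg.eigh(S_xx)
    vals = np.clip(vals, 1e-8, None)               # guard against tiny eigenvalues
    X_t = (vecs * np.sqrt(vals)) @ vecs.T          # symmetric square root of S_xx
    y_t = (vecs / np.sqrt(vals)) @ vecs.T @ S_xy   # S_xx^{-1/2} S_xy
    return X_t, y_t

def adaptive_lasso(X, y, lam, gamma=1.0, n_iter=500):
    """Coordinate descent for 0.5*||y - X b||^2 + lam * sum_j w_j |b_j|."""
    p = X.shape[1]
    beta_init = np.linalg.solve(X.T @ X + 1e-6 * np.eye(p), X.T @ y)
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)  # adaptive weights
    col_sq = (X ** 2).sum(axis=0)
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]  # partial residual without x_j
            z = X[:, j] @ r_j
            beta[j] = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / col_sq[j]
    return beta

# Simulated example: two active predictors, 5% of all cells replaced by outliers.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, -2.0, 0.0, 0.0, 0.0]) + rng.standard_normal(n)
Z = np.column_stack([X, y])
Z[rng.random(Z.shape) < 0.05] = 10.0

S = robust_cov(Z)
X_t, y_t = reparameterize(S)
beta_hat = adaptive_lasso(X_t, y_t, lam=0.5)
print(np.round(beta_hat, 2))
```

Because the re-parameterized pair reproduces only the cross-products `S_xx` and `S_xy`, the Lasso step never touches the contaminated rows directly. In practice the penalty would be tuned, for example by an information criterion, rather than fixed.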
Original language | English |
---|---|
Pages (from-to) | 1371-1387 |
Number of pages | 17 |
Journal | Journal of Statistical Computation and Simulation |
Volume | 94 |
Issue number | 6 |
Early online date | 28 Nov 2023 |
Publication status | Published - 2024 |
Bibliographical note
Copyright © 2023 Informa UK Limited, trading as Taylor & Francis Group. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Cellwise contamination
- Gaussian rank correlation
- robust covariance
- robust variable selection