Robust variable selection under cellwise contamination

Peng Su*, Garth Tarr, Samuel Muller

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Cellwise outliers are widespread in real-world data analysis. Traditional robust methods may fail when applied to datasets under such contamination. We introduce a variable selection procedure that uses the Gaussian rank estimator to obtain an initial empirical covariance matrix among the response and potential predictors. We re-parameterize the design matrix and the response vector of the classical linear regression model so that we can take advantage of these robustly estimated components before applying the adaptive Lasso to obtain consistent variable selection results. The procedure is robust to cellwise outliers in both low- and high-dimensional settings. Empirical results show good performance compared with recently proposed robust techniques, particularly in the challenging setting where contamination rates are high but the magnitude of the outliers is moderate.
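To illustrate the first step described above, the following is a minimal sketch of a pairwise Gaussian rank correlation, using only the Python standard library. It is not the authors' implementation: the function names (`normal_scores`, `pearson`, `gaussian_rank_correlation`), the average-rank handling of ties, and the toy data are assumptions made for illustration. The later steps in the abstract (scaling the correlations to a covariance matrix, re-parameterizing the design matrix and response, and fitting the adaptive Lasso) are not shown.

```python
from math import sqrt
from statistics import NormalDist


def normal_scores(values):
    """Map values to Gaussian scores Phi^{-1}(rank / (n + 1)).

    Ties receive their average rank (an assumption of this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    inv_cdf = NormalDist().inv_cdf
    return [inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Ordinary sample correlation; assumes neither input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    num = sum(u * v for u, v in zip(da, db))
    return num / sqrt(sum(u * u for u in da) * sum(v * v for v in db))


def gaussian_rank_correlation(a, b):
    """Correlation of the Gaussian scores of the ranks of a and b."""
    return pearson(normal_scores(a), normal_scores(b))


# Hypothetical toy data: y is a noisy increasing function of x.
x = [float(i) for i in range(20)]
y_clean = [2 * i + ((i * 7) % 5 - 2) * 0.5 for i in range(20)]

# Contaminate a single cell of y with a gross outlier.
y_dirty = list(y_clean)
y_dirty[10] = 1000.0

print("Pearson, contaminated:      ", round(pearson(x, y_dirty), 3))
print("Gaussian rank, contaminated:", round(gaussian_rank_correlation(x, y_dirty), 3))
```

In this toy example a single contaminated cell drives the ordinary correlation towards zero, while the rank-based estimate stays high (above 0.8), because the outlier can only move one observation to the extreme rank rather than dominate the sums of squares.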

Original language: English
Pages (from-to): 1371-1387
Number of pages: 17
Journal: Journal of Statistical Computation and Simulation
Volume: 94
Issue number: 6
Early online date: 28 Nov 2023
Publication status: Published - 2024

Bibliographical note

Copyright © 2023 Informa UK Limited, trading as Taylor & Francis Group. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Cellwise contamination
  • Gaussian rank correlation
  • robust covariance
  • robust variable selection
