TY - JOUR
T1 - Empirical performance of cross-validation with oracle methods in a genomics context
AU - Martinez, Josue G.
AU - Carroll, Raymond J.
AU - Müller, Samuel
AU - Sampson, Joshua N.
AU - Chatterjee, Nilanjan
PY - 2011/11
Y1 - 2011/11
N2 - When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to nonoracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
AB - When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNPs), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to nonoracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
KW - Adaptive lasso
KW - Lasso
KW - Model selection
KW - Oracle estimation
UR - http://www.scopus.com/inward/record.url?scp=84856050251&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/arc/DP11010199
U2 - 10.1198/tas.2011.11052
DO - 10.1198/tas.2011.11052
M3 - Article
C2 - 22347720
AN - SCOPUS:84856050251
VL - 65
SP - 223
EP - 228
JO - American Statistician
JF - American Statistician
SN - 0003-1305
IS - 4
ER -