Abstract
Biomedical studies of neuroimaging and genomics collect large amounts of data on a small subset of subjects so as not to miss informative predictors. An important goal is identifying those predictors that provide better visualization of the data and that could serve as cost-effective measures for future clinical trials. Identifying such predictors is challenging, however, when the predictors are naturally interrelated and the response is a failure time prone to censoring. We propose to handle these challenges with a novel variable selection technique. Our approach casts the problem into several lower-dimensional settings and extracts from this intermediary step the relative importance of each predictor through data-driven weights called exclusion frequencies. The exclusion frequencies are used as weights in a weighted Lasso, and the results yield low false discovery rates and a high geometric mean of sensitivity and specificity. We illustrate the method’s advantages over existing ones in an extensive simulation study, and use the method to identify relevant neuroimaging markers associated with Huntington’s disease onset.
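The two performance measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions (false discovery rate as the share of selected predictors that are not truly relevant; geometric mean as the square root of sensitivity times specificity), which may differ in detail from those used in the paper. The function name `selection_metrics` and the toy index sets are hypothetical and are not taken from the article.

```python
from math import sqrt


def selection_metrics(selected, truly_relevant, n_predictors):
    """Summarize a variable-selection result (hypothetical helper).

    selected       -- indices of predictors chosen by some selection method
    truly_relevant -- indices of predictors that are actually relevant
    n_predictors   -- total number of candidate predictors
    """
    selected = set(selected)
    truly_relevant = set(truly_relevant)

    tp = len(selected & truly_relevant)   # relevant and selected
    fp = len(selected - truly_relevant)   # irrelevant but selected
    fn = len(truly_relevant - selected)   # relevant but missed
    tn = n_predictors - tp - fp - fn      # irrelevant and not selected

    # Assumed conventions: each ratio is set to 0 when its denominator is 0.
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    return {
        "fdr": fdr,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
    }


# Toy example: 10 predictors, 4 truly relevant, 4 selected (3 of them correct).
metrics = selection_metrics(
    selected=[0, 1, 2, 7],
    truly_relevant=[0, 1, 2, 3],
    n_predictors=10,
)
print(metrics)
```

In this toy case one of the four selected predictors is a false discovery, so the false discovery rate is 0.25, while sensitivity is 0.75 and specificity is 5/6. The geometric mean penalizes a method that scores well on only one of the two rates, which is presumably why it is reported alongside the false discovery rate.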
Original language | English |
---|---|
Pages (from-to) | 2130-2156 |
Number of pages | 27 |
Journal | Annals of Applied Statistics |
Volume | 10 |
Issue number | 4 |
Publication status | Published - Dec 2016 |
Externally published | Yes |
Keywords
- Exclusion frequency
- Model selection
- Neuroimaging
- Proportional hazards model
- Weighted lasso