Abstract
Many complex diseases are thought to be caused by multiple genetic variants. Recent advances in genotyping technology have allowed investigators of a complex disease to obtain data on a massive number of candidate genetic variants. Typically, each candidate variant is tested individually for an association with the disease. We instead approach the problem as one of model selection for high-dimensional data. We propose a method in which penalised maximum likelihood estimation provides a reasonably sized set of variants for inclusion in the model; we then perform stepwise regression on this set to arrive at the final model. Penalised maximum likelihood estimation is performed with both the lasso and a more recently developed method known as the hyperlasso, with smoothing parameters chosen by cross-validation. Compared with the lasso, the hyperlasso has a penalty function that favours sparser solutions while shrinking the variables that are included in the model less; however, this comes at extra computational cost. We apply this method to a large genomic data set from a previously published mouse obesity study and use resample model averaging to assess model performance.
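The two-stage idea (penalised screening, then stepwise regression on the survivors) can be sketched as follows. This is a minimal illustration built on assumptions that are not taken from the paper: a quantitative (Gaussian) trait instead of disease status, the lasso objective (1/2n)·||y − Zβ||² + λ·||β||₁ fitted by cyclic coordinate descent on standardised genotypes, a fixed smoothing parameter `lam` (the paper chooses it by cross-validation), and forward-only stepwise selection scored by BIC. The hyperlasso is not implemented. The function names (`soft_threshold`, `standardise`, `lasso_cd`, `forward_stepwise_bic`) and the toy data are hypothetical.

```python
import math
import random

def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def standardise(X, y):
    """Centre y; centre each column of X and scale it so that (1/n) * sum(x**2) == 1."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - m) / sd if sd > 0 else 0.0  # constant column -> zeros
    return Z, yc

def lasso_cd(Z, y, lam, max_iter=500, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)*||y - Z b||^2 + lam*||b||_1 on standardised Z."""
    n, p = len(Z), len(Z[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - Z @ beta (beta starts at zero)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Exact minimiser over beta[j] with the other coefficients held fixed
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

def forward_stepwise_bic(Z, y, candidates):
    """Greedy forward selection over `candidates`; stop when BIC no longer decreases."""
    n = len(y)
    resid, rss = list(y), sum(v * v for v in y)
    basis, selected = [], []  # orthonormalised selected columns, and their indices
    bic = n * math.log(rss / n)
    while True:
        best = None
        for j in candidates:
            if j in selected:
                continue
            q = [Z[i][j] for i in range(n)]
            for b in basis:  # Gram-Schmidt: remove the part explained by the model
                proj = sum(q[i] * b[i] for i in range(n))
                q = [q[i] - proj * b[i] for i in range(n)]
            qq = sum(v * v for v in q)
            if qq < 1e-10:
                continue  # collinear with the columns already selected
            drop = sum(q[i] * resid[i] for i in range(n)) ** 2 / qq  # RSS reduction
            if best is None or drop > best[1]:
                best = (j, drop, q, qq)
        if best is None:
            break
        j, drop, q, qq = best
        new_bic = n * math.log(max(rss - drop, 1e-300) / n) + (len(selected) + 1) * math.log(n)
        if new_bic >= bic:
            break
        coef = sum(q[i] * resid[i] for i in range(n)) / qq
        resid = [resid[i] - coef * q[i] for i in range(n)]
        basis.append([v / math.sqrt(qq) for v in q])
        selected.append(j)
        rss, bic = rss - drop, new_bic
    return selected

# Hypothetical toy data: 200 mice, 50 variants coded 0/1/2, and a quantitative
# trait driven by variants 3, 10 and 25 plus Gaussian noise.
rng = random.Random(1)
n, p = 200, 50
X = [[sum(rng.random() < 0.5 for _ in range(2)) for _ in range(p)] for _ in range(n)]
y = [1.0 * X[i][3] - 0.8 * X[i][10] + 0.6 * X[i][25] + rng.gauss(0, 1) for i in range(n)]

Z, yc = standardise(X, y)
beta = lasso_cd(Z, yc, lam=0.1)                   # stage 1: lasso screening
screened = [j for j, b in enumerate(beta) if b != 0.0]
final = forward_stepwise_bic(Z, yc, screened)      # stage 2: stepwise regression
print("screened by lasso:", screened)
print("final model:", sorted(final))
```

Screening first keeps the stepwise search cheap, because it scans only the variants the lasso retained rather than every genotyped variant. Replacing the lasso with a penalty such as the hyperlasso would change the per-coordinate update inside `lasso_cd`; since that penalty is non-convex, the update is more involved and the fitted model may depend on the starting point.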
Original language | English |
---|---|
Pages (from-to) | C364-C378 |
Number of pages | 15 |
Journal | ANZIAM Journal |
Volume | 52 |
Publication status | Published - 2010 |
Externally published | Yes |
Event | 15th Biennial Computational Techniques and Applications Conference (CTAC2010), University of New South Wales, Sydney, Australia, 28 Nov to 1 Dec 2010 |