Abstract
Integrative analysis of high-dimensional omics datasets has been studied by many authors in recent years. By incorporating previously known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We propose a flexible partial least squares technique that incorporates group and subgroup structure in the modelling process. Our new method accounts for both grouping of genetic markers (eg, gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to compare the performance of our method with that of alternative sparse approaches. Our R package sgspls is available at https://github.com/matt-sutton/sgspls.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 3338-3356 |
Number of pages | 19 |
Journal | Statistics in Medicine |
Volume | 37 |
Issue number | 23 |
Publication status | Published - 15 Oct 2018 |
Externally published | Yes |
Keywords
- feature selection
- group variable selection
- latent variable modelling
- partial least squares