Technological advances in molecular biology over the past decade have given rise to high-dimensional and complex datasets, offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges, which ultimately led to the definition and implementation of computationally efficient statistical models able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors have been implemented and distributed. Among these, we recently proposed GUESS, a computationally optimized algorithm that makes use of graphics processing unit capabilities and can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in detail its optimised implementation. Finally, based on two examples, we illustrate its statistical performance and flexibility.
Bibliographical note: Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
- Bayesian variable selection
- OMICs data
- graphics processing unit
- multivariate regression