We propose two multivariate extensions of the Bayesian group lasso for variable selection and estimation for data with high dimensional predictors and multi-dimensional response variables. The methods utilize spike and slab priors to yield solutions which are sparse at either a group level or both a group and individual feature level. The incorporation of group structure in a predictor matrix is a key factor in obtaining better estimators and identifying associations between multiple responses and predictors. The approach is suited to many biological studies where the response is multivariate and each predictor is embedded in some biological grouping structure such as gene pathways. Our Bayesian models are connected with penalized regression, and we prove both oracle and asymptotic distribution properties under an orthogonal design. We derive efficient Gibbs sampling algorithms for our models and provide the implementation in a comprehensive R package called MBSGS available on the Comprehensive R Archive Network (CRAN). The performance of the proposed approaches is compared to state-of-the-art variable selection strategies on simulated data sets. The proposed methodology is illustrated on a genetic dataset in order to identify markers grouping across chromosomes that explain the joint variability of gene expression in multiple tissues.
Bibliographical noteCopyright the Publisher 2017. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
- Bayesian variable selection
- multivariate regression
- spike and slab