PLS for big data: a unified parallel algorithm for regularised group PLS

Pierre Lafaye de Micheaux, Benoît Liquet, Matthew Sutton

Research output: Contribution to journal › Article › peer-review



Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scaled to big data sets. The bigsgPLS R package implements our unified algorithm and is available at
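The link between the SVD and penalised least squares mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's unified algorithm and not the bigsgPLS interface: the function names (`soft_threshold`, `sparse_rank_one`), the penalty parameters (`lam_u`, `lam_v`), and the simulated data are all assumptions made for illustration. The sketch computes one pair of sparse weight vectors for the cross-product matrix M = XᵀY by alternating soft-thresholded updates, where each update solves a lasso-penalised least squares problem with the other vector held fixed.

```python
import numpy as np


def soft_threshold(x, lam):
    """Elementwise soft-thresholding: shrink each entry towards zero by lam."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def _normalise(x):
    """Scale x to unit Euclidean norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def sparse_rank_one(M, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-10):
    """Alternating penalised rank-one approximation of M (hypothetical helper).

    With lam_u = lam_v = 0 this reduces to power iteration for the
    leading singular vectors of M.
    """
    # Start from the leading singular pair of M.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    for _ in range(n_iter):
        # Update u with v fixed, then v with the new u fixed.
        u_new = _normalise(soft_threshold(M @ v, lam_u))
        v_new = _normalise(soft_threshold(M.T @ u_new, lam_v))
        converged = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v


# Simulated two-block data (illustrative only): 50 observations,
# 8 variables in X and 5 in Y, both column-centred.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 5))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

M = X.T @ Y                      # cross-product matrix of the two blocks
u, v = sparse_rank_one(M, lam_u=2.0, lam_v=2.0)
x_scores, y_scores = X @ u, Y @ v  # first pair of latent component scores
```

Larger values of `lam_u` and `lam_v` set more weights exactly to zero, which is how variable selection arises in this kind of scheme. Further components would require deflating the data before repeating the step, and the deflation rule differs between PLS variants, so it is left out here.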
Original language: English
Pages (from-to): 119-149
Number of pages: 31
Journal: Statistics Surveys
Publication status: Published - 2019
Externally published: Yes

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.


  • High dimensional data
  • Lasso penalties
  • Partial Least Squares
  • Singular Value Decomposition

