COMBSS: Best Subset Selection via Continuous Optimization

Sarat Moka*, Benoit Liquet, Houying Zhu, Samuel Muller

*Corresponding author for this work

Research output: Working paper › Preprint

Abstract

We consider the problem of best subset selection in linear regression, where the goal is to find, for every model size $k$, the subset of $k$ features that best fits the response. This is particularly challenging when the total number of available features is very large compared to the number of data samples. We propose COMBSS, a novel continuous optimization-based method that directly solves the best subset selection problem in linear regression. COMBSS turns out to be very fast, potentially making best subset selection possible when the number of features is well in excess of thousands. Simulation results are presented to highlight the performance of COMBSS in comparison to existing popular non-exhaustive methods such as Forward Stepwise and the Lasso, as well as to exhaustive methods such as Mixed-Integer Optimization. Because of its outstanding overall performance, framing the best subset selection challenge as a continuous optimization problem opens new research directions for feature extraction for a large variety of regression models.
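
To make the problem statement concrete, the sketch below shows the naive exhaustive formulation of best subset selection that COMBSS is designed to avoid: for a given model size $k$, enumerate all feature subsets of size $k$ and keep the one with the smallest residual sum of squares. This is an illustration of the problem only, not of the COMBSS algorithm itself; the function name, the synthetic data, and the brute-force strategy are assumptions for illustration, and enumeration is only feasible for small numbers of features.

```python
# Brute-force best subset selection for a fixed model size k (illustrative only).
# COMBSS targets the regime where the number of features is in the thousands,
# where this kind of enumeration is computationally infeasible.
from itertools import combinations

import numpy as np


def best_subset_exhaustive(X, y, k):
    """Return the size-k feature index set with the smallest least-squares RSS."""
    best_rss, best_subset = np.inf, None
    for subset in combinations(range(X.shape[1]), k):
        beta, rss, *_ = np.linalg.lstsq(X[:, subset], y, rcond=None)
        # np.linalg.lstsq returns an empty residual array in degenerate cases.
        rss = rss[0] if rss.size else np.sum((y - X[:, subset] @ beta) ** 2)
        if rss < best_rss:
            best_rss, best_subset = rss, subset
    return best_subset, best_rss


# Small synthetic example: only features 0 and 3 drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)
print(best_subset_exhaustive(X, y, k=2))  # expected subset: (0, 3)
```

The number of candidate subsets grows combinatorially in the number of features, which is why continuous relaxations of this discrete search, such as the one proposed in the paper, are attractive at scale.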
Original language: English
Publisher: arXiv.org
Publication status: Submitted - 5 May 2022

Publication series

Name: arXiv

Keywords

  • Best Subset Selection
  • Model Selection
  • Linear Regression
  • Continuous Optimization
