Reducing the dimensionality of microarray datasets is a prerequisite for avoiding overfitting, and hence for developing diagnostic tools. Previous work has selected features based, for example, on their individual Fisher discriminants (F-values), or has used path-based training algorithms that optimise the power of the resulting classifier. We show that a generic method, a simple stepwise regression that uses the penalised margin width of a linear support vector machine as its objective function, combined with a grid search over the regularisation parameter, gives superior performance to three other feature-selection methods (least-angle regression, Random Forest, and stepwise regression on Fisher discriminants). We use a hierarchical validation method on each of four two-class gene expression cancer datasets: leave-one-out cross-validation within the training subset, followed by application of the trained classifier to a separate test subset. The generic method gives superior results when classifying unseen samples, and a fixed regularisation value appears nearly optimal for all four datasets.
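The procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes forward stepwise selection that, at each step, adds the feature giving the lowest penalised linear-SVM objective, a grid search over the regularisation value `C` scored by leave-one-out cross-validation inside the training subset, and a final evaluation on a held-out test subset. The sub-gradient solver, the synthetic data, the grid values, the fixed number of features `k`, and all function names are hypothetical choices made for illustration. Repeating the selection inside every fold is also an assumption.

```python
import numpy as np

def fit_linear_svm(X, y, C, n_iter=100):
    """Crude sub-gradient descent on 0.5*||w||^2 + C*sum(hinge loss).
    A stand-in for a proper SVM solver; returns (w, b, objective)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    lr = 0.1 / (C * n)                      # heuristic step size (assumption)
    for _ in range(n_iter):
        viol = y * (X @ w + b) < 1          # samples violating the margin
        w -= lr * (w - C * (X[viol].T @ y[viol]))
        b += lr * C * y[viol].sum()
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return w, b, 0.5 * (w @ w) + C * hinge.sum()

def forward_select(X, y, C, k):
    """Greedily add the feature that gives the lowest SVM objective."""
    selected = []
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [fit_linear_svm(X[:, selected + [j]], y, C)[2]
                  for j in candidates]
        selected.append(candidates[int(np.argmin(scores))])
    return selected

def loocv_error(X, y, C, k):
    """Leave-one-out error, repeating feature selection inside every fold."""
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        feats = forward_select(X[keep], y[keep], C, k)
        w, b, _ = fit_linear_svm(X[keep][:, feats], y[keep], C)
        errors += int(np.sign(X[i, feats] @ w + b) != y[i])
    return errors / len(y)

# Synthetic two-class data (hypothetical): only feature 0 is informative.
rng = np.random.default_rng(0)
n, p = 40, 6
y = np.repeat([-1.0, 1.0], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 4.0 * y

# Outer split into training and test subsets.
idx = rng.permutation(n)
train, test = idx[:24], idx[24:]

# Inner loop: pick C by leave-one-out cross-validation on the training subset.
grid = [0.1, 1.0, 10.0]
cv = {C: loocv_error(X[train], y[train], C, k=2) for C in grid}
best_C = min(cv, key=cv.get)

# Refit on the whole training subset, then score the unseen test subset.
feats = forward_select(X[train], y[train], best_C, k=2)
w, b, _ = fit_linear_svm(X[train][:, feats], y[train], best_C)
test_error = float(np.mean(np.sign(X[test][:, feats] @ w + b) != y[test]))
print("best C:", best_C, "features:", feats, "test error:", test_error)
```

The test subset plays no part in either feature selection or the choice of `C`, which is the point of the hierarchical design. In practice a library SVM solver would replace the sub-gradient loop.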
|Title of host publication||2009 JSM proceedings|
|Subtitle of host publication||papers presented at the Joint Statistical Meetings, Washington, DC, August 1-6, 2009, and other ASA-sponsored conferences; Statistics: from evidence to policy|
|Place of Publication||Alexandria, VA|
|Publisher||American Statistical Association|
|Number of pages||15|
|Publication status||Published - 2009|
|Event||Joint Statistical Meetings : Statistics : from evidence to policy - Washington, DC (Duration: 1 Aug 2009 → 6 Aug 2009)|
|Conference||Joint Statistical Meetings : Statistics : from evidence to policy|
|Period||1/08/09 → 6/08/09|
- feature selection
- support vector machines
- path-based algorithms
Peters, T., Bulger, D. W., Loi, T-H., Yang, J. Y. H., & Ma, D. (2009). Cancer microarray feature selection using support vector machines: comparing regularization techniques. In 2009 JSM proceedings: papers presented at the Joint Statistical Meetings, Washington, DC, August 1-6, 2009, and other ASA-sponsored conferences; Statistics: from evidence to policy (pp. 2951-2965). Alexandria, VA: American Statistical Association.