A two-step algorithm to estimate variable importance for multi-state data: an application to COVID-19

Behnaz Alafchi, Leili Tapak*, Hassan Doosti, Christophe Chesneau, Ghodratollah Roshanaei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Survival data with a multi-state structure are frequently observed in follow-up studies. In longitudinal health studies in which a patient experiences a sequence of clinical progression events, the analysis should be based on a multi-state model (MSM). One main objective in the MSM framework is variable selection, which aims to identify the risk factors associated with the transition hazard rates or the probabilities of disease progression. The usual variable selection methods, including stepwise and penalized methods, do not provide information about the importance of variables. In this context, we present a two-step algorithm to evaluate the importance of variables for multi-state data. Three widely used machine learning approaches (random forest, gradient boosting, and neural network) are considered for estimating variable importance, in order to identify the factors affecting disease progression and to rank them by importance. The performance of the proposed methods is validated by simulation, and the methods are applied to a COVID-19 data set. The results show that the proposed two-step method has promising performance for estimating variable importance.
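
The abstract does not spell out the two steps. Judging only from the keywords (martingale residual, variable importance), one plausible reading is: first compute residuals from a covariate-free survival fit for a given transition, then let a learner relate those residuals to the covariates and read off an importance score. The sketch below follows that assumed reading for a single transition on simulated data. Everything in it is hypothetical: the function names `martingale_residuals` and `stump_importance`, the simulated covariates `x1` and `x2`, and the rates used to generate the data. A depth-one regression stump stands in for the random forest, gradient boosting, and neural network learners named in the abstract, purely to keep the example dependency-free.

```python
import math
import random

def martingale_residuals(times, events):
    """Null-model martingale residuals: event indicator minus the
    Nelson-Aalen cumulative hazard evaluated at each subject's own time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    increments = []
    for t in event_times:
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        at_risk = sum(1 for ti in times if ti >= t)
        increments.append((t, d / at_risk))
    return [e - sum(inc for t, inc in increments if t <= ti)
            for ti, e in zip(times, events)]

def stump_importance(x, y):
    """Largest reduction in squared error from one split on x
    (a stand-in for a tree-ensemble importance score)."""
    n = len(y)
    pairs = sorted(zip(x, y))
    total = sum(y)
    best, left_sum = 0.0, 0.0
    for k in range(1, n):
        left_sum += pairs[k - 1][1]
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical covariate values
        right_sum = total - left_sum
        # Between-group sum of squares equals the drop in squared error.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        best = max(best, gain)
    return best

# Simulated single transition (assumed setup): x1 raises the hazard, x2 is noise.
rng = random.Random(0)
n = 500
x1 = [rng.randint(0, 1) for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
times, events = [], []
for a in x1:
    t_event = rng.expovariate(0.5 * math.exp(2.0 * a))
    t_cens = rng.expovariate(0.3)  # independent censoring
    times.append(min(t_event, t_cens))
    events.append(1 if t_event <= t_cens else 0)

# Step 1: residuals from the covariate-free fit.
resid = martingale_residuals(times, events)
# Step 2: score each covariate by how well it explains the residuals.
importance = {"x1": stump_importance(x1, resid),
              "x2": stump_importance(x2, resid)}
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

In a multi-state setting, the same two steps would presumably be repeated for each transition, giving one ranking per transition; that extension is likewise an assumption rather than something stated in the abstract.
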
Original language: English
Pages (from-to): 2047-2064
Number of pages: 18
Journal: Computer Modeling in Engineering & Sciences
Volume: 135
Issue number: 3
Early online date: 18 Aug 2022
Publication status: Published - 2023

Bibliographical note

Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Multi-state data
  • deviance residual
  • martingale residual
  • gradient boosting
  • random forest
  • neural network
  • variable importance
  • variable selection
