A comparison between decision tree and random forest in determining the risk factors associated with type 2 diabetes

Habibollah Esmaily, Maryam Tayefi, Hassan Doosti, Majid Ghayour-Mobarhan, Hossein Nezami, Alireza Amirabadizadeh*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    68 Citations (Scopus)

    Abstract

    Background: We aimed to identify risk factors associated with type 2 diabetes mellitus (T2DM) by applying two data mining techniques, decision trees and random forests, to data from the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) study.

    Study design: A cross-sectional study.

    Methods: The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, decision trees and random forests, were used to predict T2DM from other characteristics observed in 9528 subjects recruited from the MASHAD database. The two models were compared in terms of accuracy, sensitivity, specificity, and area under the ROC curve.
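    The four comparison criteria named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the labels, scores, and 0.5 threshold below are hypothetical toy values, and the function names are chosen only for illustration. The area under the ROC curve is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all subjects classified correctly
        "sensitivity": tp / (tp + fn),      # share of true cases that were detected
        "specificity": tn / (tn + fp),      # share of non-cases correctly ruled out
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 1 = T2DM, 0 = no T2DM; scores are model probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]

# Assumed decision threshold of 0.5 for turning scores into class predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

metrics = classification_metrics(tp, fp, tn, fn)
auc = roc_auc(labels, scores)
print(metrics, auc)
```

    Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the area under the ROC curve summarises ranking quality across all thresholds, which is why a model comparison may report all four.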

    Results: The prevalence of T2DM among these subjects was 14%. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%.

    Conclusions: The random forest model, when used with demographic, clinical, anthropometric, and biochemical measurements, can provide a simple tool for identifying risk factors associated with type 2 diabetes. Such identification can be of substantial use in shaping health policy to reduce the number of subjects with T2DM.

    Original language: English
    Article number: e00412
    Pages (from-to): 1-7
    Number of pages: 7
    Journal: Journal of Research in Health Sciences
    Volume: 18
    Issue number: 2
    Publication status: Published - 2018

    Keywords

    • Data mining
    • Decision tree
    • Diabetes mellitus
    • Iran
    • Random forest
