A less-greedy two-term Tsallis Entropy Information Metric approach for decision tree classification

Yisen Wang, Shu-Tao Xia, Jia Wu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. Many heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms, which have the drawback of reaching only local optima. In addition, the conventional split criteria they use, e.g. Shannon entropy, Gain Ratio and Gini index, are one-term criteria that lack adaptability to different datasets. To address these issues, we propose a less-greedy two-term Tsallis Entropy Information Metric (TEIM) algorithm with a new split criterion and a new construction method for decision trees. First, the new split criterion is based on two-term Tsallis conditional entropy, which is better than conventional one-term split criteria. Second, the new tree construction is based on a two-stage approach that reduces greediness and avoids local optima to a certain extent. The TEIM algorithm takes advantage of the generalization ability of two-term Tsallis entropy and the low greediness of the two-stage approach. Experimental results on UCI datasets indicate that, compared with state-of-the-art decision tree algorithms, the TEIM algorithm yields statistically significantly better decision trees and is more robust to noise.
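As background for the split criterion mentioned above, the sketch below computes the standard one-term Tsallis entropy S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers Shannon entropy (in nats) as q → 1 and equals the Gini index at q = 2. It then ranks candidate attributes by a p(x)-weighted Tsallis conditional entropy, which is one of several definitions in use. This is a minimal illustration under those assumptions, not the TEIM criterion: the paper's two-term formulation and its two-stage tree construction are not reproduced here, and the function names `tsallis_entropy`, `tsallis_conditional_entropy` and `best_attribute` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence


def tsallis_entropy(labels: Sequence[Hashable], q: float) -> float:
    """One-term Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1) of a label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if abs(q - 1.0) < 1e-12:
        # Limit q -> 1 gives Shannon entropy (natural log).
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def tsallis_conditional_entropy(
    attribute: Sequence[Hashable], labels: Sequence[Hashable], q: float
) -> float:
    """Assumed definition: sum over attribute values x of p(x) * S_q(Y | X = x)."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(attribute, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / n * tsallis_entropy(ys, q) for ys in groups.values())


def best_attribute(
    columns: Mapping[str, Sequence[Hashable]], labels: Sequence[Hashable], q: float
) -> str:
    """Greedy one-step choice: the attribute with the lowest conditional entropy."""
    return min(
        columns, key=lambda name: tsallis_conditional_entropy(columns[name], labels, q)
    )


if __name__ == "__main__":
    y = ["a", "a", "b", "b"]
    cols = {"f1": ["x", "y", "x", "y"], "f2": ["x", "x", "y", "y"]}
    for q in (1.0, 2.0, 3.0):
        print(q, best_attribute(cols, y, q))
```

Varying `q` changes how strongly the criterion weights dominant versus rare classes, which is the kind of dataset adaptability the abstract attributes to Tsallis-based criteria. Note that `best_attribute` is exactly the greedy, one-step selection the abstract criticises; the two-stage approach is intended to reduce that greediness.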

Original language: English
Pages (from-to): 34-42
Number of pages: 9
Journal: Knowledge-Based Systems
Volume: 120
Publication status: Published - 15 Mar 2017
Externally published: Yes

Keywords

  • Attribute split criterion
  • Classification
  • Decision trees
  • Tree construction
