A correlation-based feature weighting filter for naive Bayes

Liangxiao Jiang, Lungan Zhang, Chaoqun Li, Jia Wu

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Due to its simplicity, efficiency, and efficacy, naive Bayes (NB) has continued to be one of the top 10 algorithms in the data mining and machine learning community. Of the numerous approaches to alleviating its conditional independence assumption, feature weighting places more emphasis on highly predictive features than on those that are less predictive. In this paper, we argue that for NB highly predictive features should be highly correlated with the class (maximum mutual relevance), yet uncorrelated with other features (minimum mutual redundancy). Based on this premise, we propose a correlation-based feature weighting (CFW) filter for NB. In CFW, the weight of a feature is a sigmoid transformation of the difference between the feature-class correlation (mutual relevance) and the average feature-feature intercorrelation (average mutual redundancy). Experimental results show that NB with CFW significantly outperforms NB and all the other state-of-the-art feature weighting filters used for comparison. Compared to feature weighting wrappers for improving NB, the main advantages of CFW are its low computational complexity (no search involved) and the fact that it maintains the simplicity of the final model. In addition, we apply CFW to text classification and achieve remarkable improvements.
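The weighting scheme described above can be sketched in a few lines: estimate each feature's mutual information with the class (relevance) and its average mutual information with the other features (redundancy), then pass the difference through a sigmoid. This is a minimal illustrative sketch based only on the abstract, assuming discrete features and empirical mutual information; the paper's exact normalization and estimators may differ, and the function names here are our own.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete arrays."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def cfw_weights(X, y):
    """Sketch of CFW-style weights for an (n_samples, n_features) discrete matrix X.

    weight_i = sigmoid(relevance_i - redundancy_i), where
      relevance_i  = I(X_i; class)                      (feature-class correlation)
      redundancy_i = mean_j!=i I(X_i; X_j)              (avg feature-feature intercorrelation)
    """
    m = X.shape[1]
    relevance = np.array([mutual_information(X[:, i], y) for i in range(m)])
    redundancy = np.zeros(m)
    for i in range(m):
        others = [mutual_information(X[:, i], X[:, j]) for j in range(m) if j != i]
        redundancy[i] = np.mean(others) if others else 0.0
    # Sigmoid maps the relevance-redundancy difference into a (0, 1) weight.
    return 1.0 / (1.0 + np.exp(-(relevance - redundancy)))
```

Because the weights are computed in a single pass over feature pairs, with no search, the filter keeps the low computational cost and model simplicity emphasized in the abstract; the weights would then be used as exponents on the conditional probabilities in the weighted NB classifier.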

Language: English
Article number: 8359364
Pages: 201-213
Number of pages: 13
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 31
Issue number: 2
DOI: 10.1109/TKDE.2018.2836440
Publication status: Published - 1 Feb 2019


Cite this

Jiang, Liangxiao; Zhang, Lungan; Li, Chaoqun; Wu, Jia. / A correlation-based feature weighting filter for naive Bayes. In: IEEE Transactions on Knowledge and Data Engineering. 2019; Vol. 31, No. 2. pp. 201-213.
@article{c4465f2215894cc292a3601caa9b4775,
title = "A correlation-based feature weighting filter for naive Bayes",
keywords = "correlation, Decision trees, Electronic mail, Feature extraction, feature weighting, Mathematical model, mutual information, mutual redundancy, mutual relevance, naive Bayes, Redundancy, Training",
author = "Liangxiao Jiang and Lungan Zhang and Chaoqun Li and Jia Wu",
year = "2019",
month = "2",
day = "1",
doi = "10.1109/TKDE.2018.2836440",
language = "English",
volume = "31",
pages = "201--213",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
number = "2",

}

TY - JOUR

T1 - A correlation-based feature weighting filter for naive Bayes

AU - Jiang,Liangxiao

AU - Zhang,Lungan

AU - Li,Chaoqun

AU - Wu,Jia

PY - 2019/2/1

Y1 - 2019/2/1

KW - Correlation

KW - Decision trees

KW - Electronic mail

KW - Feature extraction

KW - feature weighting

KW - Mathematical model

KW - mutual information

KW - mutual redundancy

KW - mutual relevance

KW - naive Bayes

KW - Redundancy

KW - Training

UR - http://www.scopus.com/inward/record.url?scp=85047014442&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2018.2836440

DO - 10.1109/TKDE.2018.2836440

M3 - Article

VL - 31

SP - 201

EP - 213

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 2

M1 - 8359364

ER -