Malytics: a malware detection scheme

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Malware analysis is an important problem in cyber-security. Besides good precision and recognition rate, a malware detection scheme should ideally generalize well to novel malware families (a.k.a. zero-day attacks). It is also important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel malware detection scheme, which we call Malytics. It does not depend on any particular tool or operating system. It extracts static features from any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, on the applied datasets. The speed and efficiency of Malytics are also evaluated.
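The feature-extraction stage can be illustrated with a minimal sketch. It is not the authors' implementation: the byte-level n-gram size, the 64-dimensional output, the BLAKE2b hash, and the function names `byte_ngrams`, `ngram_signs` and `tf_simhash` are all assumptions made for illustration. Each n-gram is hashed to a fixed vector of +1/-1 values, and the vectors are summed with the n-gram's term frequency as the weight. Because the result is a fixed linear map applied to the term-frequency vector, it has the same form as a neural-network layer with fixed weights, which is one way to read the equivalence stated in the abstract.

```python
import hashlib
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count the term frequency of every n-byte window in the input."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def ngram_signs(ngram: bytes, dim: int) -> list[int]:
    """Map an n-gram to a fixed +1/-1 vector taken from the bits of a hash.

    Assumption: dim is a multiple of 8 and at most 512 (the BLAKE2b limit).
    """
    digest = hashlib.blake2b(ngram, digest_size=dim // 8).digest()
    bits = int.from_bytes(digest, "big")
    return [1 if (bits >> k) & 1 else -1 for k in range(dim)]


def tf_simhash(data: bytes, n: int = 2, dim: int = 64) -> list[int]:
    """Sum every n-gram's sign vector, weighted by its term frequency."""
    acc = [0] * dim
    for ngram, tf in byte_ngrams(data, n).items():
        for k, sign in enumerate(ngram_signs(ngram, dim)):
            acc[k] += tf * sign
    return acc
```

In this sketch the output is kept as an integer vector rather than binarized to a bit string, so downstream stages (similarity measurement and classification) would operate on real-valued features; whether Malytics does exactly this is not stated in the abstract.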

Language: English
Pages: 49418-49431
Number of pages: 14
Journal: IEEE Access
Volume: 6
DOI: 10.1109/ACCESS.2018.2864871
Publication status: Published - 28 Sep 2018

Fingerprint

Feature extraction
Neural networks
Mobile devices
Malware

Keywords

  • Androids
  • Binary Level n-grams
  • Extreme Learning Machine
  • Feature extraction
  • Humanoid robots
  • Malware
  • Malware Detection
  • Microsoft Windows
  • Neural networks
  • Static Analysis
  • Task analysis
  • Term Frequency Simhashing

Cite this

@article{99ca780e82bf449494c0e32732d206f8,
title = "Malytics: a malware detection scheme",
abstract = "Malware analysis is an important problem in cyber-security. Besides good precision and recognition rate, a malware detection scheme should ideally generalize well to novel malware families (a.k.a. zero-day attacks). It is also important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel malware detection scheme, which we call Malytics. It does not depend on any particular tool or operating system. It extracts static features from any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21{\%} and 99.45{\%} on Android dex files and Windows PE files, respectively, on the applied datasets. The speed and efficiency of Malytics are also evaluated.",
keywords = "Androids, Binary Level n-grams, Extreme Learning Machine, Feature extraction, Humanoid robots, Malware, Malware Detection, Microsoft Windows, Neural networks, Static Analysis, Task analysis, Term Frequency Simhashing",
author = "Mahmood Yousefi-Azar and Hamey, {Leonard G. C.} and Vijay Varadharajan and Shiping Chen",
year = "2018",
month = "9",
day = "28",
doi = "10.1109/ACCESS.2018.2864871",
language = "English",
volume = "6",
pages = "49418--49431",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

}

Malytics: a malware detection scheme. / Yousefi-Azar, Mahmood; Hamey, Leonard G. C.; Varadharajan, Vijay; Chen, Shiping.

In: IEEE Access, Vol. 6, 28.09.2018, p. 49418-49431.

TY - JOUR

T1 - Malytics: a malware detection scheme

T2 - IEEE Access

AU - Yousefi-Azar, Mahmood

AU - Hamey, Leonard G. C.

AU - Varadharajan, Vijay

AU - Chen, Shiping

PY - 2018/9/28

Y1 - 2018/9/28

N2 - Malware analysis is an important problem in cyber-security. Besides good precision and recognition rate, a malware detection scheme should ideally generalize well to novel malware families (a.k.a. zero-day attacks). It is also important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel malware detection scheme, which we call Malytics. It does not depend on any particular tool or operating system. It extracts static features from any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, on the applied datasets. The speed and efficiency of Malytics are also evaluated.

AB - Malware analysis is an important problem in cyber-security. Besides good precision and recognition rate, a malware detection scheme should ideally generalize well to novel malware families (a.k.a. zero-day attacks). It is also important that the system does not require excessive computation, particularly for deployment on mobile devices. In this paper, we propose a novel malware detection scheme, which we call Malytics. It does not depend on any particular tool or operating system. It extracts static features from any given binary file to distinguish malware from benign files. Malytics consists of three stages: feature extraction, similarity measurement and classification. The three stages are implemented by a neural network with two hidden layers and an output layer. We show that feature extraction, which is performed by tf-simhashing, is equivalent to the first layer of a particular neural network. We evaluate Malytics' performance on both Android and Windows platforms. Malytics outperforms a wide range of learning-based techniques and also individual state-of-the-art models on both platforms. We also show that Malytics is resilient and robust in addressing zero-day malware samples. The F1-score of Malytics is 97.21% and 99.45% on Android dex files and Windows PE files, respectively, on the applied datasets. The speed and efficiency of Malytics are also evaluated.

KW - Androids

KW - Binary Level n-grams

KW - Extreme Learning Machine

KW - Feature extraction

KW - Humanoid robots

KW - Malware

KW - Malware Detection

KW - Microsoft Windows

KW - Neural networks

KW - Static Analysis

KW - Task analysis

KW - Term Frequency Simhashing

UR - http://www.scopus.com/inward/record.url?scp=85053341631&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2018.2864871

DO - 10.1109/ACCESS.2018.2864871

M3 - Article

VL - 6

SP - 49418

EP - 49431

JO - IEEE Access

JF - IEEE Access

SN - 2169-3536

ER -