Multimodal Deep Learning for Android Malware Classification


Date

2025-02-28

Publisher

MDPI (Basel, Switzerland)

Rights

(c) The author/s
CC BY

Abstract

This study investigates the integration of diverse data modalities within deep learning ensembles for Android malware classification. Android applications can be represented as binary images and function call graphs, each offering complementary perspectives on the executable. We synthesise these modalities by combining predictions from convolutional and graph neural networks with a multilayer perceptron. Empirical results demonstrate that multimodal models outperform their unimodal counterparts while remaining highly efficient. For instance, integrating a plain CNN with 83.1% accuracy and a GCN with 80.6% accuracy boosts overall accuracy to 88.3%. DenseNet-GIN achieves 90.6% accuracy, with no further improvement obtained by expanding this ensemble to four models. Based on our findings, we advocate for the flexible development of modalities to capture distinct aspects of applications and for the design of algorithms that effectively integrate this information.
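The late-fusion scheme described above can be sketched in a few lines: each unimodal model emits class probabilities, and a fusion layer over the concatenated predictions produces the ensemble decision. This is a minimal illustrative sketch in plain Python, not the authors' implementation; the function name, the use of a single linear layer (rather than a full MLP), and the example weights are assumptions.

```python
def fuse_predictions(cnn_probs, gnn_probs, weights, bias):
    """Late fusion: one linear layer over concatenated unimodal outputs.

    cnn_probs, gnn_probs: per-class probabilities from the two models.
    weights, bias: parameters of the fusion layer (one row per class).
    Returns the index of the predicted class.
    """
    features = cnn_probs + gnn_probs  # concatenate the two prediction vectors
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    # argmax over the fused scores gives the ensemble's class
    return max(range(len(scores)), key=scores.__getitem__)


# Toy two-class example: averaging weights let the stronger CNN vote
# (0.7 for class 0) outweigh the GNN vote (0.6 for class 1).
pred = fuse_predictions(
    cnn_probs=[0.7, 0.3],
    gnn_probs=[0.4, 0.6],
    weights=[[0.5, 0.0, 0.5, 0.0],
             [0.0, 0.5, 0.0, 0.5]],
    bias=[0.0, 0.0],
)
```

In practice the fusion weights would be learned (the paper trains a multilayer perceptron on the stacked predictions), but the fixed averaging weights above make the mechanics of combining modality-specific outputs explicit.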

Keywords

multimodal deep learning for Android malware detection, enhanced malware analysis, graph neural networks, function call graphs (FCG), efficient multimodal late fusion, CNN GNN Ensemble, bytecode image analysis, Android APK analysis, data fusion

Citation

Arrowsmith J, Susnjak T, Jang-Jaccard J. (2025). Multimodal Deep Learning for Android Malware Classification. Machine Learning and Knowledge Extraction, 7(1).


Creative Commons license

Except where otherwise noted, this item's license is described as (c) The author/s