Abstract
Android malware detection remains a critical challenge in cybersecurity research. Recent advancements leverage AI techniques, particularly deep neural networks (DNNs), to train a detection model, but their effectiveness is often compromised by the pronounced imbalance among malware families in commonly used training datasets. This imbalance leads to overfitting in dominant categories and poor performance in underrepresented ones, increasing predictive uncertainty for less common malware families. To address the suboptimal performance of many DNN models, we introduce MalTutor, a novel framework that enhances model robustness through an optimized training process. Our primary insight lies in transforming uncertainties from “liabilities” into “assets” by strategically incorporating them into DNN training methodologies. Specifically, we begin by evaluating the predictive uncertainty of DNN models throughout various training epochs, which guides our sample categorization. Incorporating Curriculum Learning strategies, we commence training with easy-to-learn samples with lower uncertainty, progressively incorporating difficult-to-learn ones with higher uncertainty. Our experimental results demonstrate that MalTutor significantly improves the performance of models trained on imbalanced datasets, increasing accuracy by 31.0%, elevating the F1 score by 138.8%, and specifically boosting the average accuracy in detecting various types of malicious apps by 133.9%. Our findings provide valuable insights into the potential benefits of incorporating uncertainty to improve the robustness of DNN models for prediction-oriented software engineering tasks.
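The uncertainty-guided curriculum described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how uncertainty is measured or how samples are grouped, so everything here is an assumption made for illustration: uncertainty is taken as the predictive entropy averaged over recorded epochs, and samples are split into equal-sized cumulative stages. The function names, the input format, and the sample ids are hypothetical and are not MalTutor's actual implementation.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_uncertainty(per_epoch_probs):
    """Average one sample's predictive entropy over the recorded epochs."""
    return sum(predictive_entropy(p) for p in per_epoch_probs) / len(per_epoch_probs)


def curriculum_stages(epoch_probs_by_sample, num_stages=3):
    """Return cumulative training subsets ordered from low to high uncertainty.

    epoch_probs_by_sample maps a sample id to a list of class-probability
    vectors, one per recorded epoch (an assumed input format).
    """
    ranked = sorted(
        epoch_probs_by_sample,
        key=lambda sid: mean_uncertainty(epoch_probs_by_sample[sid]),
    )
    stages = []
    for k in range(1, num_stages + 1):
        # Each stage keeps the earlier, easier samples and adds harder ones.
        cutoff = math.ceil(len(ranked) * k / num_stages)
        stages.append(ranked[:cutoff])
    return stages


# Toy prediction history for three hypothetical apps over two epochs.
history = {
    "app_a": [[0.98, 0.02], [0.99, 0.01]],  # confidently classified
    "app_b": [[0.70, 0.30], [0.80, 0.20]],  # moderately uncertain
    "app_c": [[0.50, 0.50], [0.55, 0.45]],  # highly uncertain
}
stages = curriculum_stages(history, num_stages=3)
```

In this sketch, a training loop would fit the model on `stages[0]` first and then move through the later stages, so that high-uncertainty samples enter training only after the easier ones.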
| Original language | English |
|---|---|
| Article number | ISSTA015 |
| Pages (from-to) | 1-23 |
| Number of pages | 23 |
| Journal | Proceedings of the ACM on Software Engineering |
| Volume | 2 |
| Issue number | ISSTA |
| DOIs | |
| Publication status | Published - Jul 2025 |
| Externally published | Yes |
Bibliographical note
Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Android malware detection
- Uncertainty
- Curriculum Learning