extracted the n-gram binary character features from malicious samples and achieved 98% accuracy. Malware binaries contain some binary OpCode sequences that are more significant than in benign programs, and these can be used as features for machine learning. Currently, n-gram OpCode features are commonly used by machine learning-based detection models. They incur much less computational overhead than dynamic features such as API call sequences. Moreover, n-gram OpCode features can cover much more of the code than dynamic features, which are limited by the virtual machine execution environment.

At present, the robustness of malware detection models is receiving more and more attention. Adversarial machine learning techniques are widely used to test the robustness of machine learning models in image recognition and speech recognition, and also in computer security tasks such as spam filtering. Adversarial machine learning can effectively find a malicious input perturbation that attacks the target machine learning model or causes it to malfunction. However, traditional adversarial perturbation methods cannot be applied to binary OpCodes directly. Binary OpCode features resist perturbation, because it is difficult to modify binary OpCodes while keeping the program executable and preserving its semantics.

In this paper, we propose BMOP, a bidirectional universal adversarial learning method for effective binary OpCode perturbation from both the benign and the malicious perspective. Benign features are OpCodes that significantly represent benign behaviours, while malicious features are OpCodes that dominate malicious behaviours. From a large dataset of benign and malicious binary applications, we select the most important benign and malicious OpCode features based on the features' SHAP values, calculated from the trained machine learning models.
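
To make the two ideas above concrete, here is a minimal Python sketch of n-gram OpCode counting and of a two-sided feature selection driven by SHAP values. It is an illustration under stated assumptions, not the BMOP implementation: the function names `opcode_ngrams` and `select_bidirectional` are hypothetical, the OpCode sequence is a toy example, the SHAP values are made-up numbers standing in for values computed from a trained model, and the convention that positive SHAP values push toward the malicious class is assumed.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int) -> Counter:
    """Count every contiguous run of n opcodes in one disassembled program."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sequence shorter than n yields an empty range and an empty Counter.
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def select_bidirectional(
    mean_shap: Dict[NGram, float], k: int
) -> Tuple[List[NGram], List[NGram]]:
    """Keep the k most benign-leaning and k most malicious-leaning features.

    Assumption: a positive mean SHAP value pushes the model toward
    'malicious' and a negative one toward 'benign'.
    """
    malicious = sorted(
        (f for f, v in mean_shap.items() if v > 0),
        key=lambda f: -mean_shap[f],
    )[:k]
    benign = sorted(
        (f for f, v in mean_shap.items() if v < 0),
        key=lambda f: mean_shap[f],
    )[:k]
    return benign, malicious


# Toy opcode sequence (hypothetical, not taken from a real binary).
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]
counts = opcode_ngrams(opcodes, 2)

# Hypothetical mean SHAP values per 2-gram; in practice these would be
# computed from the trained detection model.
mean_shap = {
    ("push", "mov"): 0.02,
    ("mov", "call"): -0.05,
    ("call", "ret"): 0.10,
    ("call", "push"): -0.01,
}
benign_top, malicious_top = select_bidirectional(mean_shap, k=1)
```

In this toy run, `counts` records each 2-gram and how often it occurs, while `benign_top` and `malicious_top` hold the single strongest feature on each side. How the selected OpCodes are then used to perturb a binary is not covered by this sketch.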