In addition to signature-based and heuristics-based detection techniques,
machine learning (ML) is widely used to generalize to new, never-before-seen
malicious software (malware). However, prior studies have demonstrated that ML
models can be tricked into returning an incorrect label.
These studies usually rely on access to a prediction score, which is vulnerable
to gradient-based attacks. In a more realistic setting, where an attacker has
very little information about the outputs of a malware detection engine, only
modest evasion rates are achieved. In this paper, we propose
a method using reinforcement learning with the DQN and REINFORCE algorithms to
challenge two state-of-the-art ML-based detection engines (MalConv and EMBER)
and a commercial antivirus (AV) classified by Gartner as a leader. Our method
combines several actions that modify a Windows portable executable (PE) file
without breaking its functionality. The method also identifies which actions perform
better and compiles a detailed vulnerability report to help mitigate
evasion. We demonstrate that REINFORCE achieves high evasion rates even against
a commercial AV that exposes only limited information.