Understanding the probabilistic perspective adopted by the Exein Machine Learning Engine when distinguishing valid behaviour from cyber-attacks.
This is the second part of a blog series about the Exein Machine Learning Engine (MLE). Read the first part here.
In this blog post we will go through the part of the Exein MLE that is responsible for testing the ability of a trained security model to detect real and simulated anomalies and avoid false positives before being deployed in production. We refer to this part of Exein as the MLE Scorer.
A Probabilistic Approach to IoT Security
From the beginning, we designed the Exein MLE to follow a probabilistic approach to the problem of IoT security, rather than a deterministic one. In fact, at Exein we believe there is a lot of value in answering the harder question “what is the chance that a given IoT device is under attack?” rather than the simpler “is the device under attack or not?”
More precisely, at each time step Exein evaluates the probability that the latest actions performed by the device firmware were produced by the same generative process as the actions observed during the training phase, which we assume to be the true "normal" generative process.
Such a probabilistic approach allows us to build an incredibly flexible engine that is much more powerful than traditional, binary rule-based solutions. In order to understand how exactly we developed the MLE around this probabilistic interpretation of security, we will go through the details of the two main conceptual building blocks of the MLE Scorer: threshold setting and model validation.
As explained in the previous post, the way the MLE understands the behaviour of a device is essentially by making predictions about future actions and measuring the discrepancy between the predicted actions and the observed ones. If the observed discrepancy rises above a certain level (i.e. the threshold) then the MLE promptly informs the other security components of Exein about the anomaly so that they can take the appropriate actions and block the threat.
However, no ML model is perfect, and some mistakes are unavoidable due to the unpredictable nature of the data being modelled. For this reason, rather than setting a threshold that separates normal from anomalous behaviour based on individual point "anomalies", the MLE Scorer smoothes the observed deviation from the learned behaviour (as measured by the cross entropy) across larger windows of subsequent observations. In this way, its decision making focuses on behavioural shifts that are robust and sustained through time, rather than on occasional bursts of individual anomalies.
To accomplish this goal, the MLE Scorer first computes the individual cross entropies observed by the MLE on all the available training data, and then builds a distribution of smoothed cross entropies by randomly sampling windows of cross entropies a very large number of times and averaging each one.
In this way, the MLE Scorer is able to simulate in just a fraction of a second all the possible ways in which the individual actions performed by the device can intermix and interact with each other in real life.
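The sampling-and-averaging procedure described above can be sketched as a simple bootstrap over windows. The values below (window length, sample count, and the simulated per-step cross entropies) are illustrative stand-ins, not Exein's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the per-step cross entropies the MLE observes on the
# training data (simulated here; the real ones come from the model).
cross_entropies = rng.gamma(shape=2.0, scale=1.0, size=10_000)

WINDOW = 50         # hypothetical window length
N_SAMPLES = 20_000  # number of random windows to sample

# Sample many random windows and average each one, building the
# distribution of smoothed cross entropies.
starts = rng.integers(0, len(cross_entropies) - WINDOW, size=N_SAMPLES)
smoothed = np.array([cross_entropies[s:s + WINDOW].mean() for s in starts])
```

Averaging shrinks the spread of the distribution, which is exactly why sustained behavioural shifts stand out while single anomalous spikes get absorbed.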
By probabilistically interpreting the generated smoothed anomaly distribution, the MLE Scorer finally sets the security threshold for the MLE at a 5σ confidence level, the same level used at CERN to claim the discovery of a new particle. In other words, this corresponds to a probability of roughly 1 in 3.5 million that an anomaly signalled by the MLE is a false positive.
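The "1 in 3.5 million" figure follows from the one-sided tail probability of a Gaussian at 5σ. A minimal sketch, using illustrative values for the mean and standard deviation of the smoothed distribution:

```python
import math

SIGMA_LEVEL = 5.0

# One-sided tail probability of a 5-sigma event under a Gaussian.
p_false_positive = 0.5 * math.erfc(SIGMA_LEVEL / math.sqrt(2))
odds = 1.0 / p_false_positive  # roughly one in 3.5 million

# The threshold itself sits 5 standard deviations above the mean of the
# smoothed anomaly distribution (illustrative stand-in values below).
mean_smoothed, std_smoothed = 1.0, 0.1
threshold = mean_smoothed + SIGMA_LEVEL * std_smoothed
```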
Setting a threshold at the right confidence level is only half the job of the MLE Scorer. Before being put into production, the trained MLE model and the generated threshold need to be tested thoroughly to ensure that the model can correctly and effectively identify both normal behaviour and anomalies.
As a first test, the trained security model is evaluated against real data from the device that was deliberately held out during the training phase. In this way the Scorer can verify that the MLE does not generate any false positives when monitoring real, allowed operations of the device. If any false positive is encountered during this first testing phase, the model is immediately rejected and a new one is instantiated and trained.
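This accept-or-reject check can be sketched as a single predicate over the scores the model assigns to held-out normal data. The function name and inputs here are hypothetical, not Exein's actual interface:

```python
import numpy as np

def passes_false_positive_test(heldout_scores, threshold):
    """Accept the model only if no held-out score breaches the threshold.

    heldout_scores: smoothed cross entropies the trained model assigns
    to real device data held out from training (hypothetical inputs).
    """
    return bool(np.all(np.asarray(heldout_scores) <= threshold))

# A model scoring cleanly on held-out normal behaviour passes:
accepted = passes_false_positive_test([0.9, 1.1, 1.2], threshold=1.5)
# A single breach rejects the model outright:
rejected = passes_false_positive_test([0.9, 2.0], threshold=1.5)
```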
After ensuring that the model correctly identifies unseen normal behaviour, the MLE is scored against a set of synthetic anomalies injected one by one into the real, observed data produced by the device. This test is performed over a very large number of random combinations of anomalous hooks, which allows the MLE Scorer to build a probabilistic interpretation of the model's effectiveness.
In fact, by injecting a single anomalous hook at the end of many randomly sampled valid windows of hooks and querying the MLE for the likelihood of the modified sequences, the Scorer can answer the question: "what is the chance that the MLE will spot an attack after a single hook?". The (probabilistic) answer to this question, and to the related ones for each additional anomalous hook inserted into the valid data, forms the basis of the final output of the MLE Scorer: a numerical score condensing the effectiveness of the trained security model at detecting attacks in the shortest possible time.
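The single-hook estimate described above amounts to a Monte Carlo experiment: append one anomalous hook to many valid windows and count how often the model's score crosses the threshold. Everything here (the `score_fn` interface, the toy scoring function, the hook values) is a hypothetical stand-in for the real trained MLE:

```python
import numpy as np

rng = np.random.default_rng(42)

def single_hook_detection_rate(score_fn, valid_windows, anomalous_hook,
                               threshold, n_trials=1_000):
    """Estimate P(the model flags an attack after one anomalous hook).

    score_fn: stand-in for the trained MLE, mapping a hook sequence
    to a smoothed cross entropy (hypothetical interface).
    """
    hits = 0
    for _ in range(n_trials):
        # Pick a random valid window and append the anomalous hook.
        window = valid_windows[rng.integers(len(valid_windows))]
        modified = np.append(window, anomalous_hook)
        if score_fn(modified) > threshold:
            hits += 1
    return hits / n_trials

# Toy model: scores high whenever the last hook is the injected one.
toy_score = lambda seq: 2.0 if seq[-1] == 999 else 0.5
windows = [np.array([1, 2, 3]), np.array([4, 5, 6])]
rate = single_hook_detection_rate(toy_score, windows,
                                  anomalous_hook=999, threshold=1.0)
```

Repeating the experiment with two, three, or more injected hooks yields the per-length detection probabilities that the Scorer condenses into its final numerical score.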
Head over to exein.io for more information about Exein Core, including how to get started at securing your own firmware.
GitHub: https://github.com/Exein-io/exein
Posted by Giovanni Alberto Falcione, Head of Machine Learning at Exein