The MITRE ATT&CK framework has, quite rightly, become one of the cornerstones of modern cyber threat intelligence. For those not aware of the framework, the core of the system is the ATT&CK Navigator, shown below:

Figure 1: MITRE ATT&CK framework (X and Y axis)

The X Axis in Figure 1 highlights the 14 Tactics within the framework and the Y Axis highlights the many Techniques that are listed in the framework, all individually enumerated with “T Codes”. Using the MITRE ATT&CK framework is simple. See for yourself: look up T1037.

Simple enough, right? It’s “Boot or Logon Initialization Scripts” and can be found under the Persistence Tactic. Now try this one: which T Codes does this description correspond to?

The tweet also contains a hashtag with information to allow HAMMERTOSS to extract encrypted instructions from an image file. The hashtag indicates that the hidden data is offset 101 bytes into the image file and the characters to be used for decryption are docto.

In this case there are two: T1022 – Data Encrypted and T1140 – Deobfuscate/Decode Files or Information.

The point is that going from a structured T Code to its description is simple; going the other way, i.e. matching unstructured narratives to structured T Codes, is far more challenging.
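The asymmetry can be sketched in a few lines of Python. This is only an illustration: the `TECHNIQUES` table is a hypothetical, hand-typed excerpt holding the three T Codes quoted in this article, and `describe` and `naive_reverse_lookup` are made-up helper names, not part of any MITRE tooling.

```python
# Hypothetical, hand-typed excerpt for illustration only; the names are
# the ones quoted in this article, not a download of the framework.
TECHNIQUES = {
    "T1037": "Boot or Logon Initialization Scripts",
    "T1022": "Data Encrypted",
    "T1140": "Deobfuscate/Decode Files or Information",
}


def describe(t_code):
    """T Code to description: a single dictionary lookup."""
    return TECHNIQUES.get(t_code.upper(), "unknown T Code")


def naive_reverse_lookup(narrative):
    """Narrative to T Codes, attempted by exact technique-name matching."""
    lowered = narrative.lower()
    return [code for code, name in TECHNIQUES.items() if name.lower() in lowered]


narrative = (
    "The tweet also contains a hashtag with information to allow HAMMERTOSS "
    "to extract encrypted instructions from an image file. The hashtag "
    "indicates that the hidden data is offset 101 bytes into the image file "
    "and the characters to be used for decryption are docto."
)

print(describe("T1037"))
print(naive_reverse_lookup(narrative))
```

The first call resolves immediately because the T Code is a key. The second finds nothing, because the narrative never uses the technique names verbatim, which is exactly the gap described above.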

At this juncture an obvious solution to this gap – mapping unstructured data to MITRE categories – is to apply an Artificial Intelligence approach. 

But before that some basics…  

Data Scraping VS Artificial Intelligence 

As a challenge try extracting the Indicators of Compromise (IOCs) from the text below:

‘Operation Ghost has been recently active, with malware (6ACC0B1230303F8CF46152697D3036D69EA5A849) variants seen beaconing to various command and control locations (192.0.2.1).’

Easy enough? In this case the IOCs were simple to spot: the malware hash value (6ACC0…) and the IP address (192.0.2.1). Now, how successful do you think you would be extracting 100% of the IOCs in a 50-page document, or 50 50-page documents, or even 5000 50-page documents? The point here is obvious: human performance drops rapidly even on simple tasks over a large enough scale, and in these cases it makes sense to get a machine to do the heavy lifting. IOC extraction like the example shown above is trivial for a machine, even across thousands of documents of different types.

But the function of the machine mentioned above is not artificial intelligence but just basic computing, formally termed “data scraping” by the developer community. In essence the approach is basic and goes something like this:

If

text string = 40 hexadecimal characters (0–9, A–F) then text string = malware hash

Else if

text string = four numbers between 0 and 255 separated by dots then text string = IP address
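As a rough illustration, the rules above could be written in Python along the following lines. This is a minimal sketch, not how any particular product works: the function name `extract_iocs` and both patterns are assumptions made for this example, the sample sentence is modelled on the one above, and the hash rule only covers 40-character hexadecimal values.

```python
import ipaddress
import re

# Assumed rule: exactly 40 hexadecimal characters, bounded by non-word characters.
HASH_PATTERN = re.compile(r"\b[0-9A-Fa-f]{40}\b")
# Four dot-separated groups of 1-3 digits; the 0-255 range is checked below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(text):
    """Return hash-like strings and valid IPv4 addresses found in text."""
    hashes = HASH_PATTERN.findall(text)
    addresses = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            # Rejects out-of-range values such as 999.1.1.1.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        addresses.append(candidate)
    return {"hashes": hashes, "ipv4": addresses}


sample = (
    "Operation Ghost has been recently active, with malware "
    "(6ACC0B1230303F8CF46152697D3036D69EA5A849) variants seen beaconing "
    "to various command and control locations (192.0.2.1)."
)
print(extract_iocs(sample))
```

Nothing here involves judgement: a string either fits the pattern or it does not, which is why this kind of extraction scales so easily.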

In a nutshell, what separates an artificial intelligence program such as Elemendar’s READ. from a basic data scraping program is its ability to apply subjective judgement to input data. Returning to the original example, repeated below:

‘The tweet also contains a hashtag with information to allow HAMMERTOSS to extract encrypted instructions from an image file. The hashtag indicates that the hidden data is offset 101 bytes into the image file and the characters to be used for decryption are docto.’

Key traits that match the text to T Codes are “extract encrypted instructions from an image file” (T1140 – Deobfuscate/Decode Files or Information) and “hashtag indicates that the hidden data is offset 101 bytes into the image file” (T1022 – Data Encrypted).

What is notable about the above is that there is no obvious one-to-one matching of the text to the T Code, and this is what separates an artificial intelligence program from a “dumb” data scraper. We don’t need to rerun the previous exercise to conclude that few human analysts could successfully identify and extract multiple TTPs like the example above from a 50-page analysis, let alone 500,000 documents, with the accuracy that a machine could. It is in solving problems such as this that Artificial Intelligence systems start to come into their own.

To conclude 

Artificial Intelligence is often something of a black box, within which unscrupulous vendors can hide a lack of capability. However, beneath the smoke and mirrors there is an easy explanation of what benefits Artificial Intelligence can bring if we look at it through the user’s lens.

Elemendar is the world leader in developing AI (Machine Learning) to process human-authored cyber threat intelligence into machine-readable, actionable data, to enable cyber analysts to better protect their organisations against cyber threats.

Stewart Bertram is Elemendar’s Head of CTI, with more than 15 years’ experience in Intelligence and Cyber Threat Intelligence in both public and private sector roles.