At Elemendar we have to keep up to date with as much cyber threat intelligence and machine learning research as we possibly can, so we are constantly reading and exploring.

Ben Strickson, our senior data engineer, found these four papers released this year especially interesting because they address some of the biggest current challenges in CTI and machine learning. He thinks their authors deserve a special mention.

1. Oosthoek, Kris, and Christian Doerr. “Cyber threat intelligence: A product without a process?”
These researchers have issued a challenge to all commercial CTI players in cyber security, asking them to significantly improve the quality of their data. Doing so would avoid throwing more technology at problems that often require human solutions. They argue that CTI feeds are currently underused, unreliable, not transparent in their sourcing, and biased in their focus. We are inclined to agree up to a point, as we have come across some of these issues ourselves while building our own analyst platform, READ. Most of us in the field would likely agree with the authors’ call for better analyst frameworks and more intelligence-based tradecraft training, which would make cyber security safer and more effective for us all.

2. Schlette, Daniel, et al. “Measuring and visualizing cyber threat intelligence quality.”

These researchers recognised the challenge of CTI data quality and have proposed a solution which augments rather than replaces the analyst. Instead of ignoring poor-quality data or using automated tools to impute missing information, this team defined an objective set of metrics for reporting CTI quality. What we really liked was that they included a front-end visualisation tool and then performed robust user testing to validate their data quality hypothesis.
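To make the idea of an objective quality metric concrete, here is a minimal sketch of one simple measure, completeness: the share of expected fields that a CTI record actually fills in. This is our own illustration, not the metric set defined in the paper; the field names in `EXPECTED_FIELDS` and the `completeness` helper are hypothetical.

```python
# Hypothetical list of fields we would like every CTI record to carry.
# These names are assumptions for illustration, not taken from the paper.
EXPECTED_FIELDS = ("name", "description", "first_seen", "source", "confidence")


def completeness(record, expected=EXPECTED_FIELDS):
    """Return the fraction of expected fields that are present and non-empty."""
    empty_values = (None, "", [], {})
    filled = sum(1 for field in expected if record.get(field) not in empty_values)
    return filled / len(expected)


# Example record: two of the five expected fields carry a value.
indicator = {"name": "Example campaign", "description": "", "source": "vendor feed"}
score = completeness(indicator)
print(f"completeness: {score:.2f}")
```

A score like this reports missing information to the analyst rather than guessing at it, which is in the spirit of augmenting rather than replacing human judgement.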

3. Preuveneers, Davy, and Wouter Joosen. “Sharing Machine Learning Models as Indicators of Compromise for Cyber Threat Intelligence.”

This is a genuinely novel piece of work: the authors have recognised that sharing IoCs is often a redundant activity. Instead, they propose sharing trained ML models for the detection of known TTPs and campaigns, with the implication that these models could attribute a novel indicator to a known campaign. What was really encouraging about this research was its thorough consideration of industry adoption, with a robust and secure framework proposed for model sharing.

4. Müller, Robert, and Elmar Padilla. “From Plain Text to CTI–A Technological Solution for Gathering Cyber Threat Intelligence using Natural Language Processing.”

Improving the automatic extraction of entities and relationships from CTI documents is a fundamental, daily challenge for the field. Numerous studies have been published, often involving the latest neural networks from the wider field of natural language processing. We recommend this NLP-focused paper for two reasons. Firstly, it takes a whole-systems approach by including a user interface and a robust backend architecture. Secondly, it is open and specific about the need for rules-based engineering to deliver an analyst-ready product.
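For readers unfamiliar with what rules-based extraction looks like in practice, the sketch below pulls a few common entity types out of report text using regular expressions. It is our own simplified illustration, not the system described in the paper; the `RULES` table and the `extract_entities` helper are hypothetical, and the patterns are deliberately loose (the IPv4 rule, for instance, does not check that each octet is at most 255).

```python
import re

# Hypothetical rule set: one regular expression per entity type.
RULES = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_entities(text):
    """Map each entity type to the sorted, de-duplicated matches found in text."""
    return {name: sorted(set(pattern.findall(text))) for name, pattern in RULES.items()}


report = "The actor exploited CVE-2021-44228 and beaconed to 203.0.113.7."
print(extract_entities(report))
```

Rules like these are easy to audit and extend, which is one reason they tend to sit alongside statistical models in an analyst-facing product.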
