Here at Elemendar we’ve been exploring the application of machine learning and AI tools since 2017, looking at how different machine learning techniques can actually help analysts in the real world. In doing this we’ve learnt which analysis areas can benefit most from these tools, and we’ve curated particularly interesting datasets to help us test, train and apply them to real-world problems.
Applying Data Science to Cyber Threat Intelligence
In discussions with our research partners in industry and government, we considered particular analysis activities that could most benefit from ML tooling – i.e. how could ML be applied to help human analysts better understand things, or do their jobs. We then looked a little deeper into each of the analysis activities to consider the input and output domains for each of these areas and thought about how trivial or hard these inputs were to manage, organise or control.
This led us to summarise these analysis domains in a good old-fashioned quadrant, with the following examples of analysis activities in each area:
- Sentiment Analysis of social media
- Translation of data into a particular language (e.g. English language translation)
- Improved situational analysis for CTI
- Prediction of real world events (e.g. horizon scanning/foresight)
Working through these areas led us to focus on CTI, an area in which human analysts must bring together a diverse range of data sources, understand them, and make judgements and recommendations based on them. These are worthwhile, hard problems, whether because of the complexity of what the analysis is trying to achieve or because of the ongoing management of the tasks it requires. As a result, we’ve decided to help by using our understanding of AI technology to automate the more laborious, time-consuming tasks, so that the humans responsible for the analysis can produce outputs in a simple, structured form. Our READ. application is designed to do this, and has been built with ongoing feedback and testing from real-life analysis teams in organisations and government departments who are constantly working hard to make sense of the cyber threat landscape and prepare for the challenges it presents.
Curating Datasets to help improve understanding and prediction within CTI
As well as building READ, we have been working on our own capacity to better understand and curate data in the CTI space. CTI is such a rich and important area of intelligence that it provides an interesting and varied source of material to which data science methods can be applied, which is potentially a huge task. To make our contribution matter the most, we ask: in what areas can we make a difference? Where can we help analysts? What predictions can we usefully make that help us find a threat more quickly?
All of these processes require a good amount of planning, testing and technical implementation, as well as foundations built on good datasets and a common understanding of CTI as an area of study. On top of these foundations, we can then continue to research and develop new tools and techniques that actually help human analysts do their jobs better. With ever more data constantly being generated and ever more cyber threats to face, we feel this is imperative.
So, if you have any thoughts or questions about our analysis above or the work we are doing, please do get in touch.