As we grow, we better understand our market, our customers’ needs and how we relate to both. Through this process we have evolved an updated presentation deck, messaging and story that better describe why we are here and why we are valuable to you.
1. We are all drowning in CTI, and there will never be enough people to utilise it all.
2. By building an AI (robot) analyst that can read human-written CTI and translate it into machine-readable data, we are solving this immediate problem and laying the groundwork to automate CTI as its volume and relevance across organisations and systems continue to grow.
3. Using Elemendar’s AI can save costs and better protect you from harm and liability. Starting to automate now creates an efficiency feedback loop that will give early adopters a significant competitive advantage over those who don’t automate.
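To make “machine-readable data” a little more concrete, here is a minimal Python sketch of the kind of structured output involved: a single STIX 2.1 Indicator object wrapping a file hash. The `md5_indicator` function, the hash and the indicator name are hypothetical illustrations, not Elemendar’s output format or API; only the property names and the pattern syntax come from the STIX 2.1 specification.

```python
import json
import uuid
from datetime import datetime, timezone


def md5_indicator(md5_hex, name):
    """Wrap an MD5 hash in a minimal STIX 2.1 Indicator object (illustrative only)."""
    # STIX timestamps are UTC with a trailing "Z"; millisecond precision here.
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        # STIX patterning: match any file whose MD5 hash equals this value.
        "pattern": f"[file:hashes.MD5 = '{md5_hex.lower()}']",
        "pattern_type": "stix",
        "valid_from": now,
    }


if __name__ == "__main__":
    # Hypothetical example: a hash that an analyst might read in a PDF report.
    print(json.dumps(md5_indicator("d41d8cd98f00b204e9800998ecf8427e", "Dropper seen in phishing campaign"), indent=2))
```

Once a sentence such as “the dropper has MD5 d41d8cd98f00b204e9800998ecf8427e” becomes an object like this, other security tools can consume it without a person re-typing it.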
We are continuing to work hard on improving our technology and will shortly support the MITRE ATT&CK framework (https://attack.mitre.org/). We’re especially excited that MITRE’s own TRAM project was announced while Giorgos was in Washington, DC, as we expect to feed into it and combine it with our implementation.
Reading and translating CTI automatically with software is not trivial. Below, Syra (CTO and co-founder) shares a perspective on some of the issues we are solving.
PDFs Are Annoying – by Syra Marshall
At Elemendar we spend most of our time working on cutting-edge challenges in machine learning and natural language processing. But we also face challenges in areas that are far older and more pervasive than cybersecurity, let alone CTI. I am talking, of course, about accurately and usefully extracting data from PDFs.
One might think that, because PDFs look well structured to the human eye (with titles, tables, headers, footers and everything else one might want in a document written by people for other people to read), converting the underlying data into something a machine can use would be straightforward, or at least as easy as working out how everything fits together in HTML and CSS. This is very much not the case. PDFs are defined purely by layout: they are much closer to the way printers of old set the type for Victorian newspapers than to a web page, whose layout comes from stylesheets (CSS) applied on top of an underlying structure (HTML).
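To illustrate what “defined purely by layout” means in practice: a PDF page’s content is essentially runs of text drawn at coordinates, with no built-in notion of a paragraph or even a line. The minimal Python sketch below assumes a hypothetical `Fragment` type (roughly what a text extraction library might hand back) and rebuilds lines by grouping fragments with similar baselines, then ordering each line left to right. It is a toy heuristic under those assumptions, not how any particular tool works; real layout analysis also has to cope with columns, rotated text and tables.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical positioned text run, as a PDF text extractor might return it."""
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # baseline, in points from the TOP of the page (assumption: PDF's default origin is bottom-left)


def reading_order(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line left to right."""
    lines = []
    # Visit fragments top to bottom; the order they were drawn in is irrelevant.
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        # Baselines within the tolerance of the current line's first fragment join that line.
        if lines and abs(lines[-1][0].y - frag.y) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Slightly different baselines can disturb x order, so re-sort within each line.
    return [" ".join(f.text for f in sorted(line, key=lambda f: f.x)) for line in lines]


if __name__ == "__main__":
    # Fragments deliberately listed out of order, as they might appear in a content stream.
    frags = [
        Fragment("Indicators", 40, 130),
        Fragment("Report", 110, 100.5),
        Fragment("Threat", 40, 100),
        Fragment("of compromise", 105, 130),
    ]
    print(reading_order(frags))  # ['Threat Report', 'Indicators of compromise']
```

The `line_tolerance` value is an arbitrary choice for the example; in real documents it would depend on font size and line spacing.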
What this means for Elemendar, given that so much existing CTI is delivered as PDFs, is that we must perform layout analysis to handle edge cases such as hashes that wrap over multiple lines in a table, and to make sure we don’t misidentify the author’s email address in a page footer as a threat indicator simply because plain text extraction has left that address embedded in a paragraph about email addresses involved in a phishing campaign.
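As an illustration of the first edge case: a hash that wraps onto two lines of a table cell often comes out of plain text extraction as two fragments, neither of which looks like a complete hash. The sketch below uses a hypothetical `normalise_cell` helper and assumes table detection has already isolated the lines belonging to one cell, holding at most one hash. It rejoins the cell only when every fragment is hexadecimal and the combined length matches an MD5, SHA-1 or SHA-256 digest (32, 40 or 64 hex characters).

```python
import re

HEX_RE = re.compile(r"[0-9a-fA-F]+")
# Hex digest lengths: MD5 = 32, SHA-1 = 40, SHA-256 = 64.
HASH_LENGTHS = {32, 40, 64}


def normalise_cell(cell_lines):
    """Rejoin a hash that wrapped across lines of one table cell; otherwise join words with spaces."""
    fragments = [line.strip() for line in cell_lines if line.strip()]
    joined = "".join(fragments)
    if fragments and HEX_RE.fullmatch(joined) and len(joined) in HASH_LENGTHS:
        return joined
    return " ".join(fragments)


if __name__ == "__main__":
    print(normalise_cell(["d41d8cd98f00b204", "e9800998ecf8427e"]))  # d41d8cd98f00b204e9800998ecf8427e
    print(normalise_cell(["Phishing", "campaign"]))                  # Phishing campaign
```

Checking the length of the whole cell, rather than of each fragment, avoids mistaking half of a wrapped SHA-256 (32 hex characters) for a standalone MD5.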
Without giving away our secret sauce, we found that combining several different open source layout analysis tools (such as Tabula, Apache Tika and myriad plain text extraction tools) allows Elemendar to handle tables, headers and footers separately. It’s laborious but absolutely necessary, both so that our training data is as accurate as possible and so that the data our models analyse to extract STIX objects for our customers is as complete and accurate as possible.
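The footer problem can be approached in a similar spirit: spot lines that repeat near the top or bottom of most pages and set them aside before looking for indicators. The sketch below is one simple heuristic of that kind, not Elemendar’s approach. It assumes each page is already a list of text lines, treats all digit runs as equal so that “Page 1” and “Page 2” match, and uses hypothetical thresholds (`margin=2`, `min_fraction=0.6`) and a deliberately loose email regex.

```python
import re
from collections import Counter

# Deliberately loose email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def _normalise(line):
    # Treat all digit runs as equal so "Page 1" and "Page 2" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def repeated_margin_lines(pages, margin=2, min_fraction=0.6):
    """Return normalised lines found in the top or bottom `margin` lines of most pages."""
    counts = Counter()
    for lines in pages:
        # A set, so each distinct line counts at most once per page.
        counts.update({_normalise(line) for line in lines[:margin] + lines[-margin:]})
    threshold = min_fraction * len(pages)
    return {line for line, n in counts.items() if n >= threshold}


def body_lines(pages, margin=2, min_fraction=0.6):
    """Yield every line except repeated headers and footers."""
    repeated = repeated_margin_lines(pages, margin, min_fraction)
    for lines in pages:
        for i, line in enumerate(lines):
            in_margin = i < margin or i >= len(lines) - margin
            if in_margin and _normalise(line) in repeated:
                continue
            yield line


def candidate_emails(pages):
    """Extract email addresses from body text only, ignoring repeated headers and footers."""
    return [match for line in body_lines(pages) for match in EMAIL_RE.findall(line)]
```

With a hypothetical three-page report whose footer reads “Contact: analyst@vendor.example | Page N”, `candidate_emails` returns the phishing addresses from the body text but not the author’s contact address.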
Things we’ve read on AI and Cyber which you might like
Good news! More girls applying for cyber course this year!
Disinformation in social media is all the rage. Now we can model it as CTI.
On the run? Make sure you wear a T-shirt that can fool AI cameras
That’s all for this month!