It's no secret that much of the wisdom of the world lies in unstructured data, the kind that isn't necessarily quantifiable or tidy. So it is in cybersecurity, and now IBM is putting Watson to work to make that knowledge more accessible.
Towards that end, IBM Security on Tuesday announced a new year-long research project through which it will collaborate with eight universities to help train its Watson artificial-intelligence system to tackle cybercrime.
Knowledge about threats is often hidden in unstructured sources such as blogs, research reports and documentation, said Kevin Skapinetz, director of strategy for IBM Security.
"Let's say tomorrow there's an article about a new type of malware, then a bunch of follow-up blogs," Skapinetz explained. "Essentially what we're doing is training Watson not just to understand that those documents exist but to add context and make connections between them."
Over the past year, IBM Security's own experts have been working to teach Watson the "language of cybersecurity," he said. That's been accomplished largely by feeding it thousands of documents annotated to help the system understand, for example, what a threat is, what it does and which indicators are related to it.
"You go through the process of annotating documents not just for nouns and verbs, but also what it all means together," Skapinetz said. "Then Watson can start making associations."
Now IBM aims to accelerate the training process. This fall, it will begin working with students at universities including California State Polytechnic University at Pomona, Penn State, MIT, New York University and the University of Maryland, Baltimore County, along with Canada's universities of New Brunswick, Ottawa and Waterloo.
Over the course of a year, it aims to feed up to 15,000 new documents into Watson every month, including threat intelligence reports, cybercrime strategies, threat databases and materials from its own X-Force research library. X-Force represents 20 years of security research, including details on 8 million spam and phishing attacks and more than 100,000 documented vulnerabilities.
Watson's natural language processing capabilities will help it make sense of those reams of unstructured data. Its data-mining techniques will help detect outliers, and its graphical presentation tools will help find connections among related data points in different documents, IBM said.
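To make the idea of "connections among related data points in different documents" concrete, here is a minimal sketch in Python. It is not IBM's method or any Watson API. The document names, the entity labels and the `link_documents` helper are all hypothetical, and the sketch assumes annotation has already reduced each document to a set of tagged entities.

```python
from itertools import combinations

# Hypothetical annotated documents: each name maps to the set of entities
# a human annotator tagged in it (all names and values are made up).
annotated_docs = {
    "news_article": {"malware:ExampleBot", "indicator:203.0.113.7"},
    "blog_followup_1": {"malware:ExampleBot", "behavior:credential theft"},
    "blog_followup_2": {"indicator:203.0.113.7", "behavior:credential theft"},
    "unrelated_report": {"malware:OtherSample"},
}


def link_documents(docs):
    """Return {(doc_a, doc_b): shared_entities} for each pair of documents
    that has at least one annotated entity in common."""
    links = {}
    for a, b in combinations(sorted(docs), 2):
        shared = docs[a] & docs[b]  # set intersection
        if shared:
            links[(a, b)] = shared
    return links


links = link_documents(annotated_docs)
for (a, b), shared in links.items():
    print(f"{a} <-> {b}: {sorted(shared)}")
```

In this toy data the article and its two follow-up blogs end up linked to one another through a shared malware name, indicator or behavior, while the unrelated report stays unconnected. A production system would presumably extract the entities automatically and weigh the links rather than treat every overlap equally.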
Ultimately, the result will be a cloud service called Watson for Cyber Security that's designed to provide insights into emerging threats as well as recommendations on how to stop them.
Some 60,000 security blogs are published each month, and that's just one of many sources of information cybersecurity professionals must try to keep up with, Skapinetz noted.
"You can see why even the best analysts are missing a lot of the information out there," he said. "What we're aiming to do is take away some of the guesswork and help analysts understand more context with an always-on advisor that can help investigate and answer questions."
IBM plans to begin beta production deployments later this year.