With Hillary Clinton and Donald Trump set to face off tonight in the first debate of a contentious presidential election, the MIT Media Lab is preparing to make sense of the firehose of chatter expected to hit Twitter.
The Laboratory for Social Machines, part of the MIT Media Lab, launched the machine-learning project, dubbed Electome, about a year ago to help give people more of a voice about the election.
Then representatives from the Commission on Presidential Debates reached out to Electome's project leaders, asking for help in giving the media a better look at what people were saying on Twitter about the candidates, the issues and all the ups and downs of the debate.
The goal was to get the public’s voices into the conversation about the debates and the overall election.
“We were trying to help the election go beyond polls and really get into the conversation,” said Bill Powers, a research scientist at the MIT Media Lab. “The data revolution we’re living through, in terms of American politics, the way it’s been used is by political parties to win. We felt it was time for this same data to track ideas.”
Powers sees the project as a public service. For this year, though, the tweet analysis and its accompanying visualizations are available only on a dashboard open to the media.
He explained that the Electome group wanted to start with a relatively small number of users and then expand from there.
Working with news outlets like the Washington Post, CNN, and Bloomberg, the Electome project has about 200 individual journalists signed up to use it. Members of the research team will be on site for the first debate tonight at Hofstra University in Hempstead, N.Y., and they expect to sign up a lot more reporters there.
Twitter has given the MIT team access to its entire firehose of tweets – about 500 million tweets on a normal day.
Today likely will not be a normal day.
Twitter users, who will be able to watch the debate live streamed on the site, are likely to tweet about their favorite and least favorite candidates, the verbal blows landed and the blows deflected, and the issues raised. They’ll also retweet comments and memes, along with stats, talking points and images that the two campaigns and the Democratic and Republican parties will post.
That’s a lot of tweets and data to analyze on the fly, all while viewing it against the backdrop of the data analyzed over the past year.
Without machine learning technology, it wouldn’t be possible.
“Since we started this, we’ve been getting the whole firehose and saving it,” Powers said. “The amount of human oversight that would be required [to do this without machine learning] would be massive… The analysis would be much more basic. We’d be capturing much less of the conversation without the machine learning.”
Deb Roy, director of the Laboratory for Social Machines and chief media scientist at Twitter, said that if they had a massive team of researchers, a pile of money to fund them and unlimited time, maybe they could do the analysis without machine learning.
Those are not the parameters they’re working with, however.
“We’re processing a very large volume of tweets and news stories, and we’re trying to classify and organize tweets around the major issues in the election and understand patterns across all the tweets and not just some of them,” Roy said. “It’s a moving target. If you want to say which ones are about immigration, which ones are about terrorism, you need help.”
One issue that makes this kind of analysis so difficult, aside from the massive number of tweets that need to be analyzed immediately, is that people use shorthand on Twitter, some of it well known and some of it their own, because of the 140-character tweet limit. They also use a constantly changing list of hashtags.
While someone might tweet about pneumonia or racism, both of which have been issues in this election, it doesn’t mean the tweet is related to the election.
The Electome algorithms need to decipher which tweets are about the election and then put those tweets into the proper category bucket.
They also need to keep up with the quickly changing cast of issues connected to the election -- and tonight’s debate.
For instance, one minute pneumonia was not an election issue, and the next minute it was. The machine-learning algorithms need to immediately recognize that kind of change.
To keep up with the changes, the algorithms are taking in 500 to 600 news stories a day and learning the new names and vocabulary associated with different topics.
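The article does not describe how Electome learns new vocabulary from the news, so the following is only a rough sketch of the general idea, not the project's method. It assumes a hypothetical `known_vocab` set of terms already tied to the election and flags any unfamiliar term that shows up in several of the day's stories. The function names and the toy headlines are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emerging_terms(todays_stories, known_vocab, min_stories=2):
    """Return unfamiliar terms that appear in at least `min_stories` stories."""
    story_counts = Counter()
    for story in todays_stories:
        # Count each term once per story so one long article cannot dominate.
        story_counts.update(set(tokenize(story)))
    return sorted(
        term
        for term, n in story_counts.items()
        if n >= min_stories and term not in known_vocab
    )


# Hypothetical toy data, invented for illustration only.
known_vocab = {"clinton", "trump", "campaign", "debate"}
stories = [
    "Clinton campaign says pneumonia diagnosis came Friday",
    "Trump campaign responds to pneumonia news",
    "Debate preview: Clinton and Trump",
]

new_terms = emerging_terms(stories, known_vocab)
print(new_terms)
```

On this toy input, "pneumonia" is the only term that is both new and repeated across stories, which mirrors how a word can become an election topic from one day to the next. A real system would also need stop-word handling, phrase detection and some link between each new term and a specific issue.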
“The machine needs to know a tweet about pneumonia is about the election and it’s a subtopic,” explained Roy. “If we simply said, ‘Go find all tweets about pneumonia,’ there are a significant amount of conversations and news stories that are not about the election. If you pick up a tweet every time Brussels is mentioned, you can’t assume it’s about terrorism. So you can’t just look for key words, but you need to train a model that looks for tweets about pneumonia that are in the election context.”
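Roy's distinction between keyword search and a trained model can be illustrated with a small sketch. This is not Electome's actual model, which the article does not describe. It is a minimal naive Bayes text classifier written from scratch, trained on six hypothetical tweets invented for this example, and compared against a plain keyword match.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_match(text, keyword="pneumonia"):
    """Naive baseline: flag any tweet that mentions the keyword."""
    return keyword in tokenize(text)


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training tweets, invented for illustration only.
training = [
    ("clinton pneumonia diagnosis shakes up the campaign", "election"),
    ("will pneumonia keep clinton off the debate stage", "election"),
    ("trump campaign reacts to pneumonia news", "election"),
    ("my kid has pneumonia and is home from school", "other"),
    ("doctor says pneumonia needs rest and antibiotics", "other"),
    ("grandma is in the hospital with pneumonia", "other"),
]
model = TinyNaiveBayes().fit(training)

political = "clinton campaign says pneumonia will not stop her"
personal = "home sick with pneumonia need antibiotics"

for tweet in (political, personal):
    print(keyword_match(tweet), model.predict(tweet), "|", tweet)
```

The keyword match flags both test tweets because both mention pneumonia, while the trained model separates them based on the surrounding words. With only six training examples the model is a toy. A production classifier would need far more labeled data and frequent retraining as the vocabulary shifts.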
At this point, the project is only analyzing tweets in English from North America.
The top issues on Twitter since MIT started this project? Foreign policy and national security.
“The opinions on Twitter are almost as diverse as the number of tweets,” Powers said. “There’s a lot about Putin and Russia. You hear a lot about Benghazi. You hear positive things about Hillary Clinton’s work abroad. It’s the gamut of things happening in the world.”