Wrapping up a three-day run on the Jeopardy game show, IBM's Watson computer has beaten two former champions in a historic match of man versus machine.
The run demonstrated not only that a computer can beat humans at a trivia quiz but, more importantly, that computers can answer questions much as people do, opening up a potentially new form of human/computer interaction.
In the final episode of the pre-recorded two-game, three-night match, Watson trounced the competition, amassing US$77,147 in winnings against the two Jeopardy champions it played, Brad Rutter and Ken Jennings. Rutter scored $21,600 and Jennings scored $24,000. Watson also took the $1 million champion prize, which IBM will donate to charity.
Run by Sony Pictures Television, Jeopardy is a long-running U.S. TV game show in which three contestants compete to answer trivia questions, arranged into multiple categories and ordered by increasing difficulty. Contestants are given an average of about 5 seconds to answer a question.
IBM researchers spent four years building Watson. The machine is capable of 80 trillion operations per second (80 teraflops). It runs about 2,800 processor cores and has 16 terabytes of working memory.
Building such a system to play on Jeopardy proved to be an immense project, one far more challenging even than building a chess-playing supercomputer, which IBM did in the late 1990s.
"It's a much different kind of problem. Chess was very challenging for the time due to the mathematics. This was a very different type of program," said Watson lead manager David Ferrucci, at an IBM viewing party held in New York for Wednesday's show. "It's not a finite problem or a well-defined space. You are dealing with ambiguity, and the contextual nature of language."
On the software side, the machine uses the Apache Hadoop distributed file system and the Apache UIMA (Unstructured Information Management Architecture), a framework for analyzing unstructured data. Perhaps the most useful software, however, is a natural language processing program called DeepQA that IBM claims can understand a human sentence. This program is what sets Watson apart from a typical search engine, which can only return a list of results for a set of keywords.
The questions were entered into Watson by text; it did not use voice-recognition technology. For these rounds, Jeopardy eschewed questions that involved audio or video snippets. Watson did, however, answer questions in a smooth synthesized voice.
To build a body of knowledge for Watson, the researchers amassed 200 million pages of content, both structured and unstructured, stored across 4 terabytes of disk. Watson searches this content for matches and then uses about 6 million logic rules to determine the best answers. When given a question, the software initially analyzes it, identifying any names, dates, geographic locations or other entities. It also examines the phrase structure and the grammar of the question for hints of what the question is asking.
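To make the question-analysis step more concrete, here is a minimal Python sketch of pulling dates, places and other capitalized entities out of a clue. It is an illustration only and not IBM's DeepQA code: the function name `analyze_question`, the tiny `KNOWN_PLACES` list and the sample sentence are all assumptions made up for this example, and a real system would rely on full natural language parsing rather than regular expressions.

```python
import re

# Hypothetical, toy-sized list of places; a real system would use far larger resources.
KNOWN_PLACES = {"Toronto", "Chicago", "New York"}

# Four-digit years from 1000 to 2099.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")
# One or more consecutive capitalized words, e.g. "World War II".
CAPITALIZED_PATTERN = re.compile(r"\b[A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*\b")


def analyze_question(text):
    """Return the dates, places and other capitalized entities found in text."""
    dates = YEAR_PATTERN.findall(text)
    places = []
    other_entities = []
    for match in CAPITALIZED_PATTERN.finditer(text):
        span = match.group(0)
        if span in KNOWN_PLACES:
            places.append(span)
        elif match.start() == 0 and " " not in span:
            # A lone capitalized word at the start of the sentence is most
            # likely ordinary capitalization, not a name, so skip it.
            continue
        else:
            other_entities.append(span)
    return {"dates": dates, "places": places, "other_entities": other_entities}


# Made-up sample clue, used only to exercise the sketch.
clue = "In 1871 a great fire destroyed much of Chicago, years before World War II"
print(analyze_question(clue))
```

Even this crude pass shows why the step matters: knowing that a clue mentions a year and a city narrows the kind of answer worth searching for.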
On the first night of the Jeopardy match, held Monday, man and machine seemed on equal footing, with Watson tied with Rutter at $5,000 and Jennings following with $2,000. By Tuesday, however, Watson started to show its muscle: Watson led the evening with $35,734, Rutter followed with $10,400 and Jennings trailed with $4,800.
On Wednesday, the machine scored well above its human competitors, thanks not only to its immense body of knowledge but also to the algorithms the researchers put in place to make the best bets. For the Daily Double, a special hidden question where the contestant is allowed to wager any amount of his or her holdings, Watson bet a seemingly arbitrary $2,127, a number that the audience found amusing.
Such computerized wagers "are seemingly random to us mere mortals," Ferrucci said. "But what is actually going on is that it is considering its confidence in the category. It's also considering where it is in the game, how far ahead or behind it is, how much money can still be potentially won or lost. All that adds up to a fairly complex calculation. You get numbers that are optimized down to that precision."
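The factors Ferrucci lists can be combined in many ways. The Python sketch below shows one hypothetical heuristic, not IBM's actual wagering algorithm, which has not been described here in detail. The function name `choose_wager`, its parameters, the even-money betting fraction and the $5 minimum are all assumptions chosen for illustration, and the sketch follows the simplification above that a contestant may wager up to his or her holdings.

```python
def choose_wager(confidence, own_score, opponent_score, money_left_on_board, minimum=5):
    """Toy Daily Double wager. Hypothetical heuristic, not IBM's algorithm."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    # Betting fraction for an even-money bet: positive only when the
    # estimated chance of answering correctly is above 50 percent.
    edge = max(0.0, 2.0 * confidence - 1.0)
    wager = edge * own_score

    lead = own_score - opponent_score
    if lead > 0:
        # When ahead, never risk more than the lead, so that a wrong
        # answer does not hand first place to the opponent.
        wager = min(wager, lead)
    elif -lead > money_left_on_board:
        # When the deficit is larger than the money still available on the
        # board, ordinary play cannot close the gap, so bet aggressively.
        wager = max(wager, min(-lead, own_score))

    # Keep the wager between the minimum bet and current holdings.
    upper = max(own_score, minimum)
    return int(round(min(max(wager, minimum), upper)))


# Leading with high confidence: the wager is capped at the size of the lead.
print(choose_wager(0.8, 10000, 6000, 5000))
# Far behind with little money left on the board: bet everything.
print(choose_wager(0.6, 2000, 9000, 3000))
```

A calculation that weighs continuous confidence estimates and exact score differences naturally produces odd-looking amounts, which is consistent with wagers such as $2,127.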
While Watson performed flawlessly in many cases, it was also capable of flubs that even casual Jeopardy watchers could laugh at. On Tuesday's show, when asked for the largest U.S. airport named after a World War II hero, it responded with Toronto, the name of a Canadian city. On Wednesday's show it missed a question asking for the name of a well-known reference book, "The Elements of Style." To this question, Watson inscrutably and confidently answered "Dorothy Parker."
While IBM has no plans for a rematch or a Watson Version 2, it does plan to market the Watson technology in various fields such as health care, where, drawing from a specific body of knowledge, it could answer tough questions.
"I think Watson has the potential to transform the way people interact with computers," said Jennifer Chu-Carroll, an IBM researcher working on the project. "Watson is a significant step, allowing people to interact with a computer as they would a human being. Watson doesn't give you a list of documents to go through but gives the user an answer."