Smartest Machine: Expert Q&A

On February 28, 2011, David Ferrucci answered questions about the IBM supercomputer "Watson," how it performed during the Jeopardy! competition in mid-February 2011, its potential future applications, and more.

Thursday, January 20, 2011 Nova

Wagering algorithm

Q: I'm curious about the algorithm Watson uses for how much to wager in Double & Final Jeopardy. We know it's strategic in some way, but it seems confusing. Can you explain it? Susan Mayhew, Tampa

Watson's betting algorithms are based on its confidence calculations, human betting behavior, and the current state of the game. If it is well ahead, for example, Watson will not take a lot of risk and will bet small. In the 1st Final Jeopardy of the exhibition games, contrary to what Alex said about U.S. CITIES being an easy category, Watson was not as sure, since it had learned that categories do NOT indicate the difficulty of a question. Also, Watson was ahead at that point, and there was a whole other game to play, so it bet conservatively to try to preserve its lead. Once ahead and with little of the game left, there is no point in betting big on Daily Doubles. In the 2nd Final Jeopardy, Watson bet to beat Ken by one dollar if he were to double up. At that point Watson had clinched it with the final Daily Double.
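The one-dollar margin mentioned above comes down to simple score arithmetic. The sketch below is not Watson's actual wagering code, which also weighed confidence and models of human betting. It only illustrates the two-player arithmetic, and the function name `final_wager` and the example scores are hypothetical.

```python
def final_wager(leader: int, chaser: int) -> int:
    """Illustrative Final Jeopardy wager for the leader against one chaser.

    Assumes the chaser could at best double up (finish with 2 * chaser).
    """
    best_chaser_total = 2 * chaser
    if leader > best_chaser_total:
        # Game already clinched: largest bet that still leaves the leader
        # one dollar ahead of a doubled-up chaser after a wrong answer.
        wager = leader - best_chaser_total - 1
    else:
        # Not clinched: smallest bet that finishes one dollar ahead of a
        # doubled-up chaser after a right answer.
        wager = best_chaser_total - leader + 1
    # A wager can never be negative or exceed the current score.
    return max(0, min(wager, leader))
```

With hypothetical scores of 30,000 and 10,000, the clinched branch gives a wager of 9,999: even a wrong answer leaves 20,001, one dollar more than the chaser's best possible 20,000.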

Watson's future

Q: I've waited 2 yrs to see Watson on Jeopardy; I'm sad it's over. What happens to Watson now? Will you keep the system running to continue this research and will other QA researchers have access to conduct experiments? Will we ever see Watson again? Micki, Chicago, IL

While probably not on Jeopardy!, you can bet on seeing Watson or one of its descendants again.

Speech recognition

Q: Speech recognition has made great strides in recent years. Why was this functionality not incorporated in Watson? Stephen Davies, Toronto (not a U.S. city)

We considered using speech recognition in the beginning and had trained it on Alex's voice. But the key challenge for us was the open-domain question-answering, and including speech-recognition challenges would have muddied the experiment. Potential errors would have come from multiple dimensions, and it would have been much more difficult to focus and explain the outcome of the technology demonstration. I for one hope to tie it all together (speech, language, and vision) in our future research plans.

"How" and "why" questions

Q: Would it be possible in the future for Watson to provide an elaborate response to a 'Why' or 'How' question? For instance, 'Why is the sky blue?' can be answered correctly and objectively in a few sentences. Michael Howard, Rolla, Missouri

Yes. Absolutely. This too is one of our important research directions. The expectation is that if there is a good explanation out there, Watson can discover it, score it, and even chain levels of explanation together. However, inferring how and why answers that require deeper thinking may represent a level of intelligence that requires capturing knowledge that is much more difficult to learn automatically. With humans encoding these kinds of rules, systems can behave very intelligently, but they become difficult to scale (too much specialized human effort required), and they are brittle (miss the mark very easily) and narrow (only work for the anticipated data). Obviously big challenges remain for deep AI. I view Watson's underlying architecture as an important step forward in building systems that can more easily capture and exploit knowledge already encoded in natural language.

Abstraction

Q: I have the impression that the Watson system is still more artificial than it could be if more generic algorithms on a more abstract level could be implemented; do you have an opinion about that? (anyway, *superb* work so far!) Stephan van Ingen, Belgium

I am painfully aware of all of Watson's failings and simultaneously excited about all the ways it may be extended and made to behave more intelligently. We have stacks of ideas on how to generalize its intelligence, and we are anxious now, in the wake of the Jeopardy! challenge, to get back to work! :)

Ability to induce

Q: To what extent is Watson able to induce? Given a large number of facts, is it able to construct human-reviewable generalized hypotheses? Can it induce "all men are mortal"? Larry O'Brien, Kailua Kona, HI

It does exploit basic rules of deduction, allowing it to infer properties over taxonomic structures. And by reading large amounts of text and analyzing examples it can statistically induce generalizations like "people earn degrees at schools" or "inventors patent inventions." Watson also has some very specialized knowledge that would enable it to infer more complex relations not easily learned by "reading." But these frames are still manually encoded collections of concepts and rules. Building these manually is well-understood; the real challenge is learning them automatically from language and ensuring they may be more flexibly applied. We are developing techniques to try to do exactly that.
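The two kinds of inference described above, deduction over a taxonomy and statistical induction from many examples, can be sketched in a few lines. This is only a toy illustration, not Watson's implementation: the `IS_A` table, the typed triples, and the `min_support` cutoff are all hypothetical, and a real system would extract such triples from text rather than list them by hand.

```python
from collections import Counter

# Hypothetical is-a taxonomy: each term points to its more general type.
IS_A = {"Socrates": "man", "man": "person", "person": "mortal being"}

def ancestors(term, is_a):
    """Deduction: list every type reachable by following is-a links."""
    found, seen = [], set()
    while term in is_a and term not in seen:
        seen.add(term)          # guard against accidental cycles
        term = is_a[term]
        found.append(term)
    return found

def induce(triples, min_support=2):
    """Induction: keep (subject type, relation, object type) patterns
    that occur at least min_support times, most frequent first."""
    counts = Counter(triples)
    return [t for t, n in counts.most_common() if n >= min_support]

# Hypothetical typed triples, as if extracted from a large body of text.
TRIPLES = (
    [("person", "earns degree at", "school")] * 3
    + [("inventor", "patents", "invention")] * 2
    + [("person", "patents", "school")]  # a one-off extraction error
)
```

Here `ancestors("Socrates", IS_A)` follows the chain up to "mortal being", while `induce(TRIPLES)` keeps the two recurring patterns and drops the one-off noise.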

Daily Doubles

Q: I read somewhere that the IBM scientists were "befuddled" with the unusual amounts Watson chose for the Daily Doubles. That's very surprising - I would have expected that the amount wagered would be the most straightforward algorithms in the system. Mike Voytovich, Millbrae, CA

Smile! The team here is not befuddled by the betting algorithms. They are well-understood. Of course, there is an occasional bug in the code that may lead to an unintended bet, and when that occurred the team had to discover the problem and fix it. The betting algorithms were trained with different objective functions related to risk management. So at one point during the sparring games, for example, Watson bet huge amounts on Daily Doubles because it discovered that over many, many games this strategy optimized the chances of winning. I was unhappy with the potential downside in any one game and had the strategy team make adjustments. So often, as an outsider looking at the team debate these strategies during the sparring games, you might think we were "befuddled" - but more often than not, we were experimenting, exploring, and debating. Admittedly, of course, once in a while we were briefly confused by an unintended bug - but this was relatively rare.
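To see why the choice of objective function matters so much for bet size, consider a toy comparison. The sketch below is not the strategy team's code: it swaps in two textbook objectives, expected score and expected log score (the Kelly criterion), and assumes a bet can be anything from 0 up to the current score in fixed steps. All names and numbers are hypothetical.

```python
import math

def expected_score(score, bet, p):
    """Risk-neutral objective: average final score."""
    return p * (score + bet) + (1 - p) * (score - bet)

def expected_log_score(score, bet, p):
    """Risk-averse objective: average log of the final score."""
    if bet >= score:
        return float("-inf")  # risking everything: a miss leaves nothing
    return p * math.log(score + bet) + (1 - p) * math.log(score - bet)

def best_bet(score, p, objective, step=100):
    """Pick the bet (in multiples of step) that maximizes the objective."""
    bets = range(0, score + 1, step)
    return max(bets, key=lambda b: objective(score, b, p))
```

With a score of 10,000 and a 75% chance of answering correctly, the risk-neutral objective bets the full 10,000, while the log objective bets 5,000. The first maximizes the average outcome over many games; the second limits the downside in any one game.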

Advice for interested students

Q: What would you tell a high school senior who wishes to work on similar projects? I mean, besides the obvious "go to college & study CS". What, in your background, would you do differently? What would you earnestly recommend for others to emulate? Michael B, Rochester, NY

Experiment for yourself—read about the history of AI and write code mimicking what others have done over the years. Build and experiment with your own systems. Be very critical of what has been done in the past—understand its weaknesses and limitations. Explore hybrid solutions—do not get locked into one approach or one style, rather focus on system-level results. Learn to combine semantics with statistics—knowledge and math. Rise above the program—do not become a Java programmer or a C programmer or a LISP programmer or a Prolog programmer—understand the theory and then become a system builder—whatever it takes to build intelligent systems.

Lessons from Jeopardy!

Q: There are obviously some bloopers on the part of Watson. What are some of the biggest and most important lessons learned from Watson's performance on Jeopardy!? Ewen Chan, Ridgetown, Ontario, Canada

Tackling broad-domain rather than narrowly defined problems is hard, and yet it is the best way to push technology. Relative to prior attempts, Jeopardy! is broad-domain because of the range of language and topics. To move forward, intelligent language systems must get even more aggressive in the breadth of tasks they tackle.

Publications on Watson

Q: Where can I find all of IBM's publications coming out of Watson's experiments? Sudarshan Rangarajan, Menlo Park, CA

Here is an earlier, very high-level paper: https://researcher.ibm.com/researcher/view_page.php?id=2107 By 3Q 2011, we should publish about 10-15 detailed papers on this work all together in an IBM Systems Journal.

Scaling potential

Q: 2800 processors obviously kick an answer to one question really fast. How will this technology scale down so I can hold it in my hand (or mobile device), and scale up so millions of humans can use it simultaneously? When will it hit the mainstream? Erik Peterson, Chicago

For starters, we can imagine a Watson-like technology running on a Cloud (interconnected large cluster of servers) that services many mobile devices simultaneously, each running a lightweight client. A second point is that the 2800-core system was highly optimized for competing in real time on Jeopardy!. As we look at different application scenarios, I predict we will find cost-effective solutions that deliver real return on investment.

Scaling potential (cont.)

Q: I want to know if Watson can be scaled EASILY, i.e., can there be half a Watson or a double Watson? In other words, can the memory, processing cores, storage capacity, etc. be EASILY added / removed to suit different use-case scenarios. Thanks. Vaidyanathan L S, Bangalore, India

Actually, remarkably flexible scaling is possible. We were very pleased with how Watson's underlying architecture and implementation could be deployed on a single core, on 500 cores, and then on several thousand cores and with different configurations of memory and disk. Latency was then affected by how much data and how many algorithms we deployed to drive up breadth, confidence, and precision. The deployment configurations explored to optimize different scenarios were performed by a team familiar with scale-out and UIMA-AS (uima.apache.org) rather than with the Watson-specific code. This is a good thing.

If Watson had a twin...

Q: If Watson had an identical twin and both were playing a Jeopardy! game, would they necessarily come up with the same responses? Mark Stradling, Trenton, NJ

Smile. If the code and data versions were exactly identical at both training time and run-time, yes.

The buzzer issue

Q: It's obvious both Ken & Brad are trying to buzz in immediately, using the alleged edge that they have in *anticipatory* reaction time, but Watson is still reacting faster, most of the time. How thoroughly was this specific aspect of gameplay tested? kilow, PA

Once Watson got fast enough, on average, to compute an accurate answer and confidence before it was time to buzz, it could leverage its consistently fast buzzer speed. While a good human anticipatory buzzer could beat it, Watson could dominate the buzzer in a game. This only helps, of course, if it's "smart" enough to know if and when it has a correct answer. Once Watson had competitive levels of precision and confidence (could evaluate evidence well enough to determine an accurate likelihood that it had the right answer), it became a goal to make Watson fast enough to be a better buzzer. After all, that is how, once you are good enough at question-answering, you win at Jeopardy!; it's how Ken and Brad beat their best competitors. Watch Ken's winning streak again :) It's remarkable how he dominated the buzzer in so many games. To win at Jeopardy! you have to have it all: breadth, precision, confidence, speed, and strategy.

Tough questions

Q: Cool Circuits, Watson! What do you get if you heat ice to 53 degrees? a puddle, fire, pain, or weather? Which of the following does NOT involve the mouth - talking, kissing, walking, eating? Which is youngest: a teenager, retiree, or infant? Celeste, Eugene, OR

Not sure what Watson would say, but I do not understand the question :)

Why "Toronto"?

Q: Can you explain how Watson came up with "Toronto" as an answer in the category "U.S. Cities"? Russell LaMantia, Chicago

Watson had very low confidence in both its top answers. Had it been a regular Jeopardy! question, Watson would have avoided answering altogether. Since it was a Final Jeopardy question, it showed its top answer, Toronto, in spite of the low confidence but printed lots of "?"s after it. Its 2nd answer was Chicago (the correct answer).

Toronto—14%
Chicago—11%
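The behavior described here, staying silent on a regular clue but showing a hedged answer when a response is required, can be sketched as a simple threshold rule. This is an illustration only: the 0.5 threshold and the function name `respond` are hypothetical, and only the two confidence values come from the answer above.

```python
def respond(candidates, threshold, must_answer):
    """Return the top candidate, a hedged top candidate, or None.

    candidates maps each answer to a confidence between 0 and 1.
    """
    best, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return best
    if must_answer:
        # Forced to respond (as in Final Jeopardy): flag the low confidence.
        return best + "?????"
    return None  # regular clue: do not buzz in

# Confidence values quoted above; the threshold is an assumed value.
CANDIDATES = {"Toronto": 0.14, "Chicago": 0.11}
THRESHOLD = 0.5
```

Under these assumptions, `respond(CANDIDATES, THRESHOLD, must_answer=False)` returns `None`, while `must_answer=True` yields the top answer followed by question marks.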

Watson learned by training over many Jeopardy! questions that Jeopardy! categories do NOT strongly indicate the answer type. Consider these examples:

U.S. CITIES: St. Petersburg is home to Florida's annual tournament in this game popular on shipdecks (Shuffleboard)
U.S. CITIES: Rochester, New York grew because of its location on this (the Erie Canal)
U.S. CITIES: Seattle's "gem" of a nickname, or Dorothy's destination (The Emerald City)
OLD CITIES: Carchemish was a Hittite one of these hyphenated sovereign areas, like Sparta (a city-state)
PENNSYLVANIA CITIES: Scranton's iron industry used nearby deposits of this hard coal to fuel its blast furnaces (anthracite)
PENNSYLVANIA CITIES: Zion's Church in Allentown houses a replica of this patriotic item temporarily hidden there in 1777 (Liberty Bell)
WORLD CITIES: Recife is called "The Venice of" this South American country (Brazil)
CITIES: For a short time in the 1950s, La Plata, Argentina was named for her (Eva Peron)

So being a "U.S. City" was not a #1 priority, and Watson found more WWII evidence based on the Lester Pearson airport in Toronto. If you rephrase the question so that it reads "This U.S. City....", Watson gets Chicago with higher confidence and in 1st position because it weighs the "U.S. City" evidence higher.

All things considered, Watson failed to gather enough WWII hero and battle evidence to boost Chicago over Toronto or to get either answer above its confidence threshold.
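A toy weighted-evidence ranking shows how raising the weight on the "U.S. City" type evidence can flip the order of two candidates. Every number below is invented for illustration and is not taken from Watson, and the linear weighted sum is an assumed stand-in for whatever combination Watson actually learned.

```python
# Hypothetical evidence scores per candidate (illustrative values only).
EVIDENCE = {
    "Toronto": {"airport_evidence": 0.6, "us_city_type": 0.0},
    "Chicago": {"airport_evidence": 0.5, "us_city_type": 1.0},
}

def rank(candidates, weights):
    """Order candidates by a weighted sum of their evidence scores."""
    def score(name):
        features = candidates[name]
        return sum(weights[key] * value for key, value in features.items())
    return sorted(candidates, key=score, reverse=True)

# Category treated as a weak hint versus an explicit "This U.S. city..." cue.
WEAK_TYPE = {"airport_evidence": 1.0, "us_city_type": 0.05}
STRONG_TYPE = {"airport_evidence": 1.0, "us_city_type": 0.5}
```

With the weak type weight, Toronto ranks first; with the strong type weight, Chicago does.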

Watson's heritage

Q: Seeing Watson as a research activity, how much of your architecture and algorithms were original research? What known algorithms and techniques have you utilized from the computer science and machine learning literature? Feyyaz, Turkey

Watson is built on decades of research (successes and failures) in Natural Language Processing, Statistical Machine Learning, Information Retrieval, and Knowledge Representation and Reasoning. We started building on our own explorations in this space and quickly learned that to conquer Jeopardy! we had to find a way to extend, generalize, advance, and integrate many state-of-the-art techniques. Many of our own prior results and other results described in the literature failed to deliver the breadth, confidence estimation, and precision needed to compete at high enough levels. The key to our success was the DeepQA architecture and related methodologies that allowed us to rapidly innovate, integrate, and experiment with many different techniques and algorithms.

The Turing test

Q: With your experience in AI, what developments will still be needed towards mastering the Turing test? John Dirry, Austria

We have a lot more work to do in natural language processing. I personally believe that approaches like Watson's, which focus on integrating many different techniques that can bootstrap off of linguistic and statistical NLP while leveraging formally encoded knowledge only when necessary, will allow us to scale Watson's capability. Active learning frameworks will be key.

Machine learning

Q: The most interesting part of the program was how "machine learning" improved Watson's performance. Do you envision a time five or six years from now when 20 of these machines talk to each other and improve each other's performance exponentially? Russell LaMantia, Chicago, IL USA

I do envision a time sooner than that where Watson will generate questions for humans (and possibly other instances of itself working in different domains) to greatly accelerate its learning process. The goal of course is not to learn factoids but rather to use facts to learn how to more accurately interpret language. It will generate questions that help its ability to understand.

Caring for computers

Q: I am def on Watson's team. But do you think by creating something man-made to have the ability to learn you're creating more problems? I mean in all honesty people are hard enough to teach and keep happy ... must we also be responsible for computers??? Tiffany Tubby, NY, NY

The way I see it, computers are tools. If we can get them to operate more effectively on our terms, then they can better help us digest and vet huge volumes of content, ultimately helping us learn more effectively from ourselves.