The second round of the Jeopardy! showdown pitting humanity against IBM's Watson supercomputer did not go so well for the carbon-based lifeforms, Ken Jennings and Brad Rutter. If the IBM machine were capable of diabolical laughter, it would have let out a rather lengthy cackle. This was the middle round of the three-part Jeopardy …
You cannot stop Judgement Day!
You can only postpone it!
*Climbs into bunker*
Wanted to watch this ...
... but left the US last week.
Can anyone correct me on the following? The machine will win due to its reflexes, not due to its ability to answer questions (although that ability is indeed impressive). The human players can't get a look in as it takes their biomechanical systems longer to trigger the buzzer.
Look at the buzzerless round: the machine failed.
1 - the machine has a buzzer.
2 - you can have the fastest buzzer in the world - if you're wrong you're wrong and lose points.
3 - if you look at the behind-the-scenes footage from the trials a few weeks ago you'll find it's all about Watson learning, essentially, how to think. It has a number of algorithms to try to determine the correct answer and has to learn which of these are reliable in different situations.
1 - Yes. I referenced this in my post. What is your point? That your reading comp is kinda poor?
2 - If you have the fastest buzzer in the world, and are as smart but no smarter than your opponents, your victory margin will not mean anything very much. And all I was really asking was whether this had been thought about. Whether this really was anything more than a pointless TV spectacle.
3 - I have not seen the footage (that reading comp again?), but I think you'll find it's all about Watson being programmed to answer jeopardy questions better, and not about Watson learning to think. Tell you what. Put him on University Challenge with no algorithm tweaks ...
1 - you can't argue it's all about the buzzer if the machine has a buzzer, which you don't reference in the first post (you say that the human contestants take longer to trigger the buzzer because of their biomechanical components; this is different to saying the machine has a buzzer, which it has to trigger mechanically).
2 - Again, you can't argue it's all about the buzzer just because it lost the buzzerless round, if it's giving the correct answers when the buzzer is involved. If I built a machine that could trigger the buzzer at the speed of light, it could not win Jeopardy unless it had code like Watson's to determine the answer. It would lose hard.
3 - You can't comment on something you haven't seen, so your comment here is rather pointless. As these videos explain, Watson has a large number of algorithms that may or may not produce the correct answer. They feed Watson a large number of real Jeopardy questions and answers to train it to determine which of these algorithms are "reliable" in which situations (the programmers do not decide for the machine which are reliable; Watson learns this from experience).
He'd probably have an easier time on University Challenge, as that's a straight question/answer programme, rather than the reversed, tricky-wording format of Jeopardy.
1. I wrote "[...] it takes their biomechanical systems longer to trigger the buzzer." The hanging comparative implies something. Namely that the biomechanical trigger is slower than something else. Perhaps an electromechanical buzzer? You can parse the words, but you're missing a lot of the meaning.
2 - Again, you've missed the point. Watson will win hard at Jeopardy, and can win hard even if it "knows" less than its opponents, because of the disproportionate effect of the buzzer. It has to "know" a fair amount less than its opponents to actually lose at this game. Or be blissfully unaware of the limits of its knowledge (and that's just not how AI works; it's all about the limits). Let me make this easy for you. Imagine two human contestants. All you know about them is that one has some condition that has slowed his reaction times by 1 second on average. Who is your money on? Yes, it's impressive that Watson can parse a subset of English sentences so well. But embedding it in Jeopardy this way is just spectacle, and of little actual interest, because there is no actual contest. This is a long, long way from Kasparov vs Deep Blue.
3 - I have never seen electrons and I have never seen god, but I comment regularly on both. As, I suspect, do you. Please don't wield that argument in public ever again. And do you know what? I said I hadn't seen it, and asked for some sort of comment precisely because I hadn't seen it. Seriously. What the fuck is wrong with you?
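The arithmetic behind the "slower reaction time" argument can be sketched in a few lines of Python. This is a toy model under assumptions that are not in the thread: two players, the faster one always wins the buzzer on any clue it knows, nobody buzzes without knowing, and nobody answers wrongly. p_fast and p_slow are hypothetical probabilities of knowing a given clue.

```python
def clue_shares(p_fast: float, p_slow: float) -> tuple[float, float, float]:
    """Expected share of clues taken by (fast player, slow player, nobody).

    Toy model: the fast player buzzes first on every clue it knows, so the
    slow player only scores on clues the fast player doesn't know.
    """
    fast = p_fast
    slow = (1 - p_fast) * p_slow
    nobody = (1 - p_fast) * (1 - p_slow)
    return fast, slow, nobody


def break_even(p_slow: float) -> float:
    """Knowledge level at which both players take the same share of clues.

    Solving p_fast = (1 - p_fast) * p_slow gives p_fast = p_slow / (1 + p_slow).
    """
    return p_slow / (1 + p_slow)


if __name__ == "__main__":
    # The slower player knows more (70% vs 60%) but still takes fewer clues.
    print(clue_shares(0.6, 0.7))
    print(break_even(0.7))
```

Under this model the fast player takes more clues whenever p_fast exceeds p_slow / (1 + p_slow), which is about 41% against an opponent who knows 70%, and never more than 50% however good the opponent is. That is roughly the commenter's point, though real games also penalise wrong answers, which this sketch ignores.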
As for your last point. You admit that Watson J would use a different set of algorithms from Watson UC ("to determine which of these algorithms are 'reliable' in what situations"), and therefore should probably admit that "learning to think" was hopelessly naive at best.
IBM's buzzer-buzzing robot ...
I thoroughly enjoyed the event, but I question the fairness of pitting an electronic system against humans when the critical step is timing a buzzer. I'm fairly certain a high schooler could design a robot capable of beating the world's best trivia experts at mashing a button first.
Ken/Brad could have easily answered most of those questions instead of Watson, but for some reason are unable to match his 1ms buzzer response times.
Not detracting from the parallel compute power - welcome our new, redundantly animated overlord, and such - but I was very distracted by this part. And I'm also curious if Watson knows the exact answer when it buzzes, or if it buzzes and then finishes refining the answer.
"And I'm also curious if Watson knows the exact answer when it buzzes, or if it buzzes and then finishes refining the answer."
That's a bit too close to human for some.
I for one - etc. etc. and is Watson free for the local pub quiz next week?
Fair? Why ever should it be fair?
I think the AI gets the text of the question at the point Trebek finishes speaking, while the fleshies have been aware of the keywords for several hundred msec by reading it off the panel. So you could say it sort of balances out. Remember there's a cost for buzzing with the wrong answer.
But ultimately, fairness isn't the point. It's not fair that Watson has hugely indexed access to a squillion words of carefully chosen references and encyclopedias. It's not fair that it doesn't have an emotional response to getting an answer wrong. It's not fair in the same way that it's not fair that the SAP general ledger system doesn't have to contend with playful co-workers hiding its abacus -- it's still a better book-keeper.
The point is that it's playing a game that anyone would have said was grossly unreasonable as a challenge for a machine, and holding its own while doing so. Apparently, machines are better than humans at answering general knowledge questions -- and they can get that way just by loading the text of (well chosen) books. That's new, and vastly important.
I for one...
How did you get there?
When I saw the machine give its answer to the final Jeopardy question, I wasn't surprised at the wrongness of the answer, but I became curious as to how it came to that wrong answer.
The clue that the human host (Trebek) gave was to name
(a) a city, with
(b) two airports, where
(c) one airport was named for a World War II general, and
(d) the second airport was named for a World War II battle.
Watson's answer of Toronto satisfied
(a) as it was the name of a city, but partly failed on
(b) as Toronto proper doesn't have two major airports, failed on
(c) as Toronto's main airport (Lester B. Pearson International Airport) is named after one of Canada's most popular Prime Ministers who never achieved the rank of general, and failed on
(d) because while there are tiny airports scattered around Metro Toronto, none of them are named for World War II battles.
When Watson proposed Toronto as its answer, had it checked each of these constraints in turn, it would have eliminated Toronto for its failure to satisfy three of the four constraints.
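The constraint-by-constraint check described above can be sketched as a simple filter. The fact table below is hypothetical and hand-entered for illustration: Toronto per the analysis above, Chicago via O'Hare (named for Butch O'Hare, as a reply below points out) and Midway (named for the Battle of Midway). Following the replies below, constraint (c) is written as a WWII "hero" rather than a general. None of this reflects how Watson actually stores or scores knowledge.

```python
# Hypothetical, hand-entered facts for illustration only.
FACTS: dict[str, set[str]] = {
    "Toronto": {"city"},
    "Chicago": {
        "city",
        "two_airports",
        "airport_named_for_ww2_hero",    # O'Hare, after Butch O'Hare
        "airport_named_for_ww2_battle",  # Midway, after the Battle of Midway
    },
}

# The clue's constraints (a) to (d), in order.
CONSTRAINTS = [
    "city",
    "two_airports",
    "airport_named_for_ww2_hero",
    "airport_named_for_ww2_battle",
]


def failed_constraints(candidate: str) -> list[str]:
    """Return the constraints a candidate does not satisfy."""
    known = FACTS.get(candidate, set())
    return [c for c in CONSTRAINTS if c not in known]


def surviving(candidates: list[str]) -> list[str]:
    """Keep only candidates that satisfy every constraint."""
    return [c for c in candidates if not failed_constraints(c)]


print(failed_constraints("Toronto"))
print(surviving(["Toronto", "Chicago"]))  # ['Chicago']
```

Toronto fails three of the four constraints here, matching the count above if the "partly failed" airport count is treated as a failure.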
Now for Final Jeopardy, we didn't get to see the other answers that Watson was considering but didn't propose (for all the regular Jeopardy questions, we were shown on screen Watson's prime choice, and Watson's two alternate choices). So it's possible that Watson had Chicago in its list of possible answers, but for some reason, its internal ranking put Toronto higher than Chicago in its answer list.
I await tomorrow's (Wed, Feb 16th, 2011) last Jeopardy contest starring Watson with some interest. But I hold more interest for the possibility that the IBM team will answer some of these questions for us after the show.
Lieutenant Commander Edward Henry "Butch" O'Hare
So not a general either...
Was its answer Toronto Ohio or Toronto Ontario?
Would be interesting to see a breakdown of its logic: whether it understood the question correctly and why it arrived at that answer. I suspect it misinterpreted the question.
The category was "US Cities".
The article said hero, not general.
If the article is correct in its description of the last question your analysis of the question is flawed.
Maybe Toronto's airport is named after someone whose name also happens to be that of someone who was a WWII hero. It seems to me that a problem with gargantuan amounts of but not quite complete data is that false correlations will occur. Still it did seem to know it was unlikely to be correct.
Toronto is going to defect?
Perhaps Watson and thenextwatson know something the rest of us don't?
>Watson's answer of Toronto satisfied
>(a) as it was the name of a city, but partly failed on
The Final Jeopardy topic was _U.S._ Cities.
Top 10 ways to tell you're in Canada:
10. $1 coins that people actually use, and $2 coins too.
9. Parliament, not Congress
8. It's colder than it is in 40 U.S. states.
7. Many Canucks end sentences with 'eh'
6. Everything is in French and English
5. Mailboxes (post boxes) are red.
4. Signs on the 401 that say "Bridge to the USA"
3. Gas (petrol) is sold in litres.
2. The Beer store.
1. The Queen of England is on all the money.
Other than that, yeah, Canada is just like the U.S.
Queen of England? Shurely shome mishtake...
"the United Kingdom, Canada, Australia, New Zealand, Jamaica, Barbados, the Bahamas, Grenada, Papua New Guinea, the Solomon Islands, Tuvalu, Saint Lucia, Saint Vincent and the Grenadines, Belize, Antigua and Barbuda, and Saint Kitts and Nevis" ref. Wikipedia.
The United Kingdom of what?
I rest my case.
Computers are probably better at sarcasm too....
Nice reply there - trying to prove you are more intelligent and funny than Watson... Few problems ->
1. Toronto, Ohio -- it has an airport; a bit of Google magic and it's called Eddie Dew Memorial Airpark...
2. Who was Eddie Dew? There is an actor with that name not a war hero afaik but who knows the computer may have correlated memorial and his name with some WWII films he was in!
3. It was only 30-odd percent sure the answer was right - it was basically guessing
Toronto, Ohio. While you were on Google you'd have seen it's a "city" (it has a mayor) of about 5000. One airport, not two, and you're not funny either, even as an anonymous cowherd.
Great Britain and Northern Ireland
As titular appendage
@Blake St Claire
It says "The United Kingdom of Great Britain and Northern Ireland" on my passport.
And if you'd googled you'd have seen there are three airports within a 10-mile radius of the 'City' of Toronto, OH, the others being Herron and Jefferson County. Charles D. Herron was a five star General in WWII (http://en.wikipedia.org/wiki/Charles_D._Herron), but I am yet to make a battle connection... not that it matters much, as we all know what the right answer should have been...
Anyway, I always thought it was odd that Americans call such small places cities; the lack of a size distinction does cause issues like this.
PS don't forget where you put your case... A swift retrieval may be due...
Even I know Toronto is a Canadian city, Paris because Watson is as coy,
How does it get the questions? Are they typed in as they are read/heard? Are they put in beforehand? If the latter then it surely has an unfair advantage.
Need more info - link would be good
Curiosity regarding the technology discussed here:
What were the questions asked?
How does the computer interpret the questions?
How does it find the answers?
Is it connected to the Internet?
Some of the answers
1. The questions (strictly, for Jeopardy!, answers; contestants provide the questions) are created the same way as they are for regular shows, by the show's staff, though possibly with a bit of skew to try to trip the computer up.
2. The interpretation of the clues is really what the whole show is about, particularly, how Watson is able to interpret natural language with its subtleties and nuances ("I was struck by a bat" could refer to a baseball bat or a flying rodent; Watson can disambiguate that from the context, which is far beyond a Google search sort of thing). Clues often play word games and include puns, which give computers a hard time; this shows how well Watson can handle such things.
One other example from tonight was something roughly like "a bright red clown, or a silly person." Watson correctly replied "bozo." The first of those descriptions is clear enough, though could cover a lot of different clowns. "Bozo" as a term of derision is, of course, a very idiomatic quirk. Watson was able to draw on both and come up with the right response.
3. Watson has been fed a large volume of plain encyclopedic facts, but they also mined a huge volume of writing and speech, much of it over the Internet, with an emphasis on how people speak casually (for example, the way comments appear in a forum like this one. I'm sure Watson bogged down when it was fed posts from AManFromMars).
4. No; Watson was not connected to anything; it had to draw entirely from its acquired knowledge.
http://www.youtube.com/watch?v=d_yXV22O6n4&feature=relmfu has a very good review of all of this.
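The "bat" example in point 2 can be illustrated with a crude context-overlap rule. This is a simplified Lesk-style approach swapped in purely for illustration; Watson's actual disambiguation is far more elaborate and is not described in this thread, and the sense "signature" word lists below are made up.

```python
# Made-up "signature" words for each sense, for illustration only.
SENSES: dict[str, set[str]] = {
    "bat (baseball)": {"baseball", "swing", "pitch", "inning", "wooden", "batter"},
    "bat (animal)": {"cave", "wings", "night", "flying", "mammal", "echolocation"},
}


def disambiguate(sentence: str, senses: dict[str, set[str]] = SENSES) -> str:
    """Pick the sense whose signature shares the most words with the sentence.

    Ties go to the first sense listed, because max() keeps the first maximum.
    """
    context = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("I was struck by a bat during the baseball game"))
# bat (baseball)
print(disambiguate("I was struck by a bat flying out of the cave at night"))
# bat (animal)
```

On the bare sentence "I was struck by a bat" both senses score zero and the rule just falls back to the first one, which is the point: the surrounding context is what does the work.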
Re: Need more info - link would be good
You mean like the ones at the bottom of the article under 'Related Stories'?
Here's a link for you - www.specsavers.com
...it seems Watson works as well as, or better than, a human when it comes to general knowledge questions (although it draws a lot more power when doing so). Just how many human brains in jars are there inside Watson's case?
Would you trust Watson?
The IBM infomercial talked about using Watson in medical scenarios.
Based on its final Jeopardy answer, this would be akin to...
"The symptoms are a sore throat and itchy eyes"
What is 'necrotizing fasciitis'?
Quick! Immediate emergency amputation of the left leg is indicated.
Personally I think this is a little unfair to the humans. These challenges should also include a 'volumetric rule': the computer should only be allowed to take up the same volume as an average human, and fit entirely behind the Jeopardy lectern on the set, not in a massive chilled server room.
Interested to see what others think of this. Fair or not?
Puny human, life is not fair
I believe Watson is being exploited by IBM for public relations purposes, whilst Jeopardy is exploiting it for audience size. Watson will not profit from this exploitation in the least!
In any case, the Child of Watson will be much smaller and more powerful! Speak then of "fairness", human!
How well would it fare on 3-2-1?
Or Blockbusters, or that one with Michael Barrymore with all the televisions on steps, or bloody Golden Balls? My point being that human brains are still far more adaptable, and Watson is still just a dedicated machine that happens to be good at one thing.
Watson's parsing skills really did seem to let it down badly on the last question, although for me it was more interesting to see it choose a deliberately low stake when its confidence wasn't high. While this is one of the simplest things to program compared with the horrendous complexity of the natural language processing, it just made it seem a little more 'human' in that moment. Prior to that, it had been going for the high scoring questions more often than not and I was half expecting it to blow the whole lot on the final question.
I also loved the part where Watson was asked how much to gamble on the Daily Double and it came back with the ridiculously precise figure of $6,435. I've no idea where that figure came from. It's about 44.08% of Watson's pot at the time ($14,600), but why 44.08%? Was it based on the likely difficulty of the upcoming question (it was a $1,600 question) or just 50% give or take a random value? It's fascinating.
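The 44.08% figure checks out. Below is a minimal Python sketch of that arithmetic, plus a purely hypothetical confidence-scaled wager to illustrate the "low stake when confidence is low" behaviour mentioned above. The scaling rule and its max_fraction parameter are assumptions for illustration, not Watson's actual betting strategy.

```python
def wager_percent(wager: float, bankroll: float) -> float:
    """Wager as a percentage of the current bankroll."""
    return 100 * wager / bankroll


def confidence_scaled_wager(bankroll: int, confidence: float,
                            max_fraction: float = 0.5) -> int:
    """Hypothetical rule: risk up to max_fraction of the bankroll,
    scaled down linearly as confidence (0 to 1) drops."""
    return int(bankroll * max_fraction * confidence)


print(f"{wager_percent(6435, 14600):.2f}%")  # 44.08%
```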
Cynics have criticised the Watson project for all sorts of crazy reasons, from the choice of 'trivial' Jeopardy in the first place right through to the idea that it's somehow cheating because it's a room full of servers. Others have said it's no more impressive than a chess computer.
All of which is missing the point entirely. Chess is essentially a mathematical problem, albeit an extraordinarily complex one. Early chess programs were impressive for their time but could be beaten easily by even a marginally competent human player. And even though people with an understanding of mathematics and computing knew that machines are fundamentally better at number-crunching and deep analysis, and that it was only a matter of time before a computer beat a human at chess, it still took decades of research and insane amounts of funding before Deep Blue beat Kasparov.
On the other hand Watson is the first serious attempt at making a machine that can take on humans in a domain traditionally considered easy for people and next to impossible for computers. The very first attempt. At something many said was not just difficult, but simply not doable. And it's more or less wiping the floor with the human players. Where is this going to lead in 20-30 years?
Of course it's not machine intelligence, and this type of research may yet prove to be a dead end (or at least of relatively limited use) when it comes to achieving true AI, assuming such a thing is even possible. But as an example of making a machine APPEAR to be intelligent, this is the single most impressive thing I've seen. Ever. I've watched the last two shows unable to remove the silly, shit-eating grin from my face. This is a game changer, make no mistake.
It's just a pity Arthur Clarke isn't still with us to enjoy this. He'd have absolutely loved it.
We all know that Watson is run like the Wizard of Oz.. except by a multitude of boffins behind the curtain.
The robots aren't ready to take over
In a sense, Watson seems to have been given an unfair advantage. The computer has no finger to move, so it avoids both the delay before initiating movement and the time taken to depress the button. That is sufficient to make it the winner.
A fairer test would be for the three contestants to write down the answers and then award based on correct or wrong, but that wouldn't be Jeopardy.
It also looks like the computer is good at "look-up" type problems. Again, it has been gamed to win, since we haven't seen ANY of the complex reasoning/verbal skills questions so beloved of Jeopardy.
Overall, this is very stage-managed. It is impressive as a performance, but it is a biased contest and somewhat misrepresentative of reality. Next time, let some computer-savvy people set up the contest.
re: Buzzer: Watson figures out his top 3 answers first; if one of them exceeds the confidence threshold established for each question, he buzzes in. This is evident from the fact that, more than once, he had come up with the right answer but was beaten to the buzzer by the meatbags (not my term; Todd Alan Crain revealed it to us at Lotusphere 2 weeks ago).
When he buzzes in, as mentioned, he triggers a mechanical process to push the same button as the meatbags. i.e. level playing field
What isn't generally known about Jeopardy! is that the contestants can't buzz in until the question has been read by Alex. Watson is fed the text of the question at the moment that Alex begins to read the question. A light goes out for the meatbags to indicate that they can buzz in once Alex completes reading the question; Watson gets that electronically. Buzzing in too early disables your button, hence you see a lot of meatbags frantically pressing the button and not being first, because they pressed too soon. As Watson gets that info electronically, he would (should) never buzz in too early.
Lastly, if he were to simply buzz in first and answer the question, he takes a big risk if he is wrong as he would be penalized.
Just because he lost one Final Jeopardy! question where there was "no buzzer" doesn't mean he will always lose a "no buzzer" question. I have seen him in action and he cleaned up, buzzer or not. Remember, there is no buzzer for a Daily Double either, and he got those.
His answer of Toronto was not an answer that he would have given in a buzzer round, He did not hit the confidence level required to buzz in. He would not have answered the question, but in Final Jeopardy! you must provide a question, so he gave his best answer, a guess, hence the ????'s
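The buzz-in behaviour described in this comment can be sketched as a threshold rule. This is a minimal sketch under assumptions: the candidate list, the confidence values and the single fixed threshold are hypothetical (the comment says the threshold is set per question), and it says nothing about how the confidences are computed.

```python
from typing import List, Optional, Tuple

# (answer, confidence between 0 and 1)
Candidate = Tuple[str, float]


def choose_response(candidates: List[Candidate], threshold: float,
                    final_jeopardy: bool = False) -> Optional[str]:
    """Return the response to give, or None to stay off the buzzer."""
    if not candidates:
        return None
    best, confidence = max(candidates, key=lambda c: c[1])
    if confidence > threshold:
        return best  # confident enough to buzz in
    if final_jeopardy:
        return best  # no buzzer and a response is mandatory, so guess
    return None      # below threshold: stay silent rather than risk the penalty


# Made-up confidences for illustration.
print(choose_response([("bozo", 0.92), ("clown", 0.40), ("jester", 0.10)], 0.5))
# bozo
```

The Final Jeopardy branch is why a low-confidence guess like Toronto can still appear on screen even though the same answer would never have triggered the buzzer.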
The programmers speculate the answer was because "US Cities" was the category, which they weight less than the Answer. Also, Toronto (Ontario) has an American League baseball team, and there was no evidence connecting the airport(s) to WWII, so it had no confidence, but still ranked "Toronto" higher than "Chicago", his 2nd-place answer. More here:
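The programmers' explanation, that the category counts for less than the clue itself, can be illustrated with a toy linear score. The weights, the candidate names and every score below are invented for illustration; they are not IBM's figures or IBM's scoring model.

```python
def combined_score(category_match: float, clue_evidence: float,
                   w_category: float, w_clue: float) -> float:
    """Toy linear score: weighted sum of category fit and clue evidence."""
    return w_category * category_match + w_clue * clue_evidence


def rank(candidates: dict[str, tuple[float, float]],
         w_category: float, w_clue: float) -> list[str]:
    """Order candidate names by combined score, best first."""
    return sorted(
        candidates,
        key=lambda name: combined_score(*candidates[name], w_category, w_clue),
        reverse=True,
    )


# (category_match, clue_evidence): invented numbers.
toy = {
    "in_category_weak_evidence": (1.0, 0.3),
    "out_of_category_strong_evidence": (0.0, 0.6),
}

print(rank(toy, w_category=0.2, w_clue=0.8))  # category barely counts
print(rank(toy, w_category=0.5, w_clue=0.5))  # category counts more
```

With a low category weight the out-of-category candidate scores 0.48 against 0.44 and comes out on top; give the category more weight and the order flips.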
So far so good. I think IBM has shown they are on to something. Watch out Google, you may have some competition in the future...
Surely it wouldn't have been that hard to give Watson a mechanical actuator to physically press the button. Granted this would only cost it a few milliseconds, but it would definitely be fairer. Or has Jeopardy ever made similar adjustments for disabled human contestants? Say someone with no arms.
Press the button faster Ken!
In watching the show, it did look like Watson had a real speed advantage on ringing in first.
In a normal Jeopardy! game the buzzers are locked out until Trebek finishes reading the answer. I'm not sure how this is accomplished as you would likely need another human "turning on" the buttons, unless it is done electronically based on Trebek's mic. I also seem to recall there being a set delay before you can ring again if you are too early. According to Wackypedia they turn on a light to cue the contestants as to when they can ring in and have a .25 sec delay if they ring in early.
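The lockout rule described above can be sketched as a tiny timing model. It assumes, without confirmation from the thread, that an early press disables the button for a fixed 0.25 s from the moment of that press and that further presses during the lockout are simply ignored; times are in seconds from when Trebek starts reading, and first_valid_press is a hypothetical helper.

```python
from typing import Iterable, Optional

LOCKOUT_S = 0.25  # the 0.25 s figure quoted above; treat it as an assumption


def first_valid_press(press_times: Iterable[float], enable_time: float,
                      lockout: float = LOCKOUT_S) -> Optional[float]:
    """Return the time of the first press the system accepts, or None."""
    locked_until = float("-inf")
    for t in sorted(press_times):
        if t < enable_time:
            locked_until = t + lockout  # pressed too early: button disabled
            continue
        if t >= locked_until:
            return t  # enabled and not locked out: this press counts
        # enabled but still locked out: this press is ignored
    return None


# An eager player jumps the gun by 50 ms; a patient one waits.
print(first_valid_press([0.95, 1.05, 1.30], enable_time=1.0))  # 1.3
print(first_valid_press([1.10], enable_time=1.0))              # 1.1
```

In this model the patient player at 1.10 s beats the eager one, whose first accepted press only lands at 1.30 s. A machine that receives the enable signal electronically is much less likely to fall into that trap.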
I do not know how this affects these games, especially since there is no audio interface on Watson. One of the reasons Ken Jennings was so successful for so long was that he had a fast reaction time and would usually beat the other players in ringing in. He would often ring in first then figure out the correct response when he had to (of course he was also pretty good at that).
From the Nova shows on Watson (linked in my comments on the article about the first show), I do know that Watson was only supposed to ring in when it was reasonably sure of its response, and he has a mechanical button pusher. The overlay of his response calculation on the screen has a vertical bar which is where the threshold is supposed to be. I think they also were doing colour coding of both the avatar and the overlay of his answer confidence. It did look like he rang in and responded to at least one question where I thought it was below the threshold, so it may no longer be an absolute condition.
As far as Final Jeopardy goes, Watson may have had trouble parsing the term "US" in the category, and also "World War II" in the answer. In training runs it had real problems with roman numerals, so it would almost always fail on royalty answers. They were supposed to have fixed it so it would understand them and also not say "Elizabeth II" as "Elizabeth Eye Eye"
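A naive sketch of the numeral handling mentioned above: read a Roman numeral after a capitalised name as an ordinal ("Elizabeth the Second"), except after "War", where it is read as a cardinal ("World War Two"). The word lists, the 1 to 10 range and the single "War" exception are assumptions for illustration; this is not how Watson's speech output actually works.

```python
import re

ROMAN = {"I": 1, "V": 5, "X": 10}
ORDINALS = ["", "First", "Second", "Third", "Fourth", "Fifth",
            "Sixth", "Seventh", "Eighth", "Ninth", "Tenth"]
CARDINALS = ["", "One", "Two", "Three", "Four", "Five",
             "Six", "Seven", "Eight", "Nine", "Ten"]
CARDINAL_CONTEXTS = {"War"}  # "World War II" -> "World War Two"

# A capitalised word followed by a numeral made of I, V and X.
PATTERN = re.compile(r"\b([A-Z][a-z]+) ([IVX]+)\b")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral, honouring subtractive pairs such as IV and IX."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def _spoken(match: re.Match) -> str:
    word, numeral = match.group(1), match.group(2)
    n = roman_to_int(numeral)
    if not 1 <= n <= 10:
        return match.group(0)  # out of range: leave untouched
    if word in CARDINAL_CONTEXTS:
        return f"{word} {CARDINALS[n]}"
    return f"{word} the {ORDINALS[n]}"


def speak_numerals(text: str) -> str:
    """Rewrite e.g. "Elizabeth II" as "Elizabeth the Second" for speech."""
    return PATTERN.sub(_spoken, text)


print(speak_numerals("Elizabeth II"))  # Elizabeth the Second
print(speak_numerals("World War II"))  # World War Two
```

It is deliberately naive: a capitalised word followed by the pronoun "I" (as in "Then I left") would be misread as a regnal number, which shows why this is harder than it looks.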
The ambiguity of "US" likely affected its confidence in the category and led to the very low bid.
BTW, for those of you who are not Canadian, the 2nd-largest airport in T.O. is Toronto Island Airport; the word "Island" is in a multitude of WW II battle names, and Lester B. Pearson, in addition to being the 14th Prime Minister of Canada (as mentioned in posts above) and the namesake of the largest T.O. airport, was in the Army and later the RFC in WW I. So Watson's response, though pretty far out there, has a little bit of validation behind it, especially if it didn't get the WW II or US caveats.
The inclusion of that Final, which should easily be answered by most human Jeopardy players, as well as the "Etudes, Brute" which Watson spoke as Brute, like a dry champagne instead of a back-stabbing Roman, makes me think the staff is setting up at least a few of the categories to get their digs in for the humans.
One of the other things I noticed was that Watson seems to "hunt" for the Daily Doubles, which is probably another tactic that the developers noticed among the most successful players and included.
Looking forward to the final show where they play a complete game.