There have been a lot of news reports that the chatbot Eugene Goostman has “beaten the Turing test,” the classic test of machine intelligence proposed by AI pioneer Alan Turing, which says (loosely) that if an AI program can fool people into thinking it’s human in a textual conversation, then it should be assumed to have human-level general intelligence.
Ben Goertzel is an artificial intelligence expert. Here are highlights from his write-up at H+ Magazine.
In 2008, the chatbot Elbot convinced 30% of the Loebner Prize judges it was human.
Alan Turing somewhat arbitrarily set the threshold at 30% when he articulated his “imitation game” test back in 1950. Elbot almost met that criterion; Eugene Goostman beat it.
On the other hand, in the 2013 Loebner contest, no chatbot fooled any of the 4 judges. However, I [Ben Goertzel] suspect the 2013 Loebner chatbots were actually better than the 2008 ones, and that the judges were simply less naive in 2013 than in 2008. And, though I’m just guessing here, I suspect the judges for the Eugene Goostman test were more on the naive side…
I [Ben Goertzel] doubt there has actually been any dramatic recent advance in chatbot technology. The fluctuation from 30% of judges fooled in 2008 to 33% fooled in 2014 seems to me more likely to be “noise” resulting from differences in the panels of judges…
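Goertzel’s “noise” point can be made concrete with a quick back-of-the-envelope calculation. A minimal sketch, under illustrative assumptions not taken from the article (a true fooling probability of 30% and a panel of roughly 30 judges), shows that the sampling error of the observed fraction is larger than the 3-point swing between 2008 and 2014:

```python
import math

# Illustrative assumptions (not from the article): a true fooling
# probability of 30% and a panel of roughly 30 judges.
p = 0.30   # assumed true probability that any one judge is fooled
n = 30     # assumed panel size

# Standard error of the observed fraction of judges fooled,
# treating each judge's verdict as an independent coin flip.
se = math.sqrt(p * (1 - p) / n)
print(f"standard error: {se:.3f}")  # ≈ 0.084, i.e. about 8 percentage points
```

With a standard error of roughly 8 percentage points under these assumptions, a move from 30% to 33% of judges fooled sits well inside ordinary sampling variation, which is consistent with the suspicion that differences between judge panels, rather than better chatbots, explain the change.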
The 30% threshold for “passing” is far from universally accepted. For instance, in Ray Kurzweil’s bet with Mitch Kapor that no AI will pass the Turing test by 2029, “beating the Turing test” was defined as fooling at least 2/3 of the judges, not just 30%. Also, importantly, the Kurzweil/Kapor bet requires a two-hour conversation, not just five minutes like the Goostman test. A two-hour conversation would be much harder to finesse with trickery.
In any case, while making chatbots that can fool human judges is a reasonably fun pursuit, nobody should confuse it with the quest to actually build thinking machines that can understand and converse like people.
Chatbots are theatrical constructs, which generate responses that simulate understanding, but don’t actually understand what they’re talking about.
An automated dialogue system that understood what it was talking about would not necessarily be a human-like general intelligence. But unlike the current batch of chatbots, it would be an important achievement, and would certainly have a lot to teach us about how to achieve AGI at the human level and beyond.
Turing was a very smart man, and a brilliant AI theorist for his time. But he may not have fully understood how easy people are to fool, nor how clever some humans are at figuring out how to fool other humans. (A computer that could fool humans as well as other humans can: now that would be impressive!) Being able to fool ordinary people acting as judges is not the same as actually conversing the way a human does.
An interesting modification of the Turing test would be the following: have an AI carry out conversations with a variety of AI experts, in a manner that other AI experts and linguist analysts could not distinguish from human conversations with those same AI experts. An AI that could do this probably would have human-level intelligence. The difference here is that the AI experts would know to probe the likely weaknesses of chatbots, and the expert analysts would know to look for evasive maneuvers and other clever “stage magic” ruses.