Ray Kurzweil joined Google to develop a truly intelligent computer, one that could understand language and then make inferences and decisions on its own. Such an effort will require nothing less than Google-scale data and computing power.
Kurzweil was attracted not just by Google’s computing resources but also by the startling progress the company has made in a branch of AI called deep learning. Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.
Deep-learning systems are producing remarkable advances in speech and image recognition. Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image-recognition effort at identifying objects such as cats. Google also used the technology to cut the error rate on speech recognition in its latest Android mobile software.
Last June, Google demonstrated one of the largest neural networks yet, with more than a billion connections. A team led by Stanford computer science professor Andrew Ng and Google Fellow Jeff Dean showed the system images from 10 million randomly selected YouTube videos. One simulated neuron in the software model fixated on images of cats. Others focused on human faces, yellow flowers, and other objects. And thanks to the power of deep learning, the system identified these discrete objects even though no humans had ever defined or labeled them.
What stunned some AI experts, though, was the magnitude of improvement in image recognition. The system correctly categorized objects and themes in the YouTube images 16 percent of the time. That might not sound impressive, but it was 70 percent better than previous methods. And, Dean notes, there were 22,000 categories to choose from; correctly slotting objects into some of them required, for example, distinguishing between two similar varieties of skate fish. That would have been challenging even for most humans. When the system was asked to sort the images into 1,000 more general categories, the accuracy rate jumped above 50 percent.
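To put those percentages in context, the arithmetic can be sketched in a few lines of Python. The sketch assumes that "70 percent better" is a relative improvement (new accuracy equals the old accuracy times 1.7). The article does not spell that out, so the implied earlier figure of roughly 9 percent is an inference, not a reported number. The variable names are illustrative only.

```python
# Figures quoted in the article.
new_accuracy = 0.16       # fraction of images categorized correctly
relative_gain = 0.70      # "70 percent better than previous methods"
num_categories = 22_000   # categories the system could choose from

# Assumption: "70 percent better" means new = previous * (1 + 0.70).
implied_previous = new_accuracy / (1 + relative_gain)

# Accuracy expected from guessing uniformly at random among the categories.
chance = 1 / num_categories

print(f"implied previous accuracy: {implied_previous:.1%}")
print(f"random-guess accuracy:     {chance:.4%}")
print(f"new accuracy vs. chance:   {new_accuracy / chance:.0f}x")
```

Under that reading, a 16 percent hit rate across 22,000 categories is thousands of times better than random guessing, which is why the result impressed researchers despite the modest-sounding number.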
Training the many layers of virtual neurons in the experiment took 16,000 computer processors—the kind of computing infrastructure that Google has developed for its search engine and other services. At least 80 percent of the recent advances in AI can be attributed to the availability of more computer power, reckons Dileep George, cofounder of the machine-learning startup Vicarious.
There’s more to it than the sheer size of Google’s data centers, though. Deep learning has also benefited from the company’s method of splitting computing tasks among many machines so they can be done much more quickly. That’s a technology Dean helped develop earlier in his 14-year career at Google. It vastly speeds up the training of deep-learning neural networks as well, enabling Google to run larger networks and feed a lot more data to them.
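The general idea of splitting a training job across machines can be illustrated with a toy sketch. This is not Google's system, whose details the article does not describe. It is a minimal data-parallel example under stated assumptions: threads stand in for separate machines, the "network" is a single weight fitted to made-up data, and the names `shard_gradient`, `split`, and `train` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, shard):
    """Gradient of mean squared error for y ~ w * x on one shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def split(data, n_shards):
    """Deal the examples round-robin into n_shards lists."""
    return [data[i::n_shards] for i in range(n_shards)]

def train(data, n_shards=4, steps=200, lr=0.01):
    """Gradient descent where each worker handles one shard per step."""
    shards = split(data, n_shards)
    total = len(data)
    w = 0.0
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        for _ in range(steps):
            # Each worker computes the gradient on its own shard in parallel.
            grads = list(pool.map(lambda s: shard_gradient(w, s), shards))
            # Combine the per-shard gradients, weighted by shard size.
            g = sum(gi * len(s) for gi, s in zip(grads, shards)) / total
            w -= lr * g
    return w

# Made-up data following y = 3x, so training should recover w close to 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = train(data)
print(w)
```

Because each worker touches only its own slice of the data, adding workers lets the same step cover more examples in the same time, which is the property the paragraph above credits for enabling larger networks and more training data.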
Although Google is less than forthcoming about future applications, the prospects are intriguing. Clearly, better image search would help YouTube, for instance. And Dean says deep-learning models can use phoneme data from English to more quickly train systems to recognize the spoken sounds in other languages. It’s also likely that more sophisticated image recognition could make Google’s self-driving cars much better. Then there’s search and the ads that underwrite it. Both could see vast improvements from any technology that’s better and faster at recognizing what people are really looking for—maybe even before they realize it.
Kurzweil envisions a “cybernetic friend” that listens in on your phone conversations, reads your e-mail, and tracks your every move—if you let it, of course—so it can tell you things you want to know even before you ask.
Kurzweil isn’t focused solely on deep learning, though he says his approach to speech recognition is based on similar theories about how the brain works. He wants to model the actual meaning of words, phrases, and sentences, including ambiguities that usually trip up computers. “I have an idea in mind of a graphical way to represent the semantic meaning of language,” he says.
That in turn will require a more comprehensive way to graph the syntax of sentences. Google is already using this kind of analysis to improve grammar in translations. Natural-language understanding will also require computers to grasp what we humans think of as common-sense meaning. For that, Kurzweil will tap into the Knowledge Graph, Google’s catalogue of some 700 million topics, locations, people, and more, plus billions of relationships among them. It was introduced last year as a way to provide searchers with answers to their queries, not just links.
Finally, Kurzweil plans to apply deep-learning algorithms to help computers deal with the “soft boundaries and ambiguities in language.”
Microsoft’s Peter Lee says there’s promising early research on potential uses of deep learning in machine vision—technologies that use imaging for applications such as industrial inspection and robot guidance. He also envisions personal sensors that deep neural networks could use to predict medical problems. And sensors throughout a city might feed deep-learning systems that could, for instance, predict where traffic jams might occur.