Google DeepMind is taking artificial intelligence to a new level and hopes to accelerate scientific progress and enable truly useful robotics

Demis Hassabis leads what is now called Google DeepMind. It is still headquartered in London and still has “solve intelligence” as its mission statement. The group was roughly 75 people strong when it joined Google, and Hassabis has said he aimed to hire around 50 more. Around 75 percent of the group works on fundamental research; the rest form an “applied research team” that looks for opportunities to apply DeepMind’s techniques to existing Google products.

Over the next five years, DeepMind’s technology could be used to refine YouTube’s recommendations or improve the company’s mobile voice search.

Hassabis dreams of creating “AI scientists” that could do things like generate and test new hypotheses about disease in the lab. When prodded, he says that DeepMind’s software could also be useful in robotics, an area in which Google has recently invested heavily.

DeepMind has combined deep learning with a technique called reinforcement learning, which is inspired by the work of animal psychologists such as B.F. Skinner. This led to software that learns by taking actions and receiving feedback on their effects, as humans or animals often do.
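To make the idea concrete, here is a minimal sketch of tabular Q-learning, the classic reinforcement-learning update that DeepMind's work builds on. The action set, state representation, and constants below are hypothetical placeholders, not DeepMind's actual setup.

import random
from collections import defaultdict

# Minimal tabular Q-learning sketch (illustrative, not DeepMind's code).
# The agent tries actions, observes rewards, and nudges its value
# estimates toward the observed outcomes: learning by trial and error.

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor for future rewards
EPSILON = 0.1  # exploration rate

ACTIONS = ["left", "right", "fire"]  # hypothetical action set
Q = defaultdict(float)               # Q[(state, action)] -> estimated value

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Move Q(s, a) toward the reward plus the discounted best future value."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])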

In 2013, DeepMind researchers showed off software that had learned to play three classic Atari games – Pong, Breakout and Enduro – better than an expert human. The software wasn’t programmed with any information on how to play; it was equipped only with access to the controls and the display, knowledge of the score, and an instinct to make that score as high as possible. The program became an expert gamer through trial and error.

No one had ever demonstrated software that could learn to master such a complex task from scratch.

Arxiv – Playing Atari with Deep Reinforcement Learning

It was the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. They applied their method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm, and found that it outperformed all previous approaches on six of the games and surpassed a human expert on three of them.
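Translated into code, the described setup looks roughly like the following PyTorch sketch. The layer sizes follow those reported in the 2013 paper (four stacked 84x84 frames in, one Q-value per action out); the function names and the loss helper are my own illustrative reconstruction, not DeepMind's code.

import torch
import torch.nn as nn

# Sketch of a DQN-style Q-network (illustrative reconstruction).
class QNetwork(nn.Module):
    def __init__(self, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32x9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one estimated return per action
        )

    def forward(self, frames):
        return self.net(frames)

# One Q-learning step: regress Q(s, a) toward r + gamma * max_a' Q(s', a').
def td_loss(q_net, states, actions, rewards, next_states, gamma=0.99):
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * q_net(next_states).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)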

Until DeepMind’s Atari demo, no one had built a system capable of learning anything nearly as complex as how to play a computer game, says Hassabis. One reason it was possible was a trick borrowed from his favorite area of the brain. Part of the Atari-playing software’s learning process involved replaying its past experiences over and over to try and extract the most accurate hints on what it should do in the future. “That’s something that we know the brain does,” says Hassabis. “When you go to sleep your hippocampus replays the memory of the day back to your cortex.”
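The replay idea is easy to sketch in code: store transitions as they happen, then train on random batches drawn from that memory rather than only on the most recent experience. A minimal illustrative version (the class and method names are my own):

import random
from collections import deque

# Minimal experience-replay buffer (illustrative sketch).
# Old transitions are revisited in random order, which both reuses
# experience and breaks the correlation between consecutive frames.
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories drop out

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Replay a random batch of past experiences for training."""
        return random.sample(self.buffer, batch_size)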

In a 2007 study recognized by the journal Science as a “Breakthrough of the Year,” Hassabis showed that five patients suffering amnesia due to damage to the hippocampus struggled to imagine future events. It suggested that a part of the brain thought to be concerned only with the past is also crucial to planning for the future.

Google Translates Pictures Into Words

Google’s engineers have trained a machine learning system to translate pictures into words, using the same techniques they developed for language translation. The way this is done is very clever and is explained quite clearly in the article below.

While this is a very impressive achievement, what is really interesting about it is that the technique makes use of vector mathematics to encode words and sentences in a language-independent way. For example:

“Google goes on to make an important assumption. This is that specific words have the same relationship to each other regardless of the language. For example, the vector “king – man + woman = queen” should hold true in all languages.”

What this means is that Google has, or can develop, a very large database of language-neutral related conceptual encodings that can be manipulated with vector mathematics. This appears to be very much the recoding technique used by the human brain but missing from the DeepMind Neural Turing Machine recently unveiled, which I wrote about here just yesterday.
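As a toy illustration of that kind of vector manipulation, the sketch below uses made-up four-dimensional embeddings; real systems learn vectors with hundreds of dimensions from enormous text corpora.

import numpy as np

# Toy word-vector arithmetic (illustrative only: these tiny hand-made
# vectors stand in for embeddings learned from billions of words).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "man":   np.array([0.1, 0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
}

def nearest(vec, words):
    """Return the word whose embedding has the highest cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(words, key=lambda w: cos(emb[w], vec))

# king - man + woman should land nearest to queen.
result = emb["king"] - emb["man"] + emb["woman"]
print(nearest(result, ["king", "man", "woman", "queen"]))  # -> queen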

On top of that, the technique can take pictures or natural language as input.

This collection of technologies, plus others like Hierarchical Temporal Memory, is tantalizingly close to what I believe is necessary and sufficient for machine intelligence.

Another DeepMind Paper

Arxiv – Recurrent Models of Visual Attention

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.

They introduced a novel visual attention model, formulated as a single recurrent neural network, that takes a glimpse window as its input and uses the internal state of the network both to select the next location to focus on and to generate control signals in a dynamic environment. Although the model is not differentiable, the proposed unified architecture is trained end-to-end, from pixel inputs to actions, using a policy gradient method.

The model, which the authors call the Recurrent Attention Model (RAM), has several appealing properties. First, both the number of parameters and the amount of computation RAM performs can be controlled independently of the size of the input images. Second, the model is able to ignore clutter present in an image by centering its retina on the relevant regions. The experiments show that RAM significantly outperforms a convolutional architecture with a comparable number of parameters on a cluttered object classification task.

Additionally, the flexibility of the approach allows for a number of interesting extensions. For example, the network can be augmented with another action that allows it to terminate at any time point and make a final classification decision; preliminary experiments show that this lets the network learn to stop taking glimpses once it has enough information to make a confident classification. The network can also be allowed to control the scale at which the retina samples the image, allowing it to fit objects of different sizes in the fixed-size retina. In both cases, the extra actions can simply be added to the action network f_a and trained with the same policy gradient procedure. Given the encouraging results achieved by RAM, the authors see applying the model to large-scale object recognition and video classification as a natural direction for future work.
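To make the glimpse loop concrete, here is a rough PyTorch sketch of a single RAM-style step: crop a small patch at the current location, update a recurrent core, and emit the next location plus a classification readout. The dimensions and module names are my own illustrative choices, not the paper's exact architecture.

import torch
import torch.nn as nn

GLIMPSE = 8  # side length of the small high-resolution crop

def take_glimpse(image, loc):
    """Crop a GLIMPSE x GLIMPSE patch at loc, given in [-1, 1] coordinates."""
    _, h, w = image.shape  # image: (1, H, W) grayscale tensor
    cy = int((loc[0] + 1) / 2 * (h - GLIMPSE))
    cx = int((loc[1] + 1) / 2 * (w - GLIMPSE))
    return image[:, cy:cy + GLIMPSE, cx:cx + GLIMPSE]

class RAMStep(nn.Module):
    def __init__(self, hidden=128, n_classes=10):
        super().__init__()
        self.glimpse_net = nn.Linear(GLIMPSE * GLIMPSE + 2, hidden)
        self.core = nn.GRUCell(hidden, hidden)          # recurrent state
        self.loc_net = nn.Linear(hidden, 2)             # where to look next
        self.action_net = nn.Linear(hidden, n_classes)  # f_a: final decision

    def forward(self, image, loc, h):
        patch = take_glimpse(image, loc).reshape(1, -1)
        g = torch.relu(self.glimpse_net(torch.cat([patch, loc.view(1, 2)], 1)))
        h = self.core(g, h)
        next_loc = torch.tanh(self.loc_net(h)).squeeze(0)  # back to [-1, 1]
        return h, next_loc, self.action_net(h)

In training, the location outputs would be treated as stochastic actions and optimized with a policy gradient method such as REINFORCE, since the cropping step itself is not differentiable.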

SOURCES - Google Plus for DeepMind, Arxiv, Technology Review