Google DeepMind team working on Neural Turing Machine

Arxiv – Google DeepMind researchers have extended the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

A Neural Turing Machine (NTM) architecture contains two basic components: a neural network controller and a memory bank.

Neural Turing Machine Architecture. During each update cycle, the controller network receives inputs from an external environment and emits outputs in response. It also reads from and writes to a memory matrix via a set of parallel read and write heads. The dashed line indicates the division between the NTM circuit and the outside world.
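As a rough illustration of how a head could read from and write to the memory matrix through attention, here is a minimal pure-Python sketch. It assumes content-based addressing (a softmax over cosine similarities between a key and each memory row) and an erase-then-add write. The function names, the `beta` sharpness parameter and the toy memory are illustrative assumptions, not the DeepMind implementation, and a real NTM would also learn these quantities with gradient descent.

```python
import math

def content_weights(memory, key, beta=5.0):
    """Soft attention over memory rows: softmax of beta * cosine similarity."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm + 1e-8)  # small constant avoids division by zero
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Read vector: attention-weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]

def write(memory, weights, erase, add):
    """Each row is partly erased, then added to, in proportion to its weight."""
    return [
        [m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for w, row in zip(weights, memory)
    ]

# Toy memory with three rows (hypothetical values).
memory = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights = content_weights(memory, key=[0.0, 1.0, 0.0], beta=20.0)
read_vector = read(memory, weights)
updated = write(memory, weights, erase=[1.0, 1.0, 1.0], add=[0.0, 0.0, 0.5])
```

Because every step is a smooth function of the weights, the whole read and write path stays differentiable, which is the property that lets the combined system be trained end-to-end.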

Computer programs make use of three fundamental mechanisms: elementary operations (e.g., arithmetic operations), logical flow control (branching), and external memory, which can be written to and read from in the course of computation (Von Neumann, 1945). Despite its wide-ranging success in modelling complicated data, modern machine learning has largely neglected the use of logical flow control and external memory.

Recurrent neural networks (RNNs) stand out from other machine learning methods for their ability to learn and carry out complicated transformations of data over extended periods of time. Moreover, it is known that RNNs are Turing-Complete (Siegelmann and Sontag, 1995), and therefore have the capacity to simulate arbitrary procedures, if properly wired. Yet what is possible in principle is not always what is simple in practice. The DeepMind researchers therefore enrich the capabilities of standard recurrent networks to simplify the solution of algorithmic tasks. This enrichment is primarily via a large, addressable memory, so, by analogy to Turing’s enrichment of finite-state machines by an infinite memory tape, they dub the device a “Neural Turing Machine” (NTM). Unlike a Turing machine, an NTM is a differentiable computer that can be trained by gradient descent, yielding a practical mechanism for learning programs.

Neural networks to date have been missing one vital piece: external memory. This is not memory in the traditional sense, of course, but memory that can store the ideas or concepts that result from the reconfiguration of neurons (learning).

One example would be a collection of nodes in a network that together represent the idea of the game of basketball: the rules, the history, records set by noted players, and everything else that it entails. External memory would mean storing that concept under a single word, basketball, the way it happens for us humans: when we hear the word we imagine players we rooted for, big games, or perhaps baskets we made as kids, and on and on. In this new effort, the researchers at DeepMind are trying to add that piece to a neural network to create a true real-world representation of a Turing machine.

The team reports that they are making progress. They have all the pieces: a neural network, input/output and, of course, the external memory. They also report that the machine works when applied to very simple tasks and, impressively, is able to outperform regular neural networks in several instances.

Chunks of Information

Technology Review – The neural Turing machine learns to copy sequences of lengths up to 20 more or less perfectly. It then copies sequences of lengths 30 and 50 with very few mistakes. For a sequence of length 120, errors begin to creep in, including one error in which a single term is duplicated and so pushes all of the following terms one step back. “Despite being subjectively close to a correct copy, this leads to a high loss,” say the team.
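To see why a single duplicated term is so costly, the following sketch builds a length-120 sequence of random 8-bit vectors, duplicates one term so that everything after it slips one step back, and compares the result with the target position by position. The helper names, the 8-bit term width and the use of a simple mismatch rate are assumptions made for illustration. The rate only stands in for the loss and is not the loss function the team used.

```python
import random

def duplicate_term(seq, index):
    """Emit seq[index] twice; later terms shift back one step, length is kept."""
    return (seq[: index + 1] + seq[index:])[: len(seq)]

def mismatch_rate(target, output):
    """Fraction of positions where the output term differs from the target term."""
    wrong = sum(t != o for t, o in zip(target, output))
    return wrong / len(target)

rng = random.Random(0)
# Target: 120 random binary vectors of width 8 (hypothetical task settings).
target = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(120)]

# Simulate the reported error: the term at position 10 is copied twice.
output = duplicate_term(target, index=10)
rate = mismatch_rate(target, output)
print(f"mismatch rate after one duplicated term: {rate:.2f}")
```

Almost every term after the duplication is the right value in the wrong place, so a position-by-position comparison marks nearly the whole tail of the sequence as wrong even though a human would call the copy nearly correct.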

Consider the following sentence: “This book is a thrilling read with a complex plot and lifelike characters.”

This sentence consists of around seven chunks of information and is clearly manageable for any ordinary reader.

By contrast, try this sentence: “This book about the Roman Empire during the first years of Augustus Caesar’s reign at the end of the Roman Republic, describes the events following the bloody Battle of Actium in 31 BC when the young emperor defeated Mark Antony and Cleopatra by comprehensively outmaneuvering them in a major naval engagement.”

This sentence contains at least 20 chunks. So if you found it more difficult to read, that shouldn’t be a surprise. The human brain has trouble holding this many chunks in its working memory.

Experiments demonstrate that [the DeepMind Neural Turing Machine] is capable of learning simple algorithms from example data and of using these algorithms to generalize well outside its training regime.

Recoding to handle complexity

The human brain performs a clever trick to make sense of complex arguments. An interesting question that follows from the psychologist George Miller’s early work on the limits of working memory is this: if our working memory is only capable of handling seven chunks, how do we make sense of complex arguments, in books for example, that consist of thousands or tens of thousands of chunks?

Miller’s answer is that the brain uses a trick known as recoding. Let’s go back to our example of the book and add another sentence: “This book is a thrilling read with a complex plot and lifelike characters. It is clearly worth the cover price.”

Once you have read and understood the first sentence, your brain stores those seven chunks in a way that is available as a single chunk in the next sentence. In this second sentence, the pronoun “it” is this single chunk. Our brain automatically knows that “it” means: “the book that is a thrilling read with a complex plot and lifelike characters.” It has recoded the seven earlier chunks into a single chunk.

To Miller, the brain’s ability to recode in this way was one of the keys to artificial intelligence. He believed that until a computer could reproduce this ability, it could never match the performance of the human brain.

Arxiv – Learning to Execute

Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units are widely used because they are expressive and easy to train. The authors' interest lies in empirically evaluating the expressiveness and the learnability of LSTMs by training them to evaluate short computer programs, a problem that has traditionally been viewed as too complex for neural networks. They consider a simple class of programs that can be evaluated with a single left-to-right pass using constant memory. Their main result is that LSTMs can learn to map the character-level representations of such programs to their correct outputs. Notably, it was necessary to use curriculum learning, and while conventional curriculum learning proved ineffective, they developed a new variant of curriculum learning that improved their networks’ performance in all experimental conditions.
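To make the task concrete, the sketch below generates (program text, output text) pairs for a toy class of one-line addition programs and orders them by difficulty. It shows only the conventional curriculum idea of presenting easier examples first, which the abstract says was ineffective on its own. It does not reproduce the authors' new variant. The function names, the `print(a+b)` program template and the use of digit count as the difficulty measure are all assumptions made for illustration.

```python
import random

def make_example(rng, digits):
    """Build one character-level (program, target) pair with operands of `digits` digits."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    program = f"print({a}+{b})"  # input the network would read character by character
    target = str(a + b)          # output the network would have to produce
    return program, target

def naive_curriculum(rng, max_digits, per_stage):
    """Yield examples stage by stage, from short operands to longer ones."""
    for digits in range(1, max_digits + 1):
        for _ in range(per_stage):
            yield digits, make_example(rng, digits)

rng = random.Random(0)
examples = list(naive_curriculum(rng, max_digits=3, per_stage=2))
for digits, (program, target) in examples:
    print(digits, program, "->", target)
```

A sequence model trained on such pairs sees nothing but characters, so it has to work out the meaning of the digits and the `+` sign from the examples alone.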

SOURCES – Arxiv, New Scientist, Technology Review