Deep Mind Assert Reinforcement Learning Could Solve Artificial General Intelligence

Powerful reinforcement learning agents could constitute a solution to artificial general intelligence. They hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behavior that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities.

One common method for creating AI is to try to replicate elements of intelligent behavior in computers. For instance, our understanding of the mammal vision system has given rise to all kinds of AI systems that can categorize images, locate objects in photos, define the boundaries between objects, and more. Likewise, our understanding of language has helped in the development of various natural language processing systems, such as question answering, text generation, and machine translation.

These are all instances of narrow artificial intelligence, systems that have been designed to perform specific tasks instead of having general problem-solving abilities. Some scientists [and Nextbigfuture agrees] believe that assembling multiple narrow AI modules will produce higher intelligent systems. For example, you can have a software system that coordinates between separate computer vision, voice processing, NLP, and motor control modules to solve complicated problems that require a multitude of skills.

Artificial Intelligence Journal, Reward Is Enough

The Reward Is Enough paper does not make any suggestions on how the reward, actions, and other elements of reinforcement learning are defined. The researchers make the case that Reinforcement Learning could replicate the reward maximization processes in nature. Nature generated intelligence in human at the end of a long reward maximization process.

Expressions of intelligence in animal and human behaviour are so bountiful and so varied that there is an ontology of associated abilities to name and study them, e.g. social intelligence, language, perception, knowledge representation, planning, imagination, memory, and motor control. What could drive agents (natural or artificial) to behave intelligently in such a diverse variety of ways?

One possible answer is that each ability arises from the pursuit of a goal that is designed specifically to elicit that ability. For example, the ability of social intelligence has often been framed as the Nash equilibrium of a multi-agent system; the ability of language by a combination of goals such as parsing, part-of-speech tagging, lexical analysis, and sentiment analysis; and the ability of perception by object segmentation and recognition. In this paper, we consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence.

This hypothesis may startle because the sheer diversity of abilities associated with intelligence seems to be at odds with any generic objective. However, the natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments. Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximisation contains within it many or possibly even all the goals of intelligence.

Reward thus provides two levels of explanation for the bountiful expressions of intelligence found in nature. First, different forms of intelligence may arise from the maximisation of different reward signals in different environments, resulting for example in abilities as distinct as echolocation in bats, communication by whale-song, or tool use in chimpanzees. Similarly, artificial agents may be required to maximise a variety of reward signals in future environments, resulting in new forms of intelligence with abilities as distinct as laser-based navigation, communication by email, or robotic manipulation.

Second, the intelligence of even a single animal or human is associated with a cornucopia of abilities. According to our hypothesis, all of these abilities subserve a singular goal of maximising that animal or agent’s reward within its environment. In other words, the pursuit of one goal may generate complex behaviour that exhibits multiple abilities associated with intelligence. Indeed, such reward-maximising behaviour may often be consistent with specific behaviours derived from the pursuit of separate goals associated with each ability.

Reinforcement Learning Agents

They consider agents with a general ability to learn how to maximise reward from their ongoing experience of interacting with the environment. Such agents, which we will refer to as reinforcement learning agents, provide several advantages.

Among all possible solution methods for maximising reward, surely the most natural approach is to learn to do so from experience, by interacting with the environment. Over time, that interactive experience provides a wealth of information about cause and effect, about the consequences of actions, and about how to accumulate reward. Rather than predetermining the agent’s behaviour (placing faith in the designer’s foreknowledge of the environment) it is natural instead to bestow the agent with a general ability to discover its own behaviors (placing faith in experience). More specifically, the design goal of maximising reward is implemented through an ongoing internal process of learning from experience a behaviour that maximises future reward.

To achieve high reward, the agent must therefore be equipped with a general ability to fully and continually adapt its behaviour to new experiences. Indeed, reinforcement learning agents may be the only feasible solutions in such complex environments.

A sufficiently powerful and general reinforcement learning agent may ultimately give rise to intelligence and its associated abilities. In other words, if an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent’s behaviour. A good reinforcement learning agent could thus acquire behaviours that exhibit perception, language, social intelligence and so forth, in the course of learning to maximise reward in an environment, such as the human world, in which those abilities have ongoing value.

Unified cognitive architectures aspire towards general intelligence. They combine a variety of solution methods for separate goals, but do not provide a generic objective that justifies and explains the choice of architecture, nor a singular goal towards which the individual components contribute.

SOURCES- Artificial Intelligence, Deep Mind
Written by Brian Wang, Nextbigfuture.com