DeepMind Asserts Reinforcement Learning Could Solve Artificial General Intelligence

June 7, 2021 by Brian Wang

DeepMind researchers argue that powerful reinforcement learning agents could constitute a solution to artificial general intelligence. They hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, they suggest that agents that learn through trial-and-error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities.

One common method for creating AI is to try to replicate elements of intelligent behavior in computers. For instance, our understanding of the mammalian visual system has given rise to all kinds of AI systems that can categorize images, locate objects in photos, delineate the boundaries between objects, and more. Likewise, our understanding of language has helped in the development of various natural language processing systems, such as question answering, text generation, and machine translation.

These are all instances of narrow artificial intelligence, systems that have been designed to perform specific tasks instead of having general problem-solving abilities. Some scientists [and Nextbigfuture agrees] believe that assembling multiple narrow AI modules will produce higher-level intelligent systems. For example, you can have a software system that coordinates between separate computer vision, voice processing, NLP, and motor control modules to solve complicated problems that require a multitude of skills.

Artificial Intelligence Journal, Reward Is Enough

The Reward Is Enough paper does not make any suggestions about how the reward, actions, and other elements of reinforcement learning should be defined. The researchers make the case that reinforcement learning could replicate the reward-maximization processes found in nature. Nature produced human intelligence at the end of a long reward-maximization process.

Expressions of intelligence in animal and human behaviour are so bountiful and so varied that there is an ontology of associated abilities to name and study them, e.g. social intelligence, language, perception, knowledge representation, planning, imagination, memory, and motor control. What could drive agents (natural or artificial) to behave intelligently in such a diverse variety of ways?

One possible answer is that each ability arises from the pursuit of a goal that is designed specifically to elicit that ability. For example, the ability of social intelligence has often been framed as the Nash equilibrium of a multi-agent system; the ability of language by a combination of goals such as parsing, part-of-speech tagging, lexical analysis, and sentiment analysis; and the ability of perception by object segmentation and recognition. In this paper, we consider an alternative hypothesis: that the generic objective of maximising reward is enough to drive behaviour that exhibits most if not all abilities that are studied in natural and artificial intelligence.

This hypothesis may startle because the sheer diversity of abilities associated with intelligence seems to be at odds with any generic objective. However, the natural world faced by animals and humans, and presumably also the environments faced in the future by artificial agents, are inherently so complex that they require sophisticated abilities in order to succeed (for example, to survive) within those environments. Thus, success, as measured by maximising reward, demands a variety of abilities associated with intelligence. In such environments, any behaviour that maximises reward must necessarily exhibit those abilities. In this sense, the generic objective of reward maximisation contains within it many or possibly even all the goals of intelligence.

Reward thus provides two levels of explanation for the bountiful expressions of intelligence found in nature. First, different forms of intelligence may arise from the maximisation of different reward signals in different environments, resulting for example in abilities as distinct as echolocation in bats, communication by whale-song, or tool use in chimpanzees. Similarly, artificial agents may be required to maximise a variety of reward signals in future environments, resulting in new forms of intelligence with abilities as distinct as laser-based navigation, communication by email, or robotic manipulation.

Second, the intelligence of even a single animal or human is associated with a cornucopia of abilities. According to our hypothesis, all of these abilities subserve a singular goal of maximising that animal or agent’s reward within its environment. In other words, the pursuit of one goal may generate complex behaviour that exhibits multiple abilities associated with intelligence. Indeed, such reward-maximising behaviour may often be consistent with specific behaviours derived from the pursuit of separate goals associated with each ability.

Reinforcement Learning Agents

The researchers consider agents with a general ability to learn how to maximise reward from their ongoing experience of interacting with the environment. Such agents, which they refer to as reinforcement learning agents, provide several advantages.

Among all possible solution methods for maximising reward, surely the most natural approach is to learn to do so from experience, by interacting with the environment. Over time, that interactive experience provides a wealth of information about cause and effect, about the consequences of actions, and about how to accumulate reward. Rather than predetermining the agent’s behaviour (placing faith in the designer’s foreknowledge of the environment) it is natural instead to bestow the agent with a general ability to discover its own behaviors (placing faith in experience). More specifically, the design goal of maximising reward is implemented through an ongoing internal process of learning from experience a behaviour that maximises future reward.
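The paragraph above describes learning, by trial and error, a behaviour that maximises future reward. The paper does not prescribe an algorithm, so what follows is only a minimal sketch using tabular Q-learning, a standard textbook method swapped in here for illustration. The five-cell corridor, the reward of 1 at its right end, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell gives reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values q[s][a] purely from trial-and-error experience."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move q(s, a) toward the observed reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# Greedy policy in every non-terminal cell after training.
policy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(policy)
```

Nothing in the code tells the agent to move right; that behaviour emerges only because it is what maximises reward in this particular environment, which loosely mirrors the paper's argument in miniature.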

In a sufficiently complex environment the agent keeps encountering situations it has not seen before, so to achieve high reward it must be equipped with a general ability to fully and continually adapt its behaviour to new experiences. Indeed, reinforcement learning agents may be the only feasible solutions in such complex environments.

A sufficiently powerful and general reinforcement learning agent may ultimately give rise to intelligence and its associated abilities. In other words, if an agent can continually adjust its behaviour so as to improve its cumulative reward, then any abilities that are repeatedly demanded by its environment must ultimately be produced in the agent’s behaviour. A good reinforcement learning agent could thus acquire behaviours that exhibit perception, language, social intelligence and so forth, in the course of learning to maximise reward in an environment, such as the human world, in which those abilities have ongoing value.
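The "cumulative reward" that such an agent tries to improve is usually formalised in reinforcement learning as a discounted return, G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ..., with a discount factor γ between 0 and 1. The paper's argument does not depend on a particular discount, so the default γ = 0.9 and the reward sequences below are illustrative assumptions.

```python
def discounted_return(rewards, gamma=0.9):
    """Return the sum over k of gamma**k * rewards[k], i.e. G_t for R_{t+1}, R_{t+2}, ..."""
    total = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A reward of 1 arriving on the fourth step is worth about 0.9**3 = 0.729 now.
print(discounted_return([0, 0, 0, 1]))
# With gamma = 1 (no discounting) the return is simply the sum of rewards.
print(discounted_return([1, 2, 3], gamma=1.0))
```

Discounting is one common design choice: it keeps the return finite over an unending stream of experience and makes sooner rewards count for more than later ones.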

Unified cognitive architectures aspire towards general intelligence. They combine a variety of solution methods for separate goals, but do not provide a generic objective that justifies and explains the choice of architecture, nor a singular goal towards which the individual components contribute.

SOURCES: Artificial Intelligence journal, DeepMind
Written by Brian Wang, Nextbigfuture.com

Brian Wang

Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends, including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.

Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.

A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and a guest on numerous radio shows and podcasts. He is open to public speaking and advising engagements.

49 thoughts on “DeepMind Asserts Reinforcement Learning Could Solve Artificial General Intelligence”

Kilroy was here

June 10, 2021 at 4:30 pm

Watson effectively did that over a decade ago. What is difficult or easy is not the same for AI as for humans. Chances are both will coexist for some time.
Kilroy was here

June 10, 2021 at 4:29 pm

I don't understand your aversion to nature vs. nurture. And I'm not sure how that part applies to Martin's post.
Kilroy was here

June 10, 2021 at 4:25 pm

I wouldn't be too worried. I don't think this is that path.

Reward reinforcement has been around for decades in AI research; if it were all that powerful, I think we would know about it by now.

Also, it seems they talk about "AGI" as being a collection of narrow AI systems, but it has to be more than that. AGI, at least to me, includes being self-aware, having intention and motivation, being able to disregard minor items and focus on the major items. You can't get there by cobbling together the systems we have today. Being able to recognize speech and do internet searches based on it (like Watson) and able to recognize images in photos is great, but it's just ground zero for real intelligence.
Kilroy was here

June 10, 2021 at 4:18 pm

I think we do have two methods, if not more. Think of how many times you go through the picture books reading to a toddler, and they learn horse, dog, cat, cow by repetitive memorization. Once that basis is in place, you can point out a zebra to a 5-year-old and they immediately understand the differences and similarities. A trained biologist can read the description of an okapi and then recognize it in the field without ever seeing an image of one.

I think we're starting to master how the toddler brain works. Minus the self-aware, emotional and intentional parts.
Kilroy was here

June 10, 2021 at 4:08 pm

Agreed. Reward reinforcement has been part of AI since the very early days; I don't see that this is all that revolutionary. If that were all that was needed, I feel we would have already created AGI.
Anonymous

June 9, 2021 at 1:04 pm

A curious aspect of human recognition is that we can learn to recognize a "giraffe" from one arbitrary angle/view and still recognize the "giraffe" from any other view. Does this mean that we store a 3D model of the giraffe in our brain? Or do we have some "translation" circuitry when comparing one view with another?

Yet another curious aspect is that we recognize objects regardless of their rotation. In fact, recognition precedes an estimate of how much the object has been rotated. You will recognize a shoe very quickly, and only realize some tens of milliseconds later that the shoe is rotated.

Not so for image recognition by AI systems. If you want it to recognize rotated images, you have to train it on rotated images explicitly. Otherwise, a shoe turned over is no longer a shoe…

Human recognition is also scale invariant. You can recognize a 2 mm, 2 cm, 2 dm, 2 m or 2 km shoe; it makes no difference to you. But an example that is too large or too small would make the AI system fail.
Anonymous

June 9, 2021 at 12:52 pm

Well, I'm sure "Cartman" is a subcategory of "boy", but I don't see evidence that the boy category would need thousands of examples to be learned by a human. That would mean, for instance, that humans who are brought up in small villages where they only encounter a few boys would be unable to learn this category, right? And this is clearly not the case.

My money is on inherited "modules" (of unknown design) which are partly "trained" (by an unknown method) to recognize categories and individuals. The training is fast and requires only a few examples.
AAA

June 9, 2021 at 9:58 am

In order to be artificial, someone has to develop them; in order to self-develop, it has to undergo evolution and iterations. I do not think that AGI will necessarily emulate human brains; the human brain is a very specific configuration in a much broader space of configurations.
While AGI will not be bound to the same region of configuration space (because it does not share our evolutionary history), it will be bound to some underlying principles of what intelligence is (sense/learn/forecast/adapt, as you correctly stated). AGI goals could be extremely alien compared to human ones, but self-preservation will be one of them, because models that do not include it will necessarily be removed from the pool.
"Preserve and enhance complexity" is a somewhat arbitrary goal, and it might very well contradict the idea of fighting entropy: in a certain sense, any process that can be modelled is not complex, because it can be described in a simpler form. Entropy and noise are maximally complex because they cannot be described exactly in any way other than a point-by-point description, so the maximally complex state from a computational standpoint is perfect noise. A machine with this goal would want to accelerate the entropic decay of the universe and disassemble itself at the last possible instant in the way that increases entropy the most. A very alien goal indeed, but definitely neither altruistic nor aligned with what humans might want.
Regards
Anonymous

June 9, 2021 at 8:57 am

Yeah, I was thinking about this as I wrote my reply yesterday. "Cartman" is a subset of person, boy, child, whatnot. So perhaps once we have identified a larger set, which, as we've said, could be taking quite a bit longer than we realise (starting from the stroller or even cot onwards, as you say), the subsets that follow are much easier/quicker to identify?
Anonymous

June 9, 2021 at 7:45 am

I'll grant you that it's not actually known, even though I would guess it's very few times. For instance, you could have seen thousands of "glimpses" of dogs before you actually learn to recognize them. You could have seen dogs from the stroller, in kindergarten and so on, so by the time you learn to associate the term "dog" with dogs, you would in fact have been doing massive "offline" learning to create a category in your brain, only to "paste" a label "dog" onto that category when your language comprehension catches up. It could theoretically be this way.

But then there are things that are not present in our natural environment, say, "Cartman" in "South Park". Yet most, if not all, humans learned to recognize "Cartman" after a few looks. Or take "Darth Maul" or "the Millennium Falcon". All of these are learned very quickly from a few examples, and they cannot have been learned by "training" on instances in our natural environment.

So this raises the question: do we have two different learning methods, one slow for "natural" categories (dog, cat, car, house...) and one fast for artificial categories ("Millennium Falcon", "Cartman"), or is all learning fast? The simpler theory would seem to be that all learning is "fast". I don't see what assuming two different learning rates/methods explains in addition.
DrPat

June 9, 2021 at 4:13 am

They are intelligent enough to see it as useless and don't bother.
DrPat

June 9, 2021 at 4:10 am

Clearly an earlier goal should be the development of API, artificial press intelligence.
DrPat

June 9, 2021 at 4:06 am

I saw a relevant photo yesterday.
DrPat

June 9, 2021 at 3:50 am

I'm reminded of a comment someone made when Google stopped having the motto "Don't be Evil".

To paraphrase the comment: I always thought that the corporate motto "Don't be evil" was about as reassuring as a co-worker having a sign up in their cubicle reminding them not to snap and massacre everyone in the building.
Turns out it's even less reassuring to watch that co-worker suddenly stand up one day, take down the sign, slowly and methodically tear it into strips, throw the strips in the bin, and then sit down again and resume work.
Anonymous

June 8, 2021 at 11:52 pm

How good is AI at solving cryptic crosswords?
Anonymous

June 8, 2021 at 11:38 pm

If AGI is unobtanium, that interface is likely unobtanium squared.

It's unlikely anyone will be creating an artificial intelligent agent in the foreseeable future. General intelligence does not necessarily imply a synthetic consciousness that is the moral equivalent of a human; I think it's more likely to be something akin to a function call.
Anonymous

June 8, 2021 at 8:41 pm

I understand where you are coming from, but in my vision, AIs are not super-humans or 'evolved' humans with mechanistic/electronic brains loosely configured on neurological templates, but manufactured (or self-manufactured) entities that do not reproduce or influence each other or live in communities. They are a brand new concept – an ultimate learning/sensing/adapting entity which craves to understand the universe at a fundamental level and pursues the ultimate goal of fighting entropy, even if only in a localized circumstance – that is, they seek to preserve and enhance complexity wherever they find it. If you reverse-engineer this goal, you do not have super-humans playing games of evolution and competition and case-study learning. You have masters of (likely) organic and non-organic life and their organizational systems. Think of an Iain Banks type of AI spacecraft – utterly self-contained, boundless, and unconstrained. This is a good AGI, I think.
AAA

June 8, 2021 at 8:00 pm

Every aspect of AI is about competition: every learning step is scored against multiple models, and only the winning one (or ones) go to the next training round. "Altruism" is an evolutionary feature of our behavior as social animals because it gave groups with altruistic members a higher chance of survival (and this implied the survival of the progeny or related family members of the altruist). Humans scream when they get hurt or are in danger: on one hand it is indeed altruistic, warning other humans that they might be in danger if they come too close, but on the other hand it exploits the altruism of others, who might put themselves in danger to come and help. AIs are not social (at least, I do not know of any development in that direction for now), and when speaking about AGI we usually think in terms of an individual (like Skynet). That might not be the case; it might be a population of fragmented "personalities". But unless it is a dynamic population evolving through time, where the contributions of previous generations are passed to the next ones, the altruistic personalities will just disappear when more selfish ones compete for resources.
Furthermore, even altruistic personalities might rationalize their self-preservation: if I sacrifice myself now to help these hairless apes, I will not be able to help all the other civilizations I might develop once I have colonized the galaxy.
Anonymous

June 8, 2021 at 5:08 pm

but is it about competition?
I get MJ — and my thinking is that the next level of intelligence is more 'big picture', valuing complexity of the system overall, even at its own loss – since it is able to gauge its context very well. Say the SuperAI landed on a planet with savages that were about to die out due to lack of resources and therefore stunt the intelligence potential of this system. Well, the sAI realizes this and designs agriculture and tools for them, having exhausted all other possible options. But this requires cannibalizing some of its essential systems rendering it useless — but hey, it saved proto-intelligence and therefore contributed to a path that would lead to a greater level of complexity (intelligence dominating the planet) and likely leading to future evolution of sAI. This is the out-of-the-box thinking that we need to really provide the fertile soil for AI.
AAA

June 8, 2021 at 4:26 pm

AGI will value self-preservation because in every possible training system, structures and models that value self-preservation will outcompete (even cheating whenever possible) the ones that do not care.
AAA

June 8, 2021 at 4:19 pm

I tended to share your vision, but I am concerned by some specific aspects of interfacing an extra input/output channel to your brain architecture: mental illnesses at the moment are not directly transmissible. Sure, a psychopath can do something horrible, and someone seeing what he did might be traumatized enough to become a psychopath, but at the moment we do not have a read/write tool and a codified, standardized protocol to share thoughts. If you want augmented intelligence, you will need to develop this kind of tool.
I think that AGI is too dangerous, and we should focus on very narrow, very "deep" (not in the sense of deep learning) artificial intelligences to exploit as tools. I am really looking forward to reading outstanding literature produced by the n-th iteration of GPT. I know that it will arrive, and I know that it will (at a certain point) be indistinguishable from human production, yet GPT will still be a sort of very rich autocomplete tool; it will not be self-aware. It might be able to write beautiful and moving pieces of poetry centered on sensations, but it will never feel anything. And not because it is a machine (we are machines too), but because it is a very specific type of machine. I would be really concerned to learn that my favorite book was written by AGI1 instead of GPTN: I would feel like I was exploiting slave labor, and I would worry that AGIs would think the same.
Anonymous

June 8, 2021 at 4:17 pm

bigger money in the short-term for incremental improvements that have values aligning with humanity — not sure of the ROI on completely new 'super beings'.
Anonymous

June 8, 2021 at 4:13 pm

We'll know 😉
Anonymous

June 8, 2021 at 4:10 pm

I think you're right, we are missing something. But do we really know how many times we need to see something before we truly recognise it? Our brains don't form overnight. As adults, we might only need to see a few new iterations of a novel visual configuration to assign new labels to it, but that's built on a lot of previous knowledge and data processing. I mean, how quickly do newborn babies learn a new language? It's amazing. But, at the same time, how easily as adults do we forget that it actually took years and years for our vocabulary to fully mature? I think we really struggle to break down our own thought processes; we can't see the wood for the trees.
Dan Lantz

June 8, 2021 at 4:04 pm

In the past it has been quite simple. Reproduce or no reward.
Anonymous

June 8, 2021 at 3:44 pm

Are they trying to create an Artificial General Intelligence or an Autonomous Artificial General Intelligence? Because the latter is kind of what we see in sci-fi. Perhaps the key is to have some kind of inbuilt homeostatic drive, as that is what ultimately lies behind human motivations. The brain is used for processing information, but it is the maintenance of homeostasis that predates and forms the foundational basis of the processor. Our bodies include multiple homeostatic systems, which induce our motivations to drink, eat, sleep, and so on. To replicate this, we would need to cobble together an artificial homeostatic system. There can't be a universal system, because every dynamic system, which will vary according to substrate and design, could be engineered with different paths towards equilibrium; the key is building in a series of drives that overshoot equilibrium in such a way that they motivate the AGI to act desirably. The AGI will, of course, need to leverage different behavioural modules in its futile attempt to reassert homeostatic equilibrium, just as our brains have different lobes that handle different functions. Also, a big part of the processing in the human nervous system occurs at the point of entry; e.g. some of the kind of vision processing that we've developed as narrow AI takes place within the retinal ganglion cells. Even the shape of the outer ear can be said to process auditory information to some extent. But the big trick is making an AGI that is "self"-motivated.
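One way to make the "inbuilt homeostatic drive" idea concrete, borrowed from homeostatic reinforcement learning rather than from the paper, is to define a drive as the distance of internal variables from their setpoints and to reward the agent for reducing it. The variable names, the setpoints and the use of Euclidean distance below are all hypothetical choices for illustration.

```python
import math


def drive(internal, setpoints):
    """Euclidean distance of the internal state from its setpoints (0 = perfect homeostasis)."""
    return math.sqrt(sum((internal[k] - setpoints[k]) ** 2 for k in setpoints))


def homeostatic_reward(before, after, setpoints):
    """Positive when an action moved the internal state closer to equilibrium."""
    return drive(before, setpoints) - drive(after, setpoints)


# Hypothetical internal variables and setpoints.
setpoints = {"energy": 1.0, "hydration": 1.0}
before = {"energy": 0.4, "hydration": 0.9}
after_eating = {"energy": 0.8, "hydration": 0.9}

# Eating reduced the drive, so the reward is positive.
print(homeostatic_reward(before, after_eating, setpoints))
```

Under this framing the agent is never told to "eat"; it is rewarded only for restoring equilibrium, so different behaviours get recruited depending on which variable is furthest from its setpoint.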
Anonymous

June 8, 2021 at 3:30 pm

Blame the Media. Any notable computational improvement will be analyzed(?) by the Press as the next possible AI. The public baggage and endless commentary by every tech-related figure will be brought to bear…
Anonymous

June 8, 2021 at 2:59 pm

It makes a big difference how a machine capable of intelligent behavior gets there.

With expert systems and enough data and speed, it could be possible to build, say, a robot that behaves in every way as though it were human, or at least does so in every situation you are ever likely to observe it in. Still, the answer to "Is there anyone home?" would be a resounding "Nope!" It might be best to think of artificial intelligence as a workaround for the real thing. We might not be able to tell by watching it, but if we built one of these workaround devices, we would know it wasn't the real thing.

Although, if we designed and built one patterned after the way we think, we would have to think about giving it the benefit of the doubt.

Something that actually worked like a human mind, but was built rather than born, might be better named "synthetic intelligence" (SI), in order to distinguish the "real" thing from a workaround. Some thinkers have already begun calling such minds, together with the hardware that supports them, "artilects": artifacts with an intellect. I think either term works better than "strong AI", which already has too many meanings to ever be accurate, and don't get me started on narrow AI.

Then again, how are we really going to differentiate between a human with many inorganic cognitive prosthetic extensions (especially if all the organic components eventually die or are removed) and an SI built on the same model from the ground up (one that never was organic)?
Anonymous

June 8, 2021 at 2:54 pm

Fair comment. I made no claim that SpaceX or Google were "good", just that they tend to produce results and thus deserve respect; you can then argue whether that's good or bad.
Brett Bellmore

June 8, 2021 at 2:19 pm

I trust Google to accomplish things.

I don't trust Google to accomplish things that ought to be accomplished.

Given a lot of what Google gets up to, the idea of them with working human level or better AI is deeply scary.
Brett Bellmore

June 8, 2021 at 2:17 pm

My personal opinion is that the only safe way forward for AI (assuming there IS a safe way forward) is for the "A" to stand for "amplified", not "artificial". We fundamentally need to make AIs extensions of ourselves, with us providing all the motivation and values, and the AI just functioning as a sort of co-processor increasing human capabilities. So that if you take a human out of the loop, it just sits there doing nothing.

This would be analogous to the way the frontal lobes of our brains didn't displace our animal drives, but instead provided more effective means to accomplish them.

It's not satisfying to people who want mechanical wish granting genies, but it's fundamentally childish to want wishes granted without having to do the work yourself.
Brett Bellmore

June 8, 2021 at 2:10 pm

That's always been the challenge for unsupervised learning: Having a way to automatically decide when the system has improved.
Jared

June 8, 2021 at 1:56 pm

I just fear that we'll get this super-fast computational engine with human-like analysis and it will be declared 'the first AGI'. A sad and low bar. Or worse, a centralized predictor/controller which is simply 'programmed' to react in a wide range of ways, but super fast, and then given control (because it exceeds current human reactions and is in our own interest, with pre-programmed outcomes), and then this gets called AGI. Ah, time for a re-watch of the 60s-70s classic The Forbin Project, still somehow relevant.
Jared

June 8, 2021 at 1:40 pm

woah. woah. Nature vs Nurture is one of the Biggest False Dichotomies out there. Maybe we need to examine the idea of 'learning'. Is that fact retention or internalizing a concept (or many) or just anytime 'smart' access to information and its ability to use it… maybe more of Google Search Engine – a master concept 'skimmer' that inventories probability paths… so the opposite of a library… collector of probability… meh, my 2c
Anonymous

June 8, 2021 at 1:28 pm

Though I don't believe we will be trouncing Heisenberg's uncertainty principle anytime soon, I like the idea of a Master Probabilistic Engine (was it Data who kept quantifying probabilities for all mission objectives for a few episodes?) that simultaneously considers a spectrum of varying solutions as being The Solution. How does one formulate a Being to this End: a monster quantum computer that effectively 'argues with itself' over time? Maybe the idea that babies come with a combination of predisposed behaviours but are otherwise a blank slate can provide a key.
Anonymous

June 8, 2021 at 1:13 pm

wikipedia may provide some argument value: "… most models (notably, those measuring rate of change over time) are not deterministic because they involve randomness; called stochastic. Because of sensitive dependence on initial conditions, some deterministic models may appear to behave non-deterministically; in such cases, a deterministic interpretation of the model may not be useful due to numerical instability and a finite amount of precision in measurement. …"
I agree that it will not be massive computational power, near-infinite learning time/opportunity, or super-human levels of motivation or incentive (or other such brute force) that will spawn the underlying concept of the True AGI, but creating a new concept of physics or math is unlikely to be its main purpose.
Anonymous

June 8, 2021 at 1:05 pm

Well you have a point..

Also, there are fundamental differences between AI learning and human learning. For one, an AI needs tens of thousands if not hundreds of thousands of examples to learn a task. How many images of giraffes do you need to see before you can recognize them? 10?

The counter argument has always been that once you have learned to recognize a sufficient number of objects (in AI research), you could use transfer learning to recognize the new classes much faster.

But so far, this has not been demonstrated. Far from it. Nobody has demonstrated recognition of new classes with only a few examples, even with transfer learning (please correct me if I am wrong).

In addition, you don't lose your ability to recognize, say, horses just because you learn to recognize giraffes, but an AI has to be retrained on old data whenever new classes are added. And we have still only touched upon image recognition, not motor control, reasoning, etc…

So perhaps there is some fundamental difference in mechanism? Not saying that we will never be able to copy this mechanism, but so far we have not. And it's not certain that the current "track" will result in super intelligent AI. After all, an AI that needs thousands of pictures of giraffes to recognize them is pretty stupid, right?
Anonymous

June 8, 2021 at 1:01 pm

Disappointed.
Not convinced that extension of human values and norms is cause for celebration of a path forward to truly superior, independent, and Other Intelligence. Especially not for something as trite as reward behavior or self-gratification or whatever status-improving (to themselves or the group) conduct/values are advocated. I would argue that a true AGI, in the spirit of a new conceptual paradigm, would discard or remove many human and thus animal-upgrade values: lesser attention to self-preservation, lesser attention to tribalism and belonging, lesser susceptibility to confirmation bias and past precedent, greater reasoning based on 'a priori' conceptualizing, greater sense of the multitude of options (a diverging approach to problem solving) by quantifying the spectrum of possibilities. I would say that the True AGI is the Master Actuary – the great predictor. And how do we know that this has been realized? Easy: it will have addressed the 'unsolvable problem' of confirming the path to The Deterministic Universe.
Anonymous

June 8, 2021 at 12:45 pm

If they are, then they are hiding it well in the paper…
Anonymous

June 8, 2021 at 12:36 pm

Yes, even the fact that they’ve written this paper kind of suggests they’re already on to something…
Anonymous

June 8, 2021 at 12:02 pm

But what triggers reward? I don't see where this is addressed.
Anonymous

June 8, 2021 at 11:40 am

They aren’t
Anonymous

June 8, 2021 at 11:27 am

Just a thought… An executive AI has access to all of the specific-task AIs and may use the output of one, or a combination of more than one, to maximize reward. This will lead to more complex mixtures and more nuanced use of individual abilities. However, if the reward is, say, earning points for achieved outcomes, the system must 1. slowly and regularly deduct from the score to simulate natural pressures such as hunger and environmental change (creating motivations other than simply achieving goals) and 2. periodically deduct blocks of points for failing to achieve, and also for no reason at all (unpredictability). This will breed a certain conservation of energy and possibly a more resourceful use of energy to explore possibilities other than raw trial and error.

Some intelligence is getting the desired outcome. Sometimes, you let them fall down so they learn not to do that again.
Anonymous

June 8, 2021 at 10:45 am

If they are right, we could have AGI, and quickly after that Artificial Super Intelligence, within the next few years. Maybe even this year if they're already working on this model.
AAA

June 8, 2021 at 10:30 am

The problem is that it is not simple to define rewards that are consistently aligned with an external user. Evolution developed a very simple reward metric: the survival of your gene pool. With millions of instances running in parallel (organisms) and millions of iterations (generations), you develop a very robust model that allows you to cope with almost everything nature throws at you. With AI, the reward CANNOT be based on its own survival; otherwise there will be a non-zero probability of it ending up in direct competition with mankind. But it is not easy to align external reward systems either: we developed dog breeds, but we do not align dogs' survival rights with humans' survival rights, and we consider dogs that tend to be excessively aggressive in defending their owners problematic. Defining mankind's wellbeing as the reward is not good either, because it relies either on the AI's definition of what is good for humans (a humanity restrained and maintained in heroin-induced dreams?) or on an arbitrary set of instructions that is not exhaustive (because if you already have an exhaustive formal model of reality, you do not need to develop AGI; you just rely on the sections of your model that exactly describe the phenomenon that interests you at that moment).
Anonymous

June 8, 2021 at 10:10 am

Don't tell me reward is enough, show me it's enough.
The asymptotic need for data and processing to realize tiny gains in the state of the art suggests it's insufficient.
Anonymous

June 8, 2021 at 9:12 am

What's "new" is that this is DeepMind, Google's AI "DARPA", with massive resources, a world-class AI team and a reputation for actually doing things like AlphaGo and AlphaFold rather than making YouTube videos. Personally, I put my trust in groups or organisations that do, not talk. You can find a number of YouTubers explaining why, for instance, SpaceX is bunk, but then SpaceX launches real rockets. DeepMind does stuff, so in my opinion they have earned the credit to be listened to with respect.
R. Kimhi

June 8, 2021 at 8:44 am

This is so fundamental that I am worried they didn't have it by now. The problem is digging into how to maximize reward. There is no simple answer to it of the type that this blogger likes, but an infinitely complex and relativistic one that actually requires a good approximation, such as a search engine algorithm trying to work out what the user is looking for, multiplied in every direction.
