Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. His lab’s Deep Learning Neural Networks (NNs) (since 1991) and Long Short-Term Memory have transformed machine learning and AI, Deep Learning since 1991 – Winning Contests in Pattern Recognition and Sequence Learning Through Fast and Deep / Recurrent Neural Networks and are now (2017) available to billions of users through the world’s most valuable public companies including Google, Apple, Microsoft, Amazon, etc. In 2011, his team was the first to win official computer vision contests through deep NNs, with superhuman performance. His research group also established the field of mathematically rigorous universal AI and recursive self-improvement in universal problem solvers that learn to learn (since 1987).
He predicts trillions of AI in the 2050s will mine and develop the asteroids.
He has a long list of “truths” that many disagree with.
1. Many think that intelligence is this awesome, infinitely complex thing. Juergen think it is just the product of a few principles that will be considered very simple in hindsight, so simple that even kids will be able to understand and build intelligent, continually learning, more and more general problem solvers.
Partial justification of this belief:
(a) there already exist blueprints of universal problem solvers developed in my lab, in the new millennium, which are theoretically optimal in some abstract sense although they consist of just a few formulas (http://people.idsia.ch/~juergen/unilearn.html, http://people.idsia.ch/~juergen/goedelmachine.html).
(b) The principles of our less universal, but still rather general, very practical, program-learning recurrent neural networks can also be described by just a few lines of pseudo-code, e.g., http://people.idsia.ch/~juergen/rnn.html, http://people.idsia.ch/~juergen/compressednetworksearch.html
2. General purpose quantum computation won’t work (Juergen’s prediction of 15 years ago is still standing). Related: The universe is deterministic, and the most efficient program that computes its entire history is short and fast, which means there is little room for true randomness, which is very expensive to compute. What looks random must be pseudorandom, like the decimal expansion of Pi, which is computable by a short program. Many physicists disagree, but Einstein was right: no dice. There is no physical evidence to the contrary http://people.idsia.ch/~juergen/randomness.html. For example, Bell’s theorem does not contradict this. And any efficient search in program space for the solution to a sufficiently complex problem will create many deterministic universes like ours as a by-product. Think about this. More here http://people.idsia.ch/~juergen/computeruniverse.html and here http://www.kurzweilai.net/in-the-beginning-was-the-code
Recurrent Neural NetworksThe world of RNNs is such a big world because RNNs (the deepest of all NNs) are general computers, and because efficient computing hardware in general is becoming more and more RNN-like, as dictated by physics: lots of processors connected through many short and few long wires. It does not take a genius to predict that in the near future, both supervised learning RNNs and reinforcement learning RNNs will be greatly scaled up. Current large, supervised LSTM RNNs have on the order of a billion connections; soon that will be a trillion, at the same price. (Human brains have maybe a thousand trillion, much slower, connections – to match this economically may require another decade of hardware development or so). In the supervised learning department, many tasks in natural language processing, speech recognition, automatic video analysis and combinations of all three will perhaps soon become trivial through large RNNs (the vision part augmented by CNN front-ends). The commercially less advanced but more general reinforcement learning department will see significant progress in RNN-driven adaptive robots in partially observable environments. Perhaps much of this won’t really mean breakthroughs in the scientific sense, because many of the basic methods already exist. However, much of this will SEEM like a big thing for those who focus on applications. (It also seemed like a big thing when in 2011 our team achieved the first superhuman visual classification performance in a controlled contest, although none of the basic algorithms was younger than two decades: http://people.idsia.ch/~juergen/superhumanpatternrecognition.html)
So what will be the real big thing? I like to believe that it will be self-referential general purpose learning algorithms that improve not only some system’s performance in a given domain, but also the way they learn, and the way they learn the way they learn, etc., limited only by the fundamental limits of computability. I have been dreaming about and working on this all-encompassing stuff since my 1987 diploma thesis on this topic, but now I can see how it is starting to become a practical reality. Previous work on this is collected here: http://people.idsia.ch/~juergen/metalearner.html
Consciousness, AI and memory
Karl Popper famously said: “All life is problem solving.” No theory of consciousness is necessary to define the objectives of a general problem solver. From an AGI point of view, consciousness is at best a by-product of a general problem solving procedure.
Juergen is not a big fan of Tononi’s theory. The following may represent a simpler and more general view of consciousness. Where do the symbols and self-symbols underlying consciousness and sentience come from? Juegen thinks they come from data compression during problem solving.
While a problem solver is interacting with the world, it should store the entire raw history of actions and sensory observations including reward signals. The data is ‘holy’ as it is the only basis of all that can be known about the world. If you can store the data, do not throw it away! Brains may have enough storage capacity to store 100 years of lifetime at reasonable resolution .
As we interact with the world to achieve goals, we are constructing internal models of the world, predicting and thus partially compressing the data history we are observing. If the predictor/compressor is a biological or artificial recurrent neural network (RNN), it will automatically create feature hierarchies, lower level neurons corresponding to simple feature detectors similar to those found in human brains, higher layer neurons typically corresponding to more abstract features, but fine-grained where necessary. Like any good compressor, the RNN will learn to identify shared regularities among different already existing internal data structures, and generate prototype encodings (across neuron populations) or symbols for frequently occurring observation sub-sequences, to shrink the storage space needed for the whole (we see this in our artificial RNNs all the time). Self-symbols may be viewed as a by-product of this, since there is one thing that is involved in all actions and sensory inputs of the agent, namely, the agent itself. To efficiently encode the entire data history through predictive coding, it will profit from creating some sort of internal prototype symbol or code (e. g. a neural activity pattern) representing itself [1,2]. Whenever this representation becomes activated above a certain threshold, say, by activating the corresponding neurons through new incoming sensory inputs or an internal ‘search light’ or otherwise, the agent could be called self-aware. No need to see this as a mysterious process — it is just a natural by-product of partially compressing the observation history by efficiently encoding frequent observations.
 Schmidhuber, J. (2009a) Simple algorithmic theory of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. SICE Journal of the Society of Instrument and Control Engineers, 48 (1), pp. 21–32.
 J. Schmidhuber. Philosophers & Futurists, Catch Up! Response to The Singularity. Journal of Consciousness Studies, Volume 19, Numbers 1-2, pp. 173-182(10), 2012.
Promising students are smart and tenacious
But how to quickly recognize a promising student when you first meet her? There is no recipe, because they are all different! In fact, sometimes it takes a while to recognize someone’s brilliance. In hindsight, however, they all have something in common: successful students are not only smart but also tenacious. While trying to solve a challenging problem, they run into a dead end, and backtrack. Another dead end, another backtrack. But they don’t give up. And suddenly there is this little insight into the problem which changes everything. And suddenly they are world experts in a particular aspect of the field, and then find it easy to churn out one paper after another, and create a great PhD thesis.
After these abstract musings, some more concrete advice. In interviews with applicants, members of my lab tend to pose a few little problems, to see how the candidate approaches them.
AI beyond humans
20 years from now we’ll have 10,000 times faster computers for the same price, plus lots of additional medical data to train them. I assume that even the already existing neural network algorithms will greatly outperform human experts in most if not all domains of medical diagnosis, from melanoma detection to plaque detection in arteries, and innumerable other applications.
Even (minor extensions of) existing machine learning and neural network algorithms will achieve many important superhuman feats. I guess we are witnessing the ignition phase of the field’s explosion. But how to predict turbulent details of an explosion from within?
In 2035 computers will be more than 10,000 times faster than today, at the same price. This sounds more or less like a human brain power in a small portable device. Or the human brain power of a city in a larger computer.
Given such raw computational power, he expects huge (by today’s standards) recurrent neural networks on dedicated hardware to simultaneously perceive and analyse an immense number of multimodal data streams (speech, texts, video, many other modalities) from many sources, learning to correlate all those inputs and use the extracted information to achieve a myriad of commercial and non-commercial goals. Those RNNs will continually and quickly learn new skills on top of those they already know. This should have innumerable applications, although I am not even sure whether the word “application” still makes sense here.
This will change society in innumerable ways. What will be the cumulative effect of all those mutually interacting changes on our civilisation, which will depend on machine learning in so many ways? In 2012, I tried to illustrate how hard it is to answer such questions: A single human predicting the future of humankind is like a single neuron predicting what its brain will do.
Juergen admits that he have no idea what is going to happen. It just seems clear that everything will change.
One thing seems clear though: in the not too distant future (2050s or so, trillions of AI’s in the asteroid belt), supersmart AIs will start to colonize the solar system, and within a few million years the entire galaxy. The universe wants to make its next step towards more and more unfathomable complexity.
Juergen won’t be surprised if Moore’s Law holds for another century. If so, computers will approach the Bremermann limit of 10^51 ops/s per kg of matter in the mid 2100s (btw, all human brains together probably cannot do more 10^30 ops/s). See this previous reply. Lightspeed constraints seem to dictate that future efficient computational hardware will have to be somewhat brain-like, namely, with many compactly placed processors in 3-dimensional space, connected by many short and few long wires, to minimize total connection cost (even if the “wires” are actually light beams). Essentially a sparsely connected RNN! More on this in the survey.