"Knowm’s AHaH computing approach combines the best of machine learning and quantum computing via memristors," Chief Executive Officer Alex Nugent told EE Times in advance of the company's unveiling today. "Our neuromemristive processors use a low-level instruction set that can be combined in various ways to achieve any number of learning algorithms."
Many researchers take the “let’s decode the brain!” approach and want to simulate the brain on massive computer clusters. While this will most certainly provide insights into how the brain works, they will eventually be faced with the reality that their simulations will not be able to compete on power, density and speed with other approaches that have addressed the issue directly.
Knowm has taken a different approach, which is to build a chip that doesn’t necessarily emulate a brain but instead provides adaptive-learning functions at a foundational ‘physical’ level and consequently beats other approaches on power and density.
Each memristor 'remembers' how much current has passed through it, and in what direction, by changing its resistance, here based on mobile metal ion conduction through the chalcogenide material.
Question - Just how efficient do you expect AHaH to be for, say, 100 trillion synapses? Our brains do it at about 20 watts; supercomputers would need several hundred megawatts, or even a gigawatt. What is kT-RAM expected to need?
There is a lot to this topic. Here is a short and simple answer. Knowm is using differential memristors as synapses. The energy dissipated in a synapse is the power dissipated over both memristors for the duration of the read/write pulses. P = IV = V^2 / R, where V is the voltage and R the resistance. A typical on-state resistance of a memristor is 500 kOhm, and a typical voltage is 0.5 V, so: P = 0.5^2 / 500E3 = 5E-7 watts. The energy is only dissipated during read and write events, which occur in pulses of ~50 ns (nanoseconds) or less. The energy per memristor per synaptic read or write event is then 5E-7 W x 50E-9 s = 2.5E-14 Joules. Since the kT-RAM instruction set requires paired read-write instructions, and since a synapse is two memristors, we multiply that answer by four: 1E-13 Joules. This is 0.1 picojoules per adaptive synaptic event. Note we could lower the voltage and pulse width to achieve even lower power for synaptic integration operations (i.e. no learning).
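For readers who want to check the arithmetic, here is a minimal Python sketch of the estimate above. The function and parameter names are made up for illustration and are not part of any Knowm software; the default values are the ones quoted in the answer.

```python
# Back-of-the-envelope energy per adaptive synaptic event.
# Defaults are the figures quoted in the answer; all names are illustrative.
def energy_per_synaptic_event(voltage=0.5,            # volts across a memristor
                              resistance=500e3,       # ohms, on-state
                              pulse_width=50e-9,      # seconds per pulse
                              memristors_per_synapse=2,
                              pulses_per_instruction=2):  # paired read + write
    power = voltage ** 2 / resistance        # P = V^2 / R, watts per memristor
    energy_per_pulse = power * pulse_width   # joules per memristor per pulse
    return energy_per_pulse * memristors_per_synapse * pulses_per_instruction


energy = energy_per_synaptic_event()
print(f"{energy:.2e} J per event ({energy * 1e12:.2f} pJ)")

# Energy scales with V^2, so halving the voltage cuts it to a quarter.
print(f"{energy_per_synaptic_event(voltage=0.25):.2e} J at 0.25 V")
```

With the default values this comes to about 0.1 picojoules. Because the energy scales with the square of the voltage and linearly with the pulse width, lowering either one reduces the figure, as noted above.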
If we say a human brain has 100 billion neurons, each with 1000 synapses that fire on average once per second, that is 1E14 adaptive synaptic events per second. At 20 watts, the energy consumed in one second is 20 Joules. So if we put all of that energy into synaptic events we get 2E-13 Joules per event, or twice that of kT-RAM.
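The brain comparison can be sketched the same way. The constants below are the round numbers from the answer, the names are illustrative, and the last figure is our own extrapolation: it counts only the energy of the synaptic events themselves and ignores communication and any surrounding architecture.

```python
# Rough brain vs. kT-RAM comparison using the round numbers quoted above.
# All names are illustrative; none of this is measured data.
NEURONS = 100e9                # neurons in a human brain
SYNAPSES_PER_NEURON = 1000
FIRING_RATE_HZ = 1.0           # average firings per second
BRAIN_POWER_W = 20.0           # joules consumed per second
KT_RAM_J_PER_EVENT = 1e-13     # 0.1 pJ per adaptive synaptic event

events_per_second = NEURONS * SYNAPSES_PER_NEURON * FIRING_RATE_HZ

# Upper bound for the brain: assume all 20 W goes into synaptic events.
brain_j_per_event = BRAIN_POWER_W / events_per_second
ratio = brain_j_per_event / KT_RAM_J_PER_EVENT

# Synapse-only power if kT-RAM ran the same number of events per second.
kt_ram_synapse_power_w = KT_RAM_J_PER_EVENT * events_per_second

print(f"events per second:      {events_per_second:.1e}")
print(f"brain energy per event: {brain_j_per_event:.1e} J")
print(f"brain / kT-RAM ratio:   {ratio:.1f}")
print(f"kT-RAM synapse power:   {kt_ram_synapse_power_w:.1f} W")
```

Under these assumptions the synaptic events alone would draw roughly 10 watts at brain scale, which is where the "twice that of kT-RAM" comparison comes from.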
The actual deployed power consumption of kT-RAM depends on what sort of computing architecture it is embedded in. Its purpose is to remove the memory-processing bottleneck for adaptive learning operations. If you knew the exact connection topology of a brain and made a custom ASIC of AHaH nodes, you would be looking at efficiencies comparable to and possibly even exceeding biology. The reason for this is that our modern methods of chip communication can be quite a bit more efficient (and faster) than biology. However, if you have a more generic architecture that enables you to explore more connection topologies, for example a mesh grid of little CPUs with local RAM and kT-RAM, you would expend more energy but get something more flexible: the ability to emulate any brain, not just one specific type.
Question - How is kT-RAM different from an FPGA?
kT-RAM is best understood as a learning co-processor. It excels at things like inference and classification. We are not aware of other computing substrates that provide access to higher power efficiency or synaptic density than kT-RAM. FPGAs are very useful generic hardware accelerators built with modern CMOS technology. Before physical production of kT-RAM, we are using FPGAs (and other hardware accelerators) as kT-RAM emulators in our development platforms.
Question - How does kT-RAM differ from Micron’s Automata Processor?
Micron’s Automata Processor is a parallel and scalable regular expression matcher with limited dynamic reconfigurability. kT-RAM is a learning co-processor. It can do things like unsupervised feature learning, inference, classification, prediction and anomaly detection. As an example, if you wanted to search Wikipedia for all occurrences of some word or word pattern that you specify, the Automata Processor would be great. If you wanted to learn a representation of that word and how it relates to other words (its meaning), you would be better off using kT-RAM.
Knowm proposes Thermodynamic RAM
Arxiv - Thermodynamic-RAM Technology Stack
Knowm introduces a technology stack or specification describing the multiple levels of abstraction and specialization needed to implement a neuromorphic processor based on the theory of AHaH Computing. This specific implementation is called Thermodynamic-RAM (kT-RAM). Bringing us closer to brain-like neural computation, kT-RAM will provide a general purpose adaptive hardware resource to existing computing platforms, enabling fast and low-power machine learning capabilities that are currently hampered by the separation of memory and processing. The motivation for defining the technology stack is two-fold. First, explaining kT-RAM is much easier if it is broken down into smaller, more manageable pieces. Second, groups interested in realizing kT-RAM can choose a level to contribute to that matches their interest and expertise. The levels of the Thermodynamic-RAM technology stack include the memristor, Knowm-Synapse, AHaH Node, kT-RAM, kT-RAM instruction set, sparse spike encoding, kT-RAM emulator, and SENSE Server.
Thermodynamic-RAM is the first attempt at realizing a working neuromorphic processor implementing the theory of AHaH Computing. While several alternative designs are feasible and may offer specific advantages over others, the first design aims to be a general computing substrate geared towards reconfigurable network topologies and the entire spectrum of the machine learning application space. In the following sections, we break down the entire design specification into various levels from ideal memristors to integrating the finished product into existing technology. Defining the individual levels of this ‘technology stack’ helps to introduce the technology step by step and group the necessary pieces into tasks with focused objectives. This allows for separate groups to specialize at one or more levels of the stack where their strengths and interests exist. Improvements at various levels can propagate throughout the whole technology ecosystem, from materials to markets, without any single participant having to bridge the whole stack. In a way, the technology stack is an industry specification.