John Hennessy talked about the end of Moore’s Law and the start of a new golden age for computer architecture.
He described how general-purpose processors from Intel and others were sped up over decades. The gains came from technology scaling, instruction-level parallelism and multiple cores.
Hennessy sees the near-term gains in computing speed coming from domain-specific languages and architectures, which he estimates could deliver speedups on the order of 63,000 times.
Beyond that we will use nanotechnology, quantum computers and other approaches.
He was speaking at a DARPA meeting. DARPA is funding the exploration of technology and approaches beyond Moore’s law.
Reviewing 40 Years of Moore’s Law
• 40 years of stunning progress in microprocessor design
– 1.4x annual performance improvement for 40+ years ≈ 1 million times faster (throughput)!
• Three architectural innovations:
– Width: 8->16->64 bit (~4x)
– Instruction-level parallelism: 4-10 cycles per instruction to 4+ instructions per cycle (~10-20x)
– Multicore: one processor to 32 cores (~32x)
• Clock rate: 3 MHz to 4 GHz (through technology & architecture)
• Made possible by IC technology:
– Moore’s Law: growth in transistor count
– Dennard Scaling: power/transistor shrinks as speed & density increase
– Energy expended per computation is reducing
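The headline figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes a constant 1.4x improvement every year, which is a simplification of the slide's claim.

```python
import math


def compound_growth(rate: float, years: int) -> float:
    """Total speedup after multiplying performance by `rate` once per year."""
    return rate ** years


def years_to_reach(rate: float, target: float) -> float:
    """Years of compounding at `rate` needed to reach a total speedup of `target`."""
    return math.log(target) / math.log(rate)


def clock_ratio(start_hz: float, end_hz: float) -> float:
    """Ratio between two clock frequencies."""
    return end_hz / start_hz


if __name__ == "__main__":
    print(f"1.4x per year for 40 years: {compound_growth(1.4, 40):,.0f}x")
    print(f"Years to reach 1,000,000x:  {years_to_reach(1.4, 1e6):.1f}")
    print(f"3 MHz -> 4 GHz clock ratio: {clock_ratio(3e6, 4e9):,.0f}x")
```

Under this assumption, exactly 40 years of 1.4x growth comes out somewhat below one million, and roughly 41 years reaches it, which is consistent with the "40+ years" wording.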
End of Dennard Scaling is a Crisis
• Energy consumption has become more important to users
– For mobile, IoT, and for large clouds
• Processors have reached their power limit
– Thermal dissipation is maxed out (chips turn off to avoid overheating!)
– Even with better packaging: heat and battery are limits.
• Architectural advances must increase energy efficiency
– Reduce power or improve performance for same power
• The dominant architectural techniques have reached limits in energy efficiency!
– 1982-2005: Instruction-level parallelism (compiler and processor find parallelism)
– 2005-2017: Multicore (programmer identifies parallelism)
– Caches: diminishing returns (small incremental improvements).
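To see why the end of Dennard scaling turns into a power problem, it can help to work through the classical switching-power model, P = C · V² · f. The sketch below is not from the talk: it assumes the textbook scaling rules (shrinking dimensions by a factor k divides capacitance by k, multiplies frequency by k, and packs k² more transistors per unit area), and the function names are hypothetical.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Classical switching-power model: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def power_density_after_shrink(k: float, voltage_scales: bool = True) -> float:
    """Relative power per unit area after shrinking dimensions by factor k.

    The baseline chip has C = V = f = 1 and one transistor per unit area.
    With `voltage_scales=True` the supply voltage also drops by k (ideal
    Dennard scaling); with False it stays fixed (assumed post-Dennard case).
    """
    capacitance = 1.0 / k
    voltage = 1.0 / k if voltage_scales else 1.0
    frequency = 1.0 * k
    transistors_per_area = k ** 2
    return dynamic_power(capacitance, voltage, frequency) * transistors_per_area


if __name__ == "__main__":
    for k in (1.4, 2.0):
        ideal = power_density_after_shrink(k, voltage_scales=True)
        stalled = power_density_after_shrink(k, voltage_scales=False)
        print(f"k={k}: ideal Dennard {ideal:.2f}x, fixed voltage {stalled:.2f}x")
```

In this simplified model, power density stays flat while voltage keeps scaling, and grows with k² once it stops, which is the thermal limit the bullets above describe.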
Instruction-level Parallelism Era 1982-2005
• Instruction-level parallelism achieves significant performance advantages
• Pipelining: 5 stages to 15+ stages to allow faster clock rates (energy neutralized by Dennard scaling)
• Multiple issue: less than 1 instruction/clock to 4+ instructions/clock
– Significant increase in transistors to increase issue rate
• Why did it end?
– Diminishing returns in efficiency
Getting More ILP
• Branches and memory aliasing are a major limit:
– 4 instructions/clock x 15 deep pipeline need more than 60 instructions “in flight”
• Speculation was introduced to allow this
• Speculation involves predicting program behavior
– Predict branches & predict matching memory addresses
– If the prediction is accurate, execution can proceed
– If the prediction is inaccurate, undo the work and restart
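• How good must branch prediction be? Very good!
– 15-deep pipeline: ~4 branches in flight; for all of them to be correct 94% of the time, each must be predicted with ~98.7% accuracy
– 60 instructions in flight: ~15 branches; for all of them to be correct 90% of the time, each must be predicted with ~99% accuracy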
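The branch-prediction figures above follow from simple probability. As a rough sketch, assuming each branch prediction is independent (an assumption made for this example, not something stated in the talk), the chance that all n in-flight branches are predicted correctly is p^n, so the required per-branch accuracy is the n-th root of the target. The function names below are hypothetical.

```python
def instructions_in_flight(issue_width: int, pipeline_depth: int) -> int:
    """Instructions needed to keep every pipeline stage full at full issue width."""
    return issue_width * pipeline_depth


def all_correct_probability(per_branch_accuracy: float, branches: int) -> float:
    """Chance that every one of `branches` independent predictions is right."""
    return per_branch_accuracy ** branches


def required_branch_accuracy(branches: int, target_all_correct: float) -> float:
    """Per-branch accuracy p such that p ** branches equals the target."""
    return target_all_correct ** (1.0 / branches)


if __name__ == "__main__":
    print("In flight:", instructions_in_flight(4, 15))
    for branches, target in ((4, 0.94), (15, 0.90)):
        needed = required_branch_accuracy(branches, target)
        print(f"{branches} branches, {target:.0%} all correct -> {needed:.1%} per branch")
```

Under the independence assumption this gives about 98.5% for 4 branches and about 99.3% for 15, close to the rounded figures on the slide.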
What Opportunities Are Left?
– Modern scripting languages are interpreted, dynamically-typed and encourage reuse
– Efficient for programmers but not for execution
– Only path left is Domain Specific Architectures
– Just do a few tasks, but extremely well
– Domain Specific Languages & Architectures
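The point that scripting languages are "efficient for programmers but not for execution" can be felt even without special hardware. The sketch below is only an illustration and does not reproduce the 63,000x figure: it compares a dot product written as an interpreted Python loop with one that pushes the loop into C-implemented built-ins. The function names are made up for this example, and the timings will vary by machine.

```python
import operator
import timeit


def dot_loop(a, b):
    """Dot product with an explicit loop; every step runs in the interpreter."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def dot_builtin(a, b):
    """Same result, but iteration and summation happen inside built-ins."""
    return sum(map(operator.mul, a, b))


if __name__ == "__main__":
    n = 100_000
    a = list(range(n))
    b = list(range(n))
    assert dot_loop(a, b) == dot_builtin(a, b)
    for fn in (dot_loop, dot_builtin):
        seconds = timeit.timeit(lambda: fn(a, b), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```

Both functions compute the same value; any difference in run time comes from how much work is left to the interpreter. Domain-specific languages and architectures apply the same idea far more aggressively, by matching the software and the hardware to one class of task.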
Research Opportunity: New Technology
– Extend Dennard scaling and Moore’s Law
– New methods for efficient energy scaling
– Secure supply chains
– Overcome TDP limits for high end
– Tighter integration = more performance & less power
– Integrated III-V semiconductors (3-5s) for optical interconnect
– Carbon nanotubes?
Sources: DARPA, John Hennessy talk
Written By Christina Wong. Nextbigfuture.com