Computer Chips Will Improve 1000 Times Within 5 Years

Intel’s Raja Koduri gave the Monday keynote at the Hot Chips 32 conference. He argued that within the next 5 years the industry would be able to increase performance by 1000 times.

Sander Olson attended this presentation.

Raja describes a path to 50x Transistor Density.

Pitch scaling will triple the density of Finfets.

Nanowires will provide another doubling over Pitch scaling.

Stacked nanowires will double the density over nanowires.

Stacked wafers will provide another doubling.

Die-to-wafer stacking provides another doubling.
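As a quick sanity check, the multipliers listed above compound to just under the headline 50x figure:

```python
# Compound the density multipliers from the list above.
factors = {
    "pitch scaling (FinFET)": 3,
    "nanowires": 2,
    "stacked nanowires": 2,
    "wafer stacking": 2,
    "die-to-wafer stacking": 2,
}

total = 1
for factor in factors.values():
    total *= factor

print(total)  # 48, i.e. roughly the claimed 50x
```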

Memory and Packaging Breakthroughs

SOURCES- Hot Chips, Intel, Sander Olson
Written By Brian Wang,

48 thoughts on “Computer Chips Will Improve 1000 Times Within 5 Years”

  1. Same for China. God bless liberty, freedom and capitalism; it is the way forward.

    You mean plundering, mooching, and parasitism. That’s what’s funded ’Murica’s dominance: taking people and knowledge away from others.

  2. Governments funding oligarch corporations…
    States should never allow income and rights to just flow into private corporations. It just results in them bloating, gobbling up money and popping out billionaires.

  3. OK, so:
    Physical simulation: parallelizes well.
    Draw calls: I/O limited, not CPU.
    So we can conclude that single-core clock speed is not super relevant for gaming; memory speed and GPU are.

    However I already know how game engine core loops work. I’m a coder. I was more interested in the statistics of how many commonly used programs would be affected. And how badly.
    There’s been a recent drive toward parallelizing one’s algorithms when possible, but some things are just inherently serial, and no clever coding can fix it.

    So to rephrase the question: Are the inherently serial parts of (CPU-bound) modern programs a bottleneck, or do they in most cases constitute a small fraction of the runtime?

  4. I am all for our free and mostly liberal market economy, but most vocal proponents forget the valuable role of the state in innovation. Most fundamental research, especially during recessions, is funded with public money. The basis for the silicon chip goes way back, but most concretely to the seventies in Silicon Valley. Lots of public money was invested in stimulation programs and in public-private research (universities with corporations). Notably of course DoD, DARPA, etc., but legions of other state and national entities invested. Public money is a very important part of innovation. Regrettably it is often overlooked, and the eventual revenue of course mostly goes into private pockets.

  5. You basically never want a phase change at the heat transfer surface: As soon as a bubble forms, the heat transfer rate drops dramatically, because gas doesn’t conduct heat as well. And now you’ve got the Leidenfrost effect making sure the coolant doesn’t touch the chip.

    *Maybe* you could get away with running a coolant through holes in the chip normal to the surface, and transitioning from liquid to gas partway through. As long as you could keep the transition zone from propagating back through the channels.

  6. I think once you make the transition from an engineering centered company, to a financial management centered company, (See Boeing for an example of this.) reversing that is REALLY difficult. Because the management are in charge, and they’ll always think management is the most important thing.

  7. Pretty much what I was saying: For flash memory, the sky is the limit for stacking, because your power dissipation is limited to the bandwidth of the chip: Stacking doesn’t increase power dissipated, just memory capacity.

    For processors, stacking increases dissipation, and it’s already close to hard limits: We’ll only be able to stack to the extent we can reduce the power consumed by each layer, or radically improve heat dissipation.

    Now, I could imagine using the added gates to switch to a new architecture, where most of the chip was powered down at any given time, and you keep moving around the active area: You’d still get the speed implications of stacking in terms of shorter traces, but each unit area of the chip would be duty cycle limited by heat dissipation constraints. Perhaps that could work with some sort of data-flow design.

    But, to be honest, I stopped following processor design in detail back around the death of the Transputer; I may have studied computer engineering in college, but it’s not the career I ended up in. So I just follow it a little these days.

  8. I’d argue it’s more of a U shaped curve: They let things liberalize a bit for a while, then came Xi, and all the wealth that had come their way got devoted to creating humanity’s first modern, computerized totalitarian society, with the sort of panopticon surveillance that 1984 only had nightmares about.

    Was there genuinely reason to think they’d keep liberalizing, rather than reverse it once they’d reaped some benefits? I’d argue not: Even in the ‘free’ societies of the West, once the economies became large enough for the elites to be comfortable, they tightened the screws. The regulatory state replaced the free market, and progress stagnated in all areas that were regulated. I mean, compare “experimental” kit planes, that you can only build and fly with a liability waiver, to the civil aviation planes you can just buy off the shelf. No comparison.

    That’s why the internet became such a big thing: For a while it was growing under the radar. All the real progress of the latter 20th century took place in the parts of the economy that hadn’t yet been comprehensively regulated.

    It’s a universal trend, I’d argue: Once the economy becomes big enough that the elites feel comfortable, they see further economic growth as a positional threat, and take away freedom to the extent they’re able. China was always going to follow that model, too, only starting from much less free, and with no limits on what the government could do.

  9. Extremely important for any user-facing low latency application and quite irrelevant for truly independent processes running in parallel.

    Parallel code that speeds up a single task relies on some kind of distribute-and-gather step, and this gives a serial workload. Even a small serial fraction limits the core scaling very strongly (see Amdahl’s law). With just a 10% serial fraction, you only get about a 6.4x speedup from 16 cores, and doubling that to 32 cores gives you about a 7.8x speedup. With infinitely many cores you’re limited to a 10x speedup.

    When you loop over multiple serial tasks that have to happen in series, you can unroll the loop and work on multiple iterations in parallel. If you don’t care about latency this is OK.

    Consider unrolling the gameplay loop in a game; grossly simplified, each frame might look like this:

    Take player input and Update world and NPC state (path finding, AI, scripted events etc.)
    Physical simulation
    Batching and draw calls

    In the past, this was done on one CPU core in one frame, serially. But if you unroll this loop it looks like this:

    Take player input and update world and NPC state for frame N
    Physical simulation for frame N-1
    Batching and draw calls for frame N-2
    GPU driver frame N-3 (even the driver can add a frame of latency if you let it)
    GPU actually rendering frame N-4
    Frame N-5 waiting to be displayed (if vsync)
    Frame N-6 currently being drawn to the screen

    This is why 144 FPS feels like 60 FPS used to feel.
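    The Amdahl’s-law figures above can be computed exactly; a minimal sketch with the 10% serial fraction from this comment:

```python
def amdahl_speedup(serial_fraction, cores):
    """Best-case speedup of a task with a fixed serial fraction (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

s = 0.10  # 10% serial fraction

print(round(amdahl_speedup(s, 16), 1))  # 6.4
print(round(amdahl_speedup(s, 32), 1))  # 7.8
print(round(1.0 / s, 1))                # 10.0, the limit with infinitely many cores
```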

  10. China has already overtaken the USA in a number of sectors, from high-speed rail to vehicle production, from 5G to infrastructure.
    They will overtake the USA as the No. 1 chip manufacturing powerhouse in the next five years, and there is simply nothing the USA can do to prevent it.
    If they invade Taiwan, they can become No. 1 as soon as next year.
    No, the USA will not start a nuclear war with China to protect Taiwan.

    Just saying


  11. Full-size chips are prohibitively expensive. Even a stupid simple task, like taking a 2080 Ti and die-shrinking it or implementing it on a different process node, costs hundreds of millions of dollars. It’s all the masks, validation and everything that costs so much (not even counting R&D costs). EUV has possibly pushed this exponential ramp of development costs down the road a couple of years, but we will catch back up.

    For this reason alone it makes sense to make one type of chiplet and reuse it as many times as humanly possible.

    If you only sell a million chips at a high-end node, you have to slap an extra couple of hundred dollars onto the price for development cost. If every GPU or CPU you make is stapled together from the same few kinds of chiplets, and you staple together as many as you need, that might not be optimal in terms of performance, efficiency, etc., but it will be done anyway for yield and cost reasons.
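    A toy amortization calculation makes the point; the $300M one-time cost is an assumed round number, not from the comment:

```python
# Hypothetical illustration: a fixed one-time development cost (masks,
# validation, etc.) spread over different sales volumes.
nre_cost = 300e6  # dollars, assumed

for units in (1_000_000, 10_000_000, 100_000_000):
    per_chip = nre_cost / units
    print(f"{units:>11,} units -> ${per_chip:,.0f} development cost per chip")
```

    At a million units that is $300 per chip, the “extra couple of hundred dollars” above; reusing one chiplet design across many products pushes the volume up and the per-chip share down.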

  12. I’d argue that 2020 China is still a LOT more liberal than 1978 China. It’s just that the illiberal parts now have a lot more international reach.
    Wealth through trade made them more liberal and more powerful. It was a risky tradeoff.

  13. And they are already doing that. Sort of. [1] Of course, you want cheap electricity, stable political conditions, and you don’t want the build to be expensive. Any mountaintop would increase the build price of the server farm, so that is out of the question. Also, high altitude means thin air, which in turn means expensive cooling (I assume)…


  14. You know, with a good internet connection, if the primary cost of the computation was the power consumption and cooling, you could put server farms in high latitudes North and South, and use the waste heat for building heat.

  15. Yes, you could always – even with wafer stacking – make sure that you had the option for liquid cooling and high power levels. But this would be a failure, in my opinion. What you are looking for is:

    (1) lots of performance per USD of investment 
    (2) lots of performance per W.

    If you allow a chip to draw several hundred watts you are failing in the second metric, and the result would be expensive. Just look at the computer above: 28 MW. At typical electricity prices that works out to thousands of dollars an hour, tens of millions a year, just to keep it running. It’s not like a normal scientist can afford it, just some megaprojects…
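    For scale, the electricity bill for a 28 MW machine, assuming an industrial price of $0.10/kWh (an assumption, not from the comment):

```python
power_mw = 28.0      # the machine's power draw, from the comment above
usd_per_kwh = 0.10   # assumed industrial electricity price

cost_per_hour = power_mw * 1_000 * usd_per_kwh
cost_per_year = cost_per_hour * 24 * 365

print(f"${cost_per_hour:,.0f} per hour")  # $2,800 per hour
print(f"${cost_per_year:,.0f} per year")  # $24,528,000 per year
```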

  16. “Computer Chips Will Improve 1000 Times Within 5 Years”
    If that is true, and we are talking about production chips at an affordable price, then it would be time to grab hold of our socks. “Futureshock” would be a woefully inadequate term for what would be happening as fast as we could implement this level of improvement into both new and existing systems.
    Just for scaling, the human brain operates at about 1,000 petaflops or 1 exaflop. The fastest computers in the world are passing this level. (That does not mean such machines will ever be practical for hosting AI, that probably requires a different sort of chip altogether, see memristors and neuristors.)

    A thousand times faster would be zettascale and, doubling every 18 months, would take about 15 years to reach. Getting there in 5 would take some getting used to, kinda like getting hit by a firehose might take some getting used to.
    Digressing, there are theoretical limits to processing power and information storage. Even with something like Moore’s Law (a “law” that doesn’t tie itself to transistors but to general performance) we are still centuries away from having to worry about it… or are we?
    There is some thought that civilizations accustomed to a constant doubling of performance every 18 months for many centuries could become so dependent on it that, when they hit the limit, they collapse.
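    The 15-year figure above follows directly from the doubling cadence:

```python
import math

target = 1000.0            # the claimed performance multiple
months_per_doubling = 18   # classic Moore's-law cadence

doublings = math.log2(target)            # doublings needed to reach 1000x
years = doublings * months_per_doubling / 12

print(round(doublings, 2))  # 9.97
print(round(years, 1))      # 14.9
```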

  17. Name the planning committee that came up with the x86 processor. Which government official decided that RISC was not the way forward and CISC was better? Just name one. Do you want to own the chip maker, or what goes on the chip? It’s like saying CD manufacturers are the future. The music and software that went on the CD is what mattered, not the CD.

  18. Why did people care about Moore’s law? Dennard scaling.
    Dennard scaling ended in 2006. That’s what people actually cared about, and why a desktop PC from 2010 is not notably weaker than one from 2020. When you get to graphics it makes a bigger difference, or mobile, but nothing like the good old days. In the ’90s we had on average a 60% performance improvement per year. Today it’s more like 5-10%.

  19. I’m personally expecting a move away from large silicon, (Or whatever.) towards producing small optimized chiplets that get assembled bare at high density. Kind of like a chip level equivalent of our current board and mother board architecture. Yields would probably be better that way.

    If they were assembled perpendicular to the mother chip, and that had holes to direct coolant properly, this sort of arrangement could sustain very high power densities.

  20. Well, at least for flash memory, it seems that you do save a lot of money when going to more layers. I think they are now using 128 layers, and this is vastly cheaper than making 128 separate chips. So perhaps you would have similar cost gains when using stacking for processors as well? Of course, in the case of flash chips, you don’t actually start with 128 wafers that you thin to 30 microns; rather, you add plasma deposition, lithography and etching steps, so there is a significant difference.

    Of course, the biggest thing here is the information transfer per watt. If this could be reduced from a few pJ per bit (RAM to processor) to 0.15 pJ per bit or even 0.05 pJ per bit this will result in drastically lower power consumption when performing ANN training and inference. Today, it’s the power consumption that limits the number of transistors – and performance – that can be used effectively, not the lithographic density. So you would end up with superior performance per W and lower server running cost. Both are important.
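    The pJ-per-bit numbers translate into watts once you pick a bandwidth; the 1 TB/s link below is an assumed figure for illustration, while the energy costs are the ones discussed above:

```python
# Power needed to move data at a given energy cost per bit.
bandwidth_bits = 8e12  # 1 TB/s expressed in bits per second (assumed link speed)

for pj_per_bit in (2.0, 0.15, 0.05):
    watts = bandwidth_bits * pj_per_bit * 1e-12
    print(f"{pj_per_bit} pJ/bit at 1 TB/s -> {watts:.2f} W")
```

    Going from ~2 pJ/bit to 0.05 pJ/bit cuts the interface power roughly 40-fold at the same bandwidth.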

  21. Yes, that’s proven to be the problem with free market capitalism. It’s a high trust solution: It works great as long as everybody is practicing it, but defectors can reap extra benefits if they’re not stopped, reducing the global benefits, but diverting an outsized proportion of the remaining benefits their own way.

    It was widely claimed that becoming more wealthy would liberalize China, so they were permitted to take part in international free trade despite routinely violating its rules. But it didn’t liberalize them, it just made their totalitarian state better funded.

    Were there ever good reasons to think it would liberalize them, or did they just bribe some politicians into claiming to believe that? I guess we’ll never know, unless they fall, and there’s something similar to the look we got at the KGB archives.

    Anyway, everybody is now aware that theory is a crock, and China has lately become openly obnoxious enough that even the politicians in their pay are finding it hard to justify letting the situation go on. They’re really going to regret making sure that the rest of the world got Covid-19, too, in the same year they crushed Hong Kong.

  22. Interesting. To be able to perform the wafer stack and reach a small energy cost per bit transferred, the wafers have to be thinned. Samsung can reach 30 um today, but I guess that this would not be enough to reach 0.15 pJ (or 0.05 pJ for that matter) per bit. Perhaps 10 um? Or 5 um? Otherwise, the wires become too long and you have capacitance that causes energy expenditure.

    So imagine handling a 300 mm diameter wafer that is only 10 um thick, thinner than a human hair! Makes you wonder if it would be possible to make the wafers that thin from the start, to remove the step where almost all of the wafer is ground away in thinning. An awful lot of material is wasted otherwise…

    Given that such wafers would be more like a plastic film, would it be possible to make the wafer process into a roll-to-roll process? Of course, this would require a lot of new technology; spray coating of resists, adaptive focusing while exposing, new mechanical fixation just to mention a few.. But the reward would most likely be much better performance per dollar…
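    A back-of-the-envelope E = ½CV² estimate shows why wire length dominates the per-bit energy; the capacitance and voltage values below are assumed round numbers, not from the comment:

```python
# Ballpark switching energy of an on-chip wire: E = 1/2 * C * V^2.
cap_per_um = 0.2e-15  # farads per micrometer of wire, assumed
v = 0.8               # supply voltage in volts, assumed

for length_um in (10_000, 1_000, 100):  # 10 mm, 1 mm, 0.1 mm wires
    energy_pj = 0.5 * cap_per_um * length_um * v**2 * 1e12
    print(f"{length_um:>6} um wire -> {energy_pj:.4f} pJ per transition")
```

    A 10 mm wire costs roughly 0.64 pJ per transition under these assumptions, while a 0.1 mm wire is near the 0.05 pJ range, which is why thinning the wafers (and so shortening the vertical wires) matters so much.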

  23. A huge proportion of video talks on-line are super slow and dull to watch.
    It turns out that the professional actors, speakers and news readers did actually have skills and abilities that the average person lacks. Who knew?
    Anyway, the usual trick is to play the video at higher speed. Most systems will let you play a video or audio file at 1.25x or 1.5x normal speed. In extreme cases 1.75x or even 2x is necessary.
    Sometimes you encounter someone who is not just slow, but also unclear, with either a horrible accent or mumbling or something. Then you can’t listen at normal speed, and can’t understand at high speed, at which point you need to hope the subtitle function is working.

  24. How important is single-threaded performance these days? Can’t we just do away with all that fancy BPU / pipeline management, use the extra idle time to implement more virtual cores, and shove in a couple extra physical cores with the transistor/energy-dissipation budget we freed up by simplifying the architecture? Sure, single-threaded perf would TANK, but ops/watt and ops/cm^2 would increase. Right?

  25. If Intel’s roadmap is so great for the next 5 years, then why is Apple dumping Intel CPUs in their notebooks? The last two times Apple dumped their CPU it was because the CPU was going extinct. Motorola 68000… RIP. PowerPC… RIP. Intel x86? Don’t you need a giant heat sink to get performance out of an x86? Otherwise it’s a Deceleron…

  26. I tried to watch the keynote… sorry… I must have fallen asleep in boredom… this guy talks in the most monotone and boring voice possible… he must be an exascale AI robot from Intel’s lab… because it’s worse than listening to Data from Star Trek…

  27. Yeah, that’s real gate-dense stuff, but where does the heat go? I guess you could use a dielectric solvent like propane, and hold the pressure at a point where the phase change occurs just below the desired temperature of the wafers, dies, or packaging you are using.
    The rows and planes of circuitry will need to allow bubbles of gas to rise immediately and exit the volume of circuitry, so liquid-phase solvent will move into the volume. By using gravity to sort the gaseous from the liquid, perhaps using some sort of bubbling agent, and making sure plenty of liquid can enter from below, there will be a film of liquid wetting the circuitry, constantly renewed and mixed by countless bubble edges.
    Perhaps a better solution is spintronic logic circuits. There is no theory that says energy must be expended to transmit information.
    What might really change things is a qua-processor(TM) expansion board for desktops. There might be some civilization-changing “killer apps”: simulation of natural processes, from growing crystals to climate change on a much finer scale; lifelike virtual reality, at least for audio and video; AI virtual personalities that would pass Turing tests, except “in person”; automation advances, maybe even simple repairs done reliably.

  28. Easy to say, “Do all this, and the chips can improve a thousand times.”

    Can you do all those things in the same chip, and still get the full advantage? Stacking works well for usually passive memory such as you find in high capacity USB memory sticks, because at any given time few of the elements are dissipating energy. It will have serious limitations for chips where the elements are active a lot of the time.

    Anyway, I’m less concerned with MFLOPS per square centimeter than I am with MFLOPS per cent.

  29. I would say the chip manufacturing industry is the perfect example of where state long-term planning and financing beats free market capitalism. Taiwan and Korea lead in chip manufacturing technology due to decades-long strategic investment from government. Trump had to impose sanctions banning access to US fab equipment to ensure China does not overtake the USA.

  30. Today’s Windows? Like a light bulb.

    The Windows when this stuff is rolled out? About the same as it always has.

  31. The problem with this hype is that over the last 5 years Intel has done squat. Let’s say I am dubious.

  32. Glad to see they are not trumpeting the death of Moore’s law. Wasn’t all of this transistor density doubling thing supposed to end about 20-30 years ago? The chip industry is the perfect example of what freedom and free market capitalism can produce. The Soviet Union could only steal, not develop. Same for China. God bless liberty, freedom and capitalism; it is the way forward.

  33. They already announced something radical with MESO transistors (Magnetoelectric Spin Orbit).

    Whether they meant it, or it was just a distraction during their pre-Ice Lake flailing, who knows.

    Certainly spintronics does seem an intense focus for post CMOS logic.

  34. Ooh. ooh. ooh. Application. Application. Do this. Do this:
    ECONOMIST AI: The moonshot goal of this project is to build an AI/supercomputer-equivalent reinforcement learning framework that will recommend economic policies that drive social outcomes in the real world, such as improving sustainability, productivity, and equality.

    The AI Economist is a powerful optimization framework that can objectively automate policy design and evaluation. This will allow economists and policy experts to focus on the end goal of improving social outcomes.

    The key ingredients are:
    A high-fidelity simulation, grounded in data and aligned with economic theory as well as with social and ethical values. Simulations should not be prohibitively expensive to run, and should be maintainable and modular.
    AI policy models that are effective in a wide range of scenarios, explainable, and robust to economic shocks.
    The simulation and policy models should be calibrated against real-world data and, as much as possible, validated in human-subject studies.

  35. Maybe it’s the opposite: Now they’ve seen their lead evaporate with the past few fiascos, they might try to do something radical to regain their lead. It likely won’t work, but it will definitely not work if they don’t even try.

  36. It takes 20 years from research to putting something on a commercial chip. Even for basic stuff like low-K or copper instead of aluminium wiring. Longer if it is something actually difficult like EUV. If they haven’t been researching all these technologies intently since 2005, we won’t get any of it this decade. Many of these are independent and complementary and no company will take the risk of implementing all of them at once. This means slow-rolling it with a doubling every 5 years and working out the yield issues in each 5 year period; that sort of thing. I think silicon CMOS will have been replaced before these technologies are fully implemented.

  37. Interesting that the person from Intel states that the industry standard will be X times more powerful, and not necessarily Intel. Maybe he’ll just be happy if Intel has got to 7nm by then.

Comments are closed.