About 2020 for Samsung 3 Nanometer MBCFET Chip on Roadmap

Development of Samsung’s 3nm Gate-All-Around (GAA) process, 3GAE, is on track. The Process Design Kit (PDK) version 0.1 for 3GAE was released in April to help customers get an early start on design work and to enable improved design competitiveness along with reduced turnaround time (TAT).

Compared to 7nm technology, Samsung’s 3GAE process is designed to provide up to a 45 percent reduction in chip area with 50 percent lower power consumption or 35 percent higher performance. The GAA-based process node is expected to be widely adopted in next-generation applications, such as mobile, network, automotive, Artificial Intelligence (AI) and IoT.

Conventional GAA based on nanowire requires a larger number of stacks due to its small effective channel width. On the other hand, Samsung’s patented version of GAA, MBCFET™ (Multi-Bridge-Channel FET), uses a nanosheet architecture, enabling greater current per stack.

While FinFET structures must modulate the number of fins in a discrete way, MBCFET™ provides greater design flexibility by controlling the nanosheet width. In addition, MBCFET™’s compatibility with FinFET processes means the two can share the same manufacturing technology and equipment, which accelerates process development and production ramp-up.
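
To make the fin-count versus sheet-width contrast concrete, here is a minimal Python sketch. It uses the common textbook approximations for effective channel width (two sidewalls plus the top for a fin, the full perimeter for a sheet). Every dimension in it is a hypothetical placeholder chosen for illustration, not a Samsung figure.

```python
def finfet_width_nm(n_fins, fin_height_nm, fin_width_nm):
    """Effective width when the gate covers two sidewalls and the top of each fin."""
    return n_fins * (2 * fin_height_nm + fin_width_nm)


def nanosheet_width_nm(n_sheets, sheet_width_nm, sheet_thickness_nm):
    """Effective width when the gate wraps all four sides of each stacked sheet."""
    return n_sheets * 2 * (sheet_width_nm + sheet_thickness_nm)


# Hypothetical dimensions (assumptions, not published process data).
FIN_HEIGHT_NM = 50
FIN_WIDTH_NM = 7
SHEET_THICKNESS_NM = 5
N_SHEETS = 3

# FinFET: drive strength can only move in whole-fin steps.
finfet_options = [finfet_width_nm(n, FIN_HEIGHT_NM, FIN_WIDTH_NM) for n in (1, 2, 3)]

# Nanosheet: the designer can pick the sheet width, so the steps are much finer.
nanosheet_options = [
    nanosheet_width_nm(N_SHEETS, w, SHEET_THICKNESS_NM) for w in range(15, 50, 5)
]

print("FinFET effective widths (nm):   ", finfet_options)
print("Nanosheet effective widths (nm):", nanosheet_options)
```

With these placeholder numbers the FinFET can only jump between three widely spaced values, while the nanosheet stack covers a similar range in much smaller increments.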

Process Technology Roadmap and Advanced Packaging Updates

Samsung’s roadmap includes four FinFET-based processes from 7nm down to 4nm that leverage extreme ultraviolet (EUV) technology as well as 3nm GAA, or MBCFET™.

In the second half of this year, Samsung is scheduled to start the mass production of 6nm process devices and complete the development of 4nm process.

The product design of Samsung’s 5nm FinFET process, which was developed in April, is expected to be completed in the second half of this year and to enter mass production in the first half of 2020.

Extensions of the company’s FD-SOI (FDS) process and eMRAM together with an expanded set of state-of-the-art package solutions were also unveiled at this year’s Foundry Forum. Development of the successor to the 28FDS process, 18FDS, and eMRAM with 1Gb capacity will be finished this year.

SOURCES- Samsung, Youtube
Written By Brian Wang, Nextbigfuture.com

8 thoughts on “About 2020 for Samsung 3 Nanometer MBCFET Chip on Roadmap”

  1. Mzso, you’re sounding “weakly peeved”, old son. As counterpoint, consider…

    If “yesterday” the most optimal chip contained 1,000 million transistors, and “tomorrow” (FINFET) it might be 10,000 million, well … that’s 10× more. For nearly the same chip size, and nearly the same (if not lower) power consumption, you get ten or more times as many gates, and whatever aggregate processing improvements are concomitant therein. Not bad, not bad at all.

    While you (and I and everyone else) might dearly wish for a Great Big Breakthrough in what-it-takes-at-the-tiniest-level to compute, well … here’s a sad / bad / mad dose of reality. With only 3 or 4 known alternative (possible-breakthrough) technologies on the horizon, we have to ask, “What does it mean to ‘Compute’?”

    And that is a serious point/question.

    There is a gulf as wide as the Indian Ocean between conventional 1s-and-0s, NAND-gate-oriented digital computing and that which goes for solution-finding in mixed-eigenstate quantum computing. I don’t foresee room (or elevated) temperature quantum computing though, ever: the information-noise floor is so high, it would be like recording a symphony in a hall that has 10 crews of jackhammers and heavy machinery resurfacing the floors. I.e. no symphony.

    Anyway, be a bit less pessimistic. The 10× or better IS an improvement, and it IS real, and IS valuable. 

    Just saying,
    GoatGuy ✓

  2. “…having yet-smaller FET transistors doing the heavy lifting, by the hundreds-of-billions-per-chip. Woohoo!”

    More like boohoo. 🙁

    It just means that this ancient sixties technology will be around even longer, with hardly measurable, mostly marketing improvements.

    I’d much prefer some actual revolution with 10-100 folds of improvement, that some approaches promise.

    We don’t even have proper SoCs yet with a properly integrated CPU/GPU and several gigabytes of proper high-speed (cache-replacing) shared integrated memory.

    To me CPU technology mostly stopped evolving. You get slight tweaks and a lot of marketing hype and that’s it.

    “To 64-cores on one chip?”

    I’d rather have 1-4 core chips that are 64-16 times faster

  3. The 4 diode illustration is NBF’s comment system attempting to provide an illustration of the linked essay on a cross-bar memory’s mutex circuit to replace cache coherence with shared on-chip memory. Going with your 60-core benchmark:

    60 cores × 800,000 transistors/core = 4.8e7 transistors

    That’s about 1/400th of the real estate of a 20e9 transistor chip. You’ll agree that is negligible. However, you’ll be more than a little skeptical about the absence of cache/coherence afforded by some so-called “cross-bar memory’s mutex”. “They also laughed at bozo the clown.”, etc.

    Well, maybe the mutex circuit diagram wouldn’t work as simulation seems to indicate, but if it did, a rough cut on the layout of the two standard cells (core and mutex memory module MMM) on a mere 20e9 transistor chip:

    60 cores occupy a narrow (340 transistor wide) COLUMN down the midline of the 140,000×140,000 transistor chip (to minimize average path length latency). They share, say, 600 banks of on-chip memory. To match the 60 column geometry, each memory bank would consist of a COLUMN of 60 MMM standard cells containing 235×2357 = 550,000 transistors.

    I don’t know how many transistors per chip these “3nm design rules” permit, but adjust the math accordingly and it becomes pretty apparent that at some point it makes sense to increase the number of cores just so that their transistor width doesn’t get ridiculously small (as though 340 wide isn’t already ridiculously small).
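
As a rough check of the arithmetic in the comment above, the following Python sketch recomputes the budget. It assumes 800,000 transistors per core (the Cray-1-class figure quoted elsewhere in this thread, and the value the 4.8e7 total implies) and treats the chip as a square grid of "transistor sites", which is the commenter's simplification rather than a real layout rule. The variable names are illustrative only.

```python
CORES = 60
TRANSISTORS_PER_CORE = 800_000      # assumed: Cray-1-class core, per the thread
CHIP_TRANSISTORS = 20e9             # the "mere 20e9 transistor chip"
CHIP_SIDE = 140_000                 # transistor sites per side (140,000 x 140,000)
BANKS = 600                         # on-chip memory banks
MMM_WIDTH, MMM_HEIGHT = 235, 2357   # one mutex memory module (MMM) cell

# Total core transistors and their share of the chip.
core_total = CORES * TRANSISTORS_PER_CORE
core_fraction = core_total / CHIP_TRANSISTORS

# 60 cores stacked in one column: each core gets 1/60 of the chip height,
# so its width follows from its transistor count.
core_height = CHIP_SIDE / CORES
core_width = TRANSISTORS_PER_CORE / core_height

# Each bank is a column of 60 MMM cells, one per core row.
mmm_transistors = MMM_WIDTH * MMM_HEIGHT
memory_total = BANKS * CORES * mmm_transistors

print(f"core transistors:   {core_total:.2e} (about 1/{1 / core_fraction:.0f} of the chip)")
print(f"core column width:  {core_width:.0f} transistor sites")
print(f"transistors per MMM: {mmm_transistors:,}")
print(f"cores + memory:     {core_total + memory_total:.3e} of {CHIP_TRANSISTORS:.1e}")
```

Under these assumptions the figures in the comment hang together: the cores take roughly 1/400th of the chip, the core column comes out near 340 sites wide, and cores plus memory add up to just under 20e9.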

  4. Wow…

    What on earth do 4 diodes have to do with anything? Nothing.

    I don’t quite follow the RAM-quantity-versus-CORE-count optimization. For those gamely reading on:

    Chip area has competing design uses. (“Duh, of course!”). Early on it was recognized that processors were getting fast enough that access to RAM was taking the lion’s share of elapsed time, with the CPU just idling for data.

    Thus “cache memory”. An array of fast memory between “the core” and “DRAM”, of many kilo-to-mega-bytes, improves the odds of having frequently needed values (and to-be-written queues).

    The problem in multicore was that MY cache, if not synchronized with “your” caches, could have DIFFERENT data, different queued-up writes and so forth. A terrible situation.

    So the idea was, “unify the caches” and come up also with a parallel-but-different-path way to ensure that the multitude of core-caches at least know when their copy of data is potentially “dirty”. 

    MuTex, as one example. 

    Thing tho’ is that the multiway synchronization between cores scales with both cache size and core count. So… at some level, it becomes the bottleneck.

    I think that is what Bowery was talking about.
    I think.

    Just musing,
    GoatGuy ✓

    PS: yes, the authors of high-tech articles (very often inside the companies making the blurbs!!!) are almost to a head atrocious writers.

  5. You have to pay attention to the bit-cell area for memory when it comes to multicore chips because that tells you how often you have to go off-chip and suffer the moral equivalent of a page fault to disk from VM. There is a density where it makes sense to try to keep shared memory on chip and cut out the exponential kludge of keeping SRAM cache valid on a per-CPU basis. The transistor count per core goes _way_ down and you can distribute the mutex circuitry to the on-chip memory banks to go async and DRAM.

    http://jimbowery.blogspot.com/2013/04/a-circuit-minimizing-multicore-shared.html

    At enough on-chip memory, and _much_ smaller cores, you’ve got a huge gain.

    Getting good numbers about these densities is hard and made ridiculously hard by “editors” of stories, such as this one by EETimes

    https://www.compart.com/en/unicode/U+00B2

    A quote: “7nm process to hit a bit-cell area of 0.027mm²”. Yes, I know… it’s “obvious” what they meant. Still, you have to consider the quality of thinking going into the press on this stuff.

    But, ok, so let’s say they got 0.027um² at “7nm”. And let’s say your chip is 2.7cm on a side (same area as 32-core AMD Epyc). That’s 1GB SRAM, right? The cores are tiny dumb little 32-bit guys that have 800,000 transistors as did the Cray-1 CPU (200,000 NAND gates -> 64 bit including its vector registers).

    Now, let’s say that the b*******s are lying as usual and that you can only _really_ get that density at their “3nm”…
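
The bit-cell estimate in the comment above can be sketched in a few lines of Python. This is a raw upper bound that assumes the entire die is nothing but bit cells (no cores, no sense amplifiers, no wiring overhead), which is my simplification and not something the comment or the quoted article states. The function name is hypothetical.

```python
UM_PER_CM = 1e4       # micrometres per centimetre
UM2_PER_MM2 = 1e6     # square micrometres per square millimetre


def sram_bits(die_side_cm, bit_cell_area_um2):
    """Bits that fit on a square die if every bit of area were a bit cell."""
    side_um = die_side_cm * UM_PER_CM
    return side_um ** 2 / bit_cell_area_um2


# Figures from the comment: 2.7 cm die side, 0.027 square-micrometre bit cell.
raw_bits = sram_bits(2.7, 0.027)
raw_gigabytes = raw_bits / 8 / 1e9

# Share of the die that would have to be bit cells to reach 1 GB.
fraction_for_1gb = 1.0 / raw_gigabytes

# The misprinted unit (square millimetres) taken literally, for comparison.
misprint_bits = sram_bits(2.7, 0.027 * UM2_PER_MM2)

print(f"raw upper bound: {raw_bits:.2e} bits = {raw_gigabytes:.2f} GB")
print(f"die share needed for 1 GB: {fraction_for_1gb:.0%}")
print(f"with the unit misprint taken literally: {misprint_bits:,.0f} bits")
```

On this reading the 1GB figure is conservative: it corresponds to roughly 30 percent of the die being bit cells, with the rest left for cores and overhead. Taken literally, the misprinted unit would allow only about 27,000 bits on the whole die, which is why the unit error matters.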

  6. Seems like the next-next-next generation of chips will continue apace, having yet-smaller FET transistors doing the heavy lifting, by the hundreds-of-billions-per-chip. Woohoo!

    It is somewhat disingenuous to call any of these things “3 nanometer” (or 5 or 7 for TSMC’s offerings). None of the processes are able to actually make 3, 5 or 7 nanometer device-to-device interconnects, or even overall device scaling rules.  Barely below 50 nm, for the most part.  

    However, the “business end” of a gate IS critically dependent on the width of the current amplification channel in each of its FETs.  Hence, making that smaller, makes it faster and for a given speed also less power consuming.

    Onward we go…

    To 64-cores on one chip?  
    Seems likely.  
    In fact, it seems “around the corner” technologically.  

    After all, in the upcoming Computex, AMD will be both showing and demoing its 64 core ThreadRipper device, built upon a mega-sized package bearing 9 chips.  8 of them all-the-same, 8 core processors. The 9th is the “comm fabric” chip at the center unifying all the others to fully utilize both DRAM bandwidth and I/O capacity. Especially with the M.2 RAID array setup, easily able to deliver 8+ gigabytes/sec I/O these days.  

    Just saying,
    GoatGuy ✓
