Moore's Law isn't dead, chip boffin declares – we need it to keep chugging along for the sake of AI

The machine-learning world is obsessed with training AI models as quickly as possible and if the hardware community is to keep up, future chips will need to have more memory. That's according to Phillip Wong, veep of corporate research at TSMC, one of the world's largest chip manufacturers, who began his keynote at the Hot …

  1. Mage Silver badge
    Coffee/keyboard

    Oh dear.

    Where to start on this buzz word bingo?

    Moore's law was an aspiration and it's not been plausible for 10 years or more. It's also been redefined downwards, so it means whatever a chip maker wants it to mean.

    Well, yes, above 2 to 4 cores the I/O bottleneck to RAM becomes more serious. I've wondered whether a two-core RISC design on a BIG chip with lots of RAM, plus a piggyback RAM (SC6400 family) on top, with multiple high-speed serial I/O links to an array of identical chips (like the Transputer idea), would be better than 16 to 64 cores on one chip.

    I still think AI & ML is mostly PR and dubious for anything other than pattern matching. That's the underlying mechanism. It's a niche.

    Is this TSMC PR?

    1. A Non e-mouse Silver badge

      Re: Oh dear.

      Moore's law was an aspiration

      No, it was an observation. People then subsequently took it and turned it into an aspiration.

    2. Muscleguy Silver badge

      Re: Oh dear.

      Not to mention too much of it is a black box and we have to learn a lot more about the biases in datasets and what to do with them.

      In the first case you get the medical AI which determined the severity of the diagnosis based on what scanner had been used. Which meant it was useless as a pre-scanner diagnosis too.

      In the second case you get face recognition systems which see black faces as criminal because of the bias in the mugshots, itself a result of systemic bias in the arrest and charging of minorities.

      There are doubtless biases in the data used we have no idea of at the moment because of the black box nature of the systems.

      When you’re evolving the walking control software for bipedal robots then that sort of thing is fine. When it comes to supposedly expert systems impacting people’s lives, not so much, not even close.

    3. Anonymous Coward

      Re: Oh dear.

      Where to start on this buzz word bingo?

      The original Moore's Law has held fairly well so far. It's that expectations changed.

      We're still doubling the number of transistors about every generation. It's just a decade or so ago that we got out of the voltage scaling region that used to give us a performance boost, and these days we're giving back performance due to parasitics. For example, each FET contact is over 50 Ohms these days due to the material processing requirements and small size - tungsten alloys are strong enough to handle the mechanical polishing, but they suck for resistance, especially as the physical size of the contact reduces. TSMC's 16nm was a decent boost over their 20nm, 7nm was a slight improvement, and 5nm is worse than 7nm and slightly better than 16nm. And it's been fun to observe that TSMC never has delivered us a "fast" process corner in their 7nm process, despite us being in very high volume production for a while now.

      What most folks don't appreciate is the economics of the situation. The fixed NRE costs for any design have skyrocketed with each generation, and the cost per FET has been going up rather than down. All this has been driving chip design into full SoC mode, meaning that only very big companies can afford these nodes, and that only designs that require a great deal of integration can take advantage of them. And the sheer cost of the design effort causes the fewer companies that can afford these chips to be far more conservative in what they put out because a market failure is so bloody expensive.

      So is this TSMC PR? Yes. But AI and ML are some of the relatively few applications that can justify these nodes. They're massively parallel machines that need an incredible memory bandwidth and exceptionally speedy compute unit bandwidth and decent compute unit performance, and the performance hit you take from having to go off chip in these designs is severe. They're willing to use a poorer FET if they can put more of them on a die and not pay the incredible penalty of having to go off chip.

      On a personal note, my Ph.D. 25+ years ago involved these neural nets implemented on chip, but at the time the CMOS processes were just too primitive to practically support this idea. Now that things have changed so much, it's been interesting to see just how much more friendly the sheer scale of integration has made the implementation of these ideas, and just how many more fields these things can be used in.

      1. bombastic bob Silver badge
        Devil

        Re: Oh dear.

        The reason there's no PERCEIVED Moore's Law improvements is that the software really isn't taking advantage of the hardware.

        With the exception of virtualization, a 6 core (12 hyperthread) Ryzen processor idles 11 'cores' nearly all of the time. All of those transistors NOT being used.

        Occasionally you'll see something that uses them, maybe a very special-written game or an application where its author(s) know something about symmetric multi-processing and multithread algorithms.
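        To put rough numbers on the idle-cores point, here is a minimal sketch of Amdahl's law, which is one standard way to estimate how much a multi-core chip can help a given program. It is not something the poster wrote; the function name and the parallel fractions are illustrative assumptions, and treating 12 hardware threads as 12 full cores is optimistic.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a fixed-size workload.

    parallel_fraction is the share of run time that can be spread
    across cores; the rest is assumed to run on a single core.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumption: 12 hardware threads counted as 12 cores (optimistic,
    # since two SMT threads share one physical core).
    for fraction in (0.0, 0.5, 0.9, 0.99):
        speedup = amdahl_speedup(fraction, 12)
        print(f"parallel fraction {fraction:.2f}: {speedup:.2f}x")
```

        Under these assumptions, a program that is half serial gets less than 2x from 12 threads, and even a 90% parallel one gets under 6x, which is consistent with the complaint that wider hardware does not feel faster.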

        "my Ph.D. 25+ years ago involved these neural nets implemented on chip,"

        Yes, this puts you in a unique position to see why the article is relevant, that's for sure. But seriously, when will this translate into the user perception of "faster"?

        With the exception of natural language speech, visual scanning and object recognition, and other things that a ROBOT would need, most people aren't seeing improvements.

        So to most of the world, Moore's Law is dead, but only because of perception.

        And the biggest reason for that is SOFTWARE, not hardware. Because, after all, hardware has gotten 'wider', and not faster LINEARLY. And WAY too many people that call themselves "engineers" still insist on thinking in a straight line. Well maybe that's just PROJECT MANAGEMENT doing that, engineers are creative and of course think non-linearly, but you have to be able to turn that non-linear processing into a program... and I haven't seen a lot of evidence of that happening effectively enough to give the user the perception of "faster".

        (clogging everything with bloat and feature creep and changing the UI into 2D FLAT hasn't helped at all but it makes SOME engineers *FEEL* like they "did something" to "improve it")

      2. Yet Another Anonymous coward Silver badge

        Re: Oh dear.

        >What most folks don't appreciate is the economics of the situation.

        Which is what Moore was originally talking about.

        The most cost-effective number of transistors per unit area scales as a power law, because although each smaller generation was more expensive, the cost increase was mostly linear while the number of transistors scaled with area.

        This hasn't necessarily been true for the last generation of process steps: 7nm may always be more expensive per transistor than 12nm. But if you want to pack more performance into a smaller package to fit in a phone, or put a gazillion CUDA cores on a GPU, you will pay for it.
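        A back-of-envelope sketch of that argument, assuming a simple model in which transistor count per wafer grows with the inverse square of the linear shrink while wafer cost grows by some ratio. The function name and every number below are hypothetical, chosen only to show both regimes, and are not real foundry figures.

```python
def relative_cost_per_transistor(linear_shrink: float, wafer_cost_ratio: float) -> float:
    """Cost per transistor on the new node relative to the old one.

    linear_shrink: new feature size / old feature size (e.g. 0.7).
    wafer_cost_ratio: new wafer cost / old wafer cost (e.g. 1.3).
    A result below 1.0 means transistors got cheaper.
    """
    if linear_shrink <= 0 or wafer_cost_ratio <= 0:
        raise ValueError("both arguments must be positive")
    density_gain = 1.0 / linear_shrink**2  # transistors scale with area
    return wafer_cost_ratio / density_gain


if __name__ == "__main__":
    # Hypothetical "classic" generation: 0.7x shrink, wafers 30% dearer.
    print(relative_cost_per_transistor(0.7, 1.3))
    # Hypothetical recent generation: same shrink, wafers 2.2x dearer.
    print(relative_cost_per_transistor(0.7, 2.2))
```

        In the first hypothetical case the cost per transistor falls to roughly 0.64 of its old value; in the second it rises slightly above 1, which is the situation described for the newest nodes.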

      3. Draco
        Windows

        Re: Oh dear.

        One of the problems is the nm nomenclature - 7nm isn't really 7nm. My understanding is 28nm was the last 'true' size. The following sizes - 20nm, 14nm, 7nm, etc - were more 'generation' names than accurate descriptions of transistor size.

        1. theblackhand Silver badge

          Re: Oh dear.

          "One of the problems is the nm nomenclature - 7nm isn't really 7nm. My understanding is 28nm was the last 'true' size."

          It's worse than that...

          The process node used to reflect the minimum gate length, but this stopped at 45nm. As everyone's processes below this node size have been heavily customised for their own workflows/design guides, the nodes are now driven by Moore's law - a 50% density increase per mm² is a half node and a 100% increase per mm² is a full step.

          But realistically, Moore's law died around the time frequency scaling stopped being the predominant factor in CPU upgrades.

          Ref:

          Overview

          https://en.wikichip.org/wiki/technology_node

          Divergence at 14nm (where known) and 16nm for TSMC:

          https://en.wikichip.org/wiki/14_nm_lithography_process

          https://en.wikichip.org/wiki/16_nm_lithography_process

    4. Michael Wojcik Silver badge

      Re: Oh dear.

      What, no complaint about "processing power per transistor"?

      Not only is that in no way a formulation of Moore's "Law" (which was originally stated in terms of transistors per chip, though transistors per unit area is more meaningful), it doesn't even make sense. I'm going to assume it was an honest mistake, though.

  2. mj.jam

    How much memory are they planning?

    "In an ideal situation, the size of memory on a chip will be larger than the training dataset"

    Training datasets can be multiple GB. So they are suggesting 1000x the amount of L1 cache compared to current chips?
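    To size the gap being asked about, here is a small arithmetic sketch. The 64 KiB L1 figure and the 4 GiB dataset are assumptions picked for illustration (neither comes from the article), and `capacity_ratio` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def capacity_ratio(dataset_bytes: int, on_chip_bytes: int) -> float:
    """How many times larger the dataset is than the on-chip memory."""
    if on_chip_bytes <= 0:
        raise ValueError("on_chip_bytes must be positive")
    return dataset_bytes / on_chip_bytes


if __name__ == "__main__":
    dataset = 4 * GIB   # assumed training set size
    l1_cache = 64 * KIB  # assumed per-core L1 size
    print(f"dataset / L1: {capacity_ratio(dataset, l1_cache):,.0f}x")
    # 1000x that assumed L1 would still be only about 62.5 MiB.
    print(f"1000x L1 = {1000 * l1_cache / MIB:.1f} MiB")
```

    With these assumed sizes the dataset is 65,536 times the L1 cache, so 1000x the L1 would still fall well short of holding a multi-GB training set on chip.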

    1. Pascal Monett Silver badge
      Trollface

      Hey, I will welcome our 32GB SRAM CPU overlords as soon as they deign to show up.

    2. bombastic bob Silver badge
      Meh

      Re: How much memory are they planning?

      There is likely to be a cost-benefit 'maximum point' on L1 and maybe L2 and L3 cache. Going 1000 times the L1 cache size we typically use NOW might get you bragging rights, but I question how much speedup it gets you as compared to now, or 10x, or 100x the size.

      We're almost "there" with respect to storage in (essentially) non-volatile RAM with SSDs. That makes an observable difference in speed. But if you spend only 1% of the processing time waiting for RAM bus cycles because the cache is 'empty', how much of a difference would it make to eliminate that 1%? Observably by an end-user, not a whole lot. The price tag, however, WOULD make a difference.
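      The 1% question can be answered directly. A minimal sketch, assuming the stall time is removed entirely and nothing else changes; the function name and the 60-second run are hypothetical.

```python
def speedup_from_removing_stalls(stall_fraction: float) -> float:
    """Speedup if the given fraction of run time (memory stalls) vanishes."""
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    return 1.0 / (1.0 - stall_fraction)


if __name__ == "__main__":
    run_seconds = 60.0  # hypothetical job length
    for stall in (0.01, 0.10, 0.50):
        saved = run_seconds * stall
        print(f"stall {stall:.0%}: {speedup_from_removing_stalls(stall):.3f}x, "
              f"saves {saved:.1f}s of {run_seconds:.0f}s")
```

      At a 1% stall fraction the best case is about a 1.01x speedup. A bigger cache only pays off for workloads that stall far more than that, which memory-bound training jobs may well do.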

  3. AceRimmer1980
    Terminator

    Old Macdonald had a neural processor farm, AI AI O

    You ain't gonna do it with Von Neumann architecture.

    Needs a rethink, a radical shift in paradigm/architecture, maybe with more emphasis on the connections rather than the processing elements themselves. All decoupled clocks.

    Well that's my $0.02.

    1. Muscleguy Silver badge

      Re: Old Macdonald had a neural processor farm, AI AI O

      Indeed, take more notice of analog massively parallel systems which is what in computing terms biological brains are. The vast majority of our nervous systems are engaged in keeping our bodies running properly, moving properly etc. A computer though could use the truly vast majority of its power computing instead.

    2. Trollslayer Silver badge
      Trollface

      Re: Old Macdonald had a neural processor farm, AI AI O

      It makes cents.

  4. Trollslayer Silver badge

    A 7nm line

    Is about twenty atoms wide and finfets aren't always the solution. Moore's Law is really Moore's Guideline.

  5. AnoniMouse

    There are real limits to silicon technologies

    It's all very well to talk of multiple layers (for memory chips) but there are physical limits:

    1. Transistors cannot be made much smaller (only a few electrons per gate);

    2. Larger chips are more likely to have defects so there is an incentive to minimise the chip's area;

    3. So there is a limit to the two-dimensional organisation and sizing of a chip;

    4. Use of the third dimension (multiple layers) depends on what those layers are used for: if it's mostly single access memory then only a tiny proportion is active and scaling is feasible; if (highly) parallel processing (or memory access) is the aim then the need for heat dissipation is a severe limitation to the number of active elements per unit volume.
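    Point 2 in the list above can be illustrated with the simple Poisson yield model, which is one common textbook approximation and not necessarily what any fab uses. The defect density of 0.1 per cm² is a made-up value and `poisson_yield` is a hypothetical helper.

```python
import math


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects, assuming defects land
    independently and uniformly (Poisson model): Y = exp(-A * D0)."""
    if die_area_cm2 < 0 or defects_per_cm2 < 0:
        raise ValueError("area and defect density must be non-negative")
    return math.exp(-die_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # assumed defect density, defects per cm^2
    for area in (0.5, 1.0, 4.0, 8.0):
        print(f"{area:>4.1f} cm^2 die: {poisson_yield(area, d0):.1%} yield")
```

    Under that assumed defect density a 1 cm² die yields about 90%, while an 8 cm² die yields about 45%, which is the incentive to keep die area down.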

    Interesting that wetware, which does operate in 3 dimensions, has inbuilt cooling thanks to cardiovascular circulation.

    1. Muscleguy Silver badge

      Re: There are real limits to silicon technologies

      But we have to be careful about heatstroke so the cooling system has limits. I’m a runner and I’ve been there. Auckland, NZ woke up one Sunday morning, overcast, coolish and dead still. So I did what I’d long wanted to and ran 25 miles out to the other end and back all along the Waitakere ranges. The full length of Scenic Drive for those in the know.

      Two problems though, firstly the sun burnt off the clouds, secondly there was NO water. I knew where to get water in the second half of Scenic Drive which I ran often but it meant I got water too little too late.

      I had to sit in the shade with a wet towel over my head for a couple of hours afterwards. Wasn’t fun, don’t recommend.

      Not an issue generally here in Dundee and these days I can carry a drink with me in my Camelbak waist pack. Active cooling hats are a thing and technical clothing keeps you cooler in the heat than cotton or nylon back in the day (or just skin up top often). I don’t run topless any more, it’s cooler to wear a technical top. My shorts are technical, I have technical socks on. It makes a difference.

    2. bombastic bob Silver badge
      Unhappy

      Re: There are real limits to silicon technologies

      The bigger problem is heat dissipation. A single-layer flat piece of silicon will reject heat better than a thicker multi-layer one...

  6. Torben Mogensen

    Hot Chips, indeed

    The major problem with cramming more transistors into chips is that if they all operate at the same time (which is a requirement for more performance), they will generate a lot of heat and consume a lot of power. The generated heat requires more cooling, which increases the power usage even more. There is a reason that your graphics card has a larger heat sink than your CPU, and the article talks about many more processing elements than on a GPU (albeit a bit simpler).

    So rather than focusing on speed, the focus should be on power usage: Less power implies less heat implies less cooling. One option is to move away from silicon to other materials (superconductors, for example), but another is to use reversible gates: They can, in theory at least, use much less power than traditional irreversible gates such as AND and OR, and you can build fully capable processors using reversible gates. But even that requires a different technology: Small-feature CMOS uses more power in the wires than in the gates, so reducing the power of the gates does not help a lot. Maybe the solution is to go back to larger features (at the end of the Dennard scaling range).
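    Two standard formulas sit behind this argument, sketched here under stated assumptions: the usual CMOS switching-power approximation P = a·C·V²·f, and the Landauer bound k·T·ln 2 on the energy dissipated when one bit is irreversibly erased, which is the floor that reversible logic can in principle go below. The function names and the sample values are illustrative, not measurements of any real chip.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact by SI definition


def dynamic_power_watts(activity: float, capacitance_farads: float,
                        voltage: float, frequency_hz: float) -> float:
    """Switching power of CMOS logic: P = a * C * V^2 * f."""
    return activity * capacitance_farads * voltage**2 * frequency_hz


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy dissipated per irreversibly erased bit: k * T * ln 2."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Hypothetical chip: 10% activity, 1 nF switched capacitance, 1 GHz.
    full = dynamic_power_watts(0.1, 1e-9, 1.0, 1e9)
    half = dynamic_power_watts(0.1, 1e-9, 0.5, 1e9)
    print(f"1.0 V: {full:.3f} W, 0.5 V: {half:.3f} W")
    print(f"Landauer limit at 300 K: {landauer_limit_joules(300.0):.3e} J per bit")
```

    Halving the supply voltage cuts switching power to a quarter in this model, which is why the end of voltage scaling hurt so much. The Landauer figure at room temperature is on the order of 10⁻²¹ J per bit. Neither formula accounts for the wire losses mentioned above.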


Biting the hand that feeds IT © 1998–2019