Nvidia Tesla bigwig: Why you REALLY won't need x86 chips soon

Life is what happens when you are trying to do other things, as the old saying goes. Jen-Hsun Huang, co-founder and CEO of Nvidia, has been perfectly honest about the fact that the graphics chip maker didn't intend to get into the supercomputing business. Rather, it was founded by a bunch of gamers who wanted better graphics …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward
    Anonymous Coward

    honesty

    "I am very happy that most of the world thinks that ARM provides a great efficiency advantage over X86. The truth is, it really doesn't. It is a small second-order effect. The reality is, you squint and there is no power advantage to ARM."

    Good to hear some honesty. Amazing how many people who should know better come out with the ARM efficiency illusion and miss the real significance.

    1. JC_
      Thumb Up

      Re: honesty

      Yep, this quote is for posterity and belongs at the top of the comments section for every ARM & x86/x64 article with "Steve Scott, the CTO at supercomputer maker Cray" in bold.

  2. Christian Berger

    So... how do you boot?

    Considering that most ARM SoCs are incompatible with each other, it's an extremely fragmented market. You cannot even boot an operating system image that hasn't been ported to that particular platform.

    If you don't solve that problem you will end up with boxes running an outdated operating system, since the vendor won't be able to provide any updates. Sure, this may be acceptable in certain situations, but it's still a show-stopper in others.

    1. Destroy All Monsters Silver badge
      Devil

      Re: So... how do you boot?

      Write a fracking VM around it. Pretend it's called a "BIOS". World domination.

    2. Daniel B.

      Re: So... how do you boot?

      A "PC" SoC would have a firmer interop standard. It would be necessary for mass production, so it will be there. Maybe Open Firmware or something like that?

      1. Luke McCarthy

        Re: So... how do you boot?

        Coreboot hopefully. Please, no UEFI.

    3. BozNZ
      Holmes

      Re: So... how do you boot?

      Bit of history

      The reason the PC dominated the opposition for the first 20 years was the BIOS, which provided an abstract firmware layer for developers to use; this is why DOS will still load on a Core 2 motherboard if you want it to. ARM SoCs are incompatible with each other: you need to rewrite most of your lower-level code from scratch for every new chip. Herein lies the problem.

      Also, since the move to 32-bit operating systems the BIOS has just become the bootloader, as it is 16-bit. A 32/64-bit BIOS would have been a good standard, but there was no IBM to force it through by the time it was required; I'm not even sure it was ever proposed.

      As an SoC programmer, having to write new code for every ARM variant is a nightmare (datasheets for each SoC run to thousands of pages). A 32/64-bit BIOS layer for ARM is long overdue.

      1. Christian Berger

        Re: So... how do you boot?

        Yes, the BIOS played a big part, but so did the standardised hardware. Graphics cards, for example, were mostly backwards compatible and complied with distinct standards. So you could buy a CGA card, an EGA card or a VGA card, and within each standard they were mostly alike. Plus a VGA card was backwards compatible with an EGA one, and an EGA card with a CGA one. That's why, when you had a VGA card and started software in CGA mode, it would still work.

  3. This post has been deleted by its author

  4. Anonymous Coward
    Anonymous Coward

    Negative marketing won't work

    X86 chips will be around for a very long time.

    ARM is actually less efficient (performance per watt).

    The ARM CPUs being used for comparison are actually underpowered and running mobile OSes.

    X86 has not been replaced in the data center and probably won't be for a long time.

    1. itzman

      Re: Negative marketing won't work

      I don't think the issue is ARM's flops per watt so much as the flops per watt of Nvidia's GPU cores.

      ARM is simply there to glue it together: to provide the program flow logic that feeds the mathematics to the GPUs.

    2. Charlie Clark Silver badge

      Re: Negative marketing won't work

      ARM is actually less efficient (performance per watt).

      That, or the converse claim that ARM is more efficient, has to be qualified: what geometries? What ops? Single- or multithreaded? The last comparison I saw still had ARM more efficient at low-level ops, but the x86 instruction set is significantly more powerful in single-threaded environments or for specific operations, which is where the GPUs come in and why many HPC environments already run x86 with GPUs.

      ARM's advantages (price, die size) are plain for many to see, which is why so much work is being poured into making ARM-based servers. The development of ARM over the last few years has been significantly faster than that of x86. Intel may still be ahead on manufacturing process, but 16nm 64-bit ARMs are now in development with 14nm planned (TSMC).

      1. Dave 126 Silver badge

        Re: Negative marketing won't work

        >That, or the converse claim that ARM is more efficient, has to be qualified: what geometries? What ops? Single- or multithreaded?

        As a rough idea, Tom's Hardware compared task-for-task x86 vs ARM as best they could, using Windows 8 and Windows RT (which between them cover both architectures). They didn't declare an outright winner, but announced their intention to watch developments with interest. They found enough to dispel the 'ARM is always moar power efficient' assumption, though.

        1. Anonymous Coward
          Anonymous Coward

          Re: Negative marketing won't work

          Which x86 vs which ARM would be helpful information, but it isn't provided here. Nor is a link to Tom's article.

        2. Charlie Clark Silver badge

          Re: Negative marketing won't work

          As a rough idea, Tom's Hardware compared task-for-task x86 vs ARM as best they could, using Windows 8 and Windows RT (which between them cover both architectures).

          IIRC the Atoms are made with a more advanced process (3D FET) than the ARMs. Similar tests have been carried out on Android, with the Motorola Razr i (x86) doing very well generally, particularly in single-threaded applications, but poorer at task-switching in comparison with ARM-based phones (c't Magazin 03/13 and 22/12, both in German and pay-per-view). Such comparisons, however, are full of caveats. For real computing comparisons you have to use the SPEC benchmarks and read the footnotes carefully.

          It's probably also worth noting that the Atoms are the only x86 chips close to the ARM power envelopes, and they compare unfavourably against more standard x86 fare like the i5 and i7, though those of course use a lot more juice.

          To support my own claim that ARM is developing faster, it would be nice to see comparisons over time of the performance improvements of the ARMs (say, the Exynos in the Samsung Galaxy S series) against Intel's Atoms.

          1. Anonymous Coward
            Anonymous Coward

            Re: Negative marketing won't work

            "For real computing comparisons you have to use the SPEC benchmarks and read the footnotes carefully."

            Well done Charlie! At last someone points out that if you want documented objective facts you might want to start with an advertising-independent organisation which produces results that are reasonably well documented and reasonably reproducible.

            That being said, some of the SPEC benchmarks are IO intensive as well as compute intensive, which isn't what we're talking about when comparing CPU core technology as we are here.

            A relatively recent arrival on the benchmarking scene is CoreMark, whose benchmark suite is aimed at the embedded market and therefore doesn't do much IO at all, so all it's really testing is the compute power (CPU core, memory subsystem, etc). The source code of the suite is freely downloadable. They have 400+ published results at the moment. Some of the published benchmarks are user-submitted; some of them are also verified by the CoreMark folks.

            Extracts from three semi-randomly chosen, relatively recently submitted results from the CoreMark website (please see http://www.coremark.org/benchmark/ for full details; this is a gross oversimplification):

            ARM Cortex A15 1700 (1700 MHz, April 2013) CoreMark/core: 7954

            Intel Atom N2800 1860 (1860MHz, Dec 2012) CoreMark/core: 6143

            Intel i7 3612QE 2100 (2100 MHz, Jan 2013) CoreMark/core: 20982

            Maybe someone with more clue than Tom's could write a proper article based around these benchmarks and these numbers (which should, as I mentioned, be easily reproducible).
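
            Taking those figures at face value, a rough per-clock normalisation (crude arithmetic on the numbers above, nothing more) gives the Cortex-A15 about 7954/1700 ≈ 4.7 CoreMark per MHz per core, the Atom N2800 about 6143/1860 ≈ 3.3, and the i7-3612QE about 20982/2100 ≈ 10.0 per core. Which is exactly why the "which x86 vs which ARM" question matters.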

            1. Anonymous Coward
              Anonymous Coward

              Re: Negative marketing won't work

              Go over to Phoronix for some comparisons.

              http://www.phoronix.com/scan.php?page=article&item=ubuntu_1204_armfeb&num=1

              NB This will be over the heads of Windows monkeys.

              1. Anonymous Coward
                Anonymous Coward

                Re: Negative marketing won't work

                And this one for A15 vs Tegra 3 vs x86:

                http://www.phoronix.com/scan.php?page=article&item=samsung_exynos5_dual&num=1

          2. ChrisInAStrangeLand

            Re: Negative marketing won't work

            "IIRC the Atoms are made with a more advanced process (3D FET) than the ARMs."

            You do not remember correctly: the Atoms on Intel's 22LP process haven't launched yet. Intel SoCs currently lag the state of the art at 32nm.

  5. Anonymous Coward
    Anonymous Coward

    What he really means is...

    ...that Nvidia don't gots no x86 processors and AMD do, so Nvidia got locked out of the next PS, Xbox and Wii. AMD also has just released a bunch of cool, low power SoCs, so again Nvidia loses out on this large market segment.

    1. danolds

      Re: What he really means is...

      But you're forgetting that the Kepler feature set brings a lot to the table: features/capabilities that will allow service providers to serve up games online with console-like performance and latency. They probably won't be able to offer up enough performance to satisfy the hard-core PC gamer but, over time, will certainly be able to compete successfully vs. Xbox, PS, and Wii on both quality and cost. And those service providers will be buying Keplers by the boatload to make this happen.

      1. Zmodem

        Re: What he really means is...

        Consoles are a joke compared to PC games; consoles are always 7 years behind a modern PC game. The same rendering is done as in Hollywood movies, with every model's hair using CUDA cores, etc.

    2. Daniel B.
      Go

      Re: What he really means is...

      ... that Sony fucked up big time by choosing Craptel x86 for their next PS. In fact, the PS4 is going to be so underpowered that it won't be able to play PS3 games. The CellBE processor runs circles around even current-gen x86 chips, and it's at least 7 years old! The only things that actually match/overcome the CellBE are GPUs. Which is what Nvidia is pushing, with an ARM core as the frontend. Basically, getting the boost AMD is now getting, but also loosening the x86 chokehold.

      Good luck, Nvidia!

      1. Zmodem

        Re: What he really means is...

        The PS4 has an AMD FX 8-core and 8GB of RAM, and GDDR on the card. On release day it will be able to play default-detail PC games, storing all 1024x1024 textures and models in RAM and using the 40GB/s bridge to the card.

        In 2 or 3 years you will have another version with a new card and 16-32GB of RAM, for when PC games use 4096x4096 textures.

  6. This post has been deleted by its author

  7. Zmodem

    Try making quad SLI on a single PCIe 3.0 card. If you're going to have a 64-core server running CUDA processes, you would still want quad SLI.

    1. Dave 126 Silver badge

      >If you're going to have a 64-core server running CUDA processes, you would still want quad SLI

      Only if communication between the cores is the bottleneck, and if SLI is a suitable interconnect for it. More likely, you would design interconnects specifically for the task in hand, just as SLI was developed for sharing the load of graphics.

      1. Zmodem

        If each Tesla ends up having something stupid like 30 CUDA cores, then you would still want SLI, with 3000 CUDA cores on each GTX card, for your local computer-science simulations and Hollywood movies.

  8. All names Taken
    Paris Hilton

    Maybe ...

    ... in the daze of yore when baking chips was not quite as precise as it is now with lots of less than standard or sub-standard but working in a way cooked silicon it seemed optimal to go for a universal do-it-all chip that required quite a lot of stuff around it (let's call it a motherboard?) and other bits and pieces that can be plugged in or out of the said mobo.

    Hmmm - we better formalise some partnership committees to set standards for all of those bits that can be added (maybe we can call 'em plug n play?)

    Anyway, the stroll to universal computing machine continues.

    Alternatively: don't want a universal solution that is capable of doing almost everything provided it has a suitably large framework of stuff that can be plugged in or plugged out. Nope.

    This one is merely what can we do to optimise/maximise a bit of kit that does stuff very well with low energy requirements with minimal additional support on the circuit board(s)?

  9. Michael H.F. Wilkinson Silver badge
    Boffin

    Not all parallel programs have the same demands

    There are many tasks that run like the clappers under CUDA. These are all those tasks that are of a more-or-less SIMD nature, like large matrix multiplications, Fourier transforms, and any other method that has a predefined processing order, preferably with a lot of micro-parallelism in there. Subtasks also need to be fairly isolated, to minimize communication load. For those tasks Kepler and Tesla-like processors are great (we have a couple).

    However, there are also tasks in which the processing order is data driven, and where each processor might need to access arbitrary parts of the (large) data set. I am currently doing multi-scale analysis of 3.9 Gpixel images, and doing that on a Kepler or Tesla board is a nightmare. Our 64-core Opteron machine gets between 32 and 50 times speed-up, because this algorithm works best with coarse-grained parallelism. x86-64 machines are not going away soon, and GPGPU processing is not a panacea (great though it is).
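
    For the first category, here is a minimal CUDA sketch (purely illustrative, not from the article or our own code) of the sort of kernel these boards eat for breakfast: every output element is computed by its own thread, with no data-driven ordering and no communication between subtasks.

    // Purely illustrative CUDA kernel (SAXPY): y = a*x + y.
    // Each thread computes one element independently -- the
    // "SIMD-ish, isolated subtasks" case that maps well onto GPUs.
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];   // no dependence on any other element
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));   // unified memory, for brevity
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);   // one thread per element
        cudaDeviceSynchronize();
        printf("y[0] = %f\n", y[0]);                      // expect 4.000000
        cudaFree(x); cudaFree(y);
        return 0;
    }

    A data-driven algorithm like the multi-scale analysis above has no such one-thread-per-element decomposition, which is exactly why the coarse-grained 64-core box wins there.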

  10. mark l 2 Silver badge

    "Our Denver 64-bit ARM core will be higher performance than anything you can buy from ARM Holdings. "

    AFAIK ARM Holdings don't sell any hardware, only licences to produce ARM chips, so I'm not really sure what is meant by the above statement.

    He does have a point in that ARM's greatest strength is not its power efficiency but its open architecture: you can build your own ARM SoC to the requirements and cost needs of your particular device, which is why you won't get Samsung or Apple using Intel x86 for their phones or tablets. Intel won't design a custom SoC just for them, so until Intel change their policy on that, the two largest mobile device manufacturers are going to stick with ARM.

    1. Wilco 1

      He simply meant that Denver will be faster than ARM's fastest 64-bit core, the Cortex-A57. So it might be 4-way OoO like AMCC's 64-bit ARM, i.e. seriously quick, given that the A57 beats the A15 by a good margin.

  11. hamsterator
    Thumb Up

    Power Efficiency

    Great interview, Tim! Refreshing honesty, I think (at least in terms of intentions and vision). It would be foolish to count Intel out of the equation; developers as always will determine how all this plays out.

  12. Anonymous Coward
    Anonymous Coward

    GPU aside...

    CPU-wise, I'm pretty sure x86 applications won't work on the ARM architecture.

    So what's the benefit to me here? I'd be unable to use my applications.

  13. Zot

    x86 uses microcode instructions...

    Whereas the ARM chips just use the minimal raw instructions.

    Microcode is another layer under x86 machine language - isn't that a waste of power?

    1. Wilco 1
      Boffin

      Re: x86 uses microcode instructions...

      The x86 complexity certainly wastes power, which is more noticeable in low-power designs and less so in high-performance cores. This shows the relative sizes of ARM and x86 cores: http://chip-architect.com/news/2013_core_sizes_768.jpg. Jaguar achieves similar performance to the Cortex-A15 but needs twice the area. Atom needs a lot more area in order to achieve low-voltage/low-power operation, and its huge die size explains why we won't see quad-core versions until 22nm. So yes, there is definitely a big cost to x86; few companies have succeeded in making competitive designs. With ARM it is far easier to design a high-end CPU.

This topic is closed for new posts.
