Software needs meaty cores, not thin, stringy ARMs, says Intel

Intel has offered another reason it doesn't think ARM processors pose an enormous threat to its high-end chip business: software isn't written to run across multiple CPUs. That's the opinion Gordon Graylish, Chipzilla's general manager for enterprise solution sales, shared with The Reg yesterday at the Australian launch of the …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward
    Anonymous Coward

    FUD FUD FUD, can't Intel devote their time to making their products better instead of trying to just kill the competition off?

    1. Anonymous Coward
      Anonymous Coward

      Because...

      They would rather milk their customers for as much as they can at each level ...

      ... before incrementing the core speed

      ... before incrementing the number of cores

      1. JeffyPoooh
        Pint

        Re: Because...

        Real men use just a single NAND gate and a handful of delay lines.

    2. Charles Manning

      FUD, certainly

      Intel have actually made their products remarkably better, but there are inherent constraints in the x86 architecture that limit how far they can push things. It is absolutely amazing what they have been able to achieve, but it is not enough to compete.

      That cleverness comes at a huge price though:

      Intel needs to add extra layers of processing and pipelining to get good performance. All these extra layers make the chips far more expensive to produce. Even the most cracking ARMs use only half the transistors of an Atom.

      Those extra layers have extra transistors that must be toggled. Toggling transistors eats power. Hence, the Intel CPUs chomp through batteries.

      Intel need to use the most gee-whizz manufacturing processes to remain competitive. That makes the manufacturing equipment very expensive. The ARMs, on the other hand, can still do great things with more mature processes, meaning they can be made more cheaply.

      It would be interesting to see what Intel could achieve if they threw their manufacturing might and technology into the ARM ring. However, Intel probably could not survive in the low-margin game where ARM parts thrive.

      All that is left is to keep spreading the FUD and try to hold on to some of their existing high-margin business.

  2. A Non e-mouse Silver badge

    Wasn't it a few years ago that Intel were investing in research & software tools to auto-parallelise code, because they realised that CPUs were reaching a limit in terms of clock speed?

    If a (few) meaty cores are really required, why do Intel sell processors with 10 cores?

    1. theblackhand

      Parallelising code

      Intel did invest in this, and still does to my knowledge, but it doesn't change the nature of the problems being solved. Parallel programming is harder than single-thread programming.

      Note that this is looking at doing an existing task faster rather than doing more tasks (that may have no dependencies) in the same time. More cores don't necessarily make the first case faster if they sit idle.
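
      The textbook framing of that distinction (my gloss, not the poster's) is Amdahl's law for making a fixed task faster versus Gustafson's law for doing more work in the same time, with p the parallelisable fraction and N the core count:

      \[ S_{\text{Amdahl}}(N) = \frac{1}{(1-p) + p/N}, \qquad S_{\text{Gustafson}}(N) = (1-p) + pN \]

      The first saturates at 1/(1-p) however many cores you add; the second keeps growing with N, which is why throughput workloads love big core counts that latency-bound workloads can't use.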

      This answers it far better than I could:

      http://www.futurechips.org/tips-for-power-coders/parallel-programming.html

  3. Trevor_Pott Gold badge

    The world needs Xeons, eh?

    Then why do my 6-year-old AMD Shanghai servers sit below 50% CPU utilization even when their VM capacity is completely maxed out? CPU isn't the bottleneck, Intel. RAM and IOPS are.

    1. Howverydare

      Entirely dependent on the workload. We have customers brutally murdering some older quad-10-core Xeon servers with less than maximum memory installed. Even VDI installations can become massively processor-limited.

      But then, as you say, with a set of generic servers in a virtualisation environment, certainly 6 years ago, you couldn't afford the memory to use all of the processor available.

    2. Ian Michael Gumby
      Boffin

      @Trevor Pott

      Maybe it's because your workload is lame and not CPU-intensive? :-P

      To your point, when the Intel talking head says:

      "Intel has offered another reason it doesn't think ARM processors pose an enormous threat to its high-end chip business: software isn't written to run across multiple CPUs."

      I guess he hasn't heard about this thing called Hadoop? (There's more, but if you don't grok Hadoop then you probably don't know anything about Mesos and Spark.)

      CPUs can be the bottleneck depending on what you are doing.

      And yes, memory and disk are the major bottlenecks today...

      1. quartzie

        Re: @Trevor Pott

        Two points.

        Hadoop (and many other parallel/hybrid processing frameworks) cannot change the fact that some problems simply aren't very suitable for parallel processing. I believe Intel employs a good number of people who know their way around Hadoop, but they've also noticed not everyone is doing big data in their own garden.

        That said, if you've attempted parallel processing on any larger scale, you would notice that getting the system to run efficiently, given a limited memory bandwidth, is a major task and often crucial for deployment on any cloudy distributed platform.
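
        For the curious, the standard way to picture that bandwidth wall (my addition - the roofline model, not anything quartzie cites) is:

        \[ P_{\text{attainable}} = \min\left(P_{\text{peak}},\; B \times I\right) \]

        where B is the memory bandwidth and I the arithmetic intensity of the code in FLOPs per byte; once B x I sits below the compute peak, adding cores or clock speed does nothing.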

        1. Trevor_Pott Gold badge

          Re: @Trevor Pott

          " if you've attempted parallel processing on any larger scale, you would notice that getting the system to run efficiently, given a limited memory bandwidth, is a major task and often crucial for deployment on any cloudy distributed platform."

          Absolutely true, which is exactly why I don't think "fat cores" is the answer. On my HPC-like applications I run into real issues with memory bandwidth on the local node, let alone bandwidth for message-passing between nodes. Now, the new A3Cube PCI-E fabric might help a little on the inter-node stuff, but local to the host? We still need a hell of a lot more memory bandwidth per core.

          Even in "standard" virtualisation loads I hit the wall on memory bandwidth. Things like Atlantis ILIO using RAM as a cache for VDI will wreck the memory bandwidth available, leaving those big, meaty cores gasping.

          Give me stringy cores with fat RAM pipes any day. All the CPU muscle in the world is worth exactly nothing if I can't feed the damned things. That means RAM, it means storage IOPS and it means network fabric. CPU oomph just doesn't appear on my radar, except for the most carefully tuned (and hence exceptionally rare) applications. There are just too many other bottlenecks that need addressing first.

    3. Irongut

      Maybe if you had modern RAM that ran at modern speeds instead of 6-year-old RAM then it wouldn't be so much of a bottleneck. Hmmm.

      1. Trevor_Pott Gold badge

        Okay. I also have nodes with 2x Intel Xeon 8 Core E5-2680 CPUs, 128GB RAM/host, 2x Intel 520 480GB SSDs and 2x 10GbE Links. Across the average day the cumulative CPU usage is less than 10%. In fact, it only ever hits 80% for about 30 minutes a day.

        Methinks the bottleneck be not the CPU. Not for me, and - quite frankly - not for most folks.

        1. Howverydare

          So you've not got maximum memory installed, and complain that you're memory limited? Do we have another contender for a Darwin award?

          I've got a server with 2x E5-2680v2 processors in it (like yours, but newer, faster and with more cores) and 256GB of RAM (with spare slots). 8 VDIs and it's struggling for processor, while with four 1GbE links and eight sTec SSDs the I/O isn't troubled. The memory isn't even fast memory, it's only 1600MHz stuff.

          Ignorance of what happens outside of your (small) server cupboard doesn't mean that it doesn't exist. There are dozens of deployments where fast, multi-core processors are the system bottleneck. I work with many.

          1. Trevor_Pott Gold badge

            Actually, I do have the maximum memory installed for the motherboards in question. Nor am I saying that everyone is the same OMFGWTFBBQ!!!111!!11oneoneone.

            I do, however, have this tendency to pay attention to the world around me, and I have noticed that people who find the CPU a bottleneck are the exception, not the rule. What's more, of those who do find the CPU the bottleneck, the overwhelming majority rewrite their code for a GPU, custom ASIC or otherwise move to non-CPU silicon.

            This is the era of the custom chip, bub. Big, fat, meaty CPUs are just not needed by the majority...and for the kinds of reasons I stated above.

            But hey, get your panties in a bunch because you are incapable of parsing things except as absolutes and extremes. You must be a blast at parties.

          2. Mikel

            8 VDIs

            Your VDIs are GPU limited. Congratulations on pouring a ton of money into solving every problem but the one you have.

  4. Torben Mogensen

    Keeping with horse-drawn carriages

    While Intel is correct in saying that most software these days is written for single-core, sequential processors, and that it is, indeed, easier to write it that way, there is little doubt that the future belongs to massive parallelism from many small cores rather than speedy single cores: at a given process technology, doubling the clock speed will roughly quadruple power use, both because you need more power to make the transistors switch faster and because every switch costs power. For the same power budget you can instead get four cores, which gives you twice the total compute power. It is true that there are inherently sequential problems, but these are fairly rare, and the jobs that require the most compute power (cryptanalysis, data mining, image processing, graphics, ...) are typically easy to parallelise.
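
    The back-of-envelope behind that quadrupling (my sketch of the standard dynamic-power argument, under one common voltage-scaling assumption):

    \[ P_{\text{dyn}} \approx \alpha C V^2 f, \qquad V \propto \sqrt{f} \;\Rightarrow\; P_{\text{dyn}} \propto f^2 \]

    So doubling f costs roughly 4x the power; spending that same budget on four cores at the original clock buys 4x the throughput, twice what the single 2x-clocked core delivers.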

    Intel's strength is in making fast cores, but they are also power-hungry and expensive. The former because power didn't matter much for desktop PCs and servers up through the 90s and 00s, and the latter because Intel had a de facto monopoly on x86 processors for PCs and servers, and most third-party software was written exclusively for x86. These days, power matters more: desktop PCs are more or less a thing of the past (except for a few hardcore gamers) and power use is increasingly an issue in data centres. Intel is trying to adapt, but it fears losing its dominant position before the adaptation is complete. Hence, these bombastic claims.

  5. AceRimmer1980
    Boffin

    auto-parallelise code?

    20 years ago, I worked for a company which made image processing boards based around Transputers. You could have as many chips as your wallet could stand, and the code would indeed auto-parallelise, and scale nicely. If the principles were well-known then, why can't this still be done?

    1. stucs201

      Re: auto-parallelise code?

      Surely image processing is relatively easy to (auto)parallelise - just split the image up into multiple smaller images. Not all processing parallelises that easily.
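
      A minimal sketch of that split in C (my example, not stucs201's - assumes an 8-bit greyscale image and a trivial brighten filter; compile with -fopenmp):

      #include <stddef.h>
      #include <stdint.h>

      /* Brighten one horizontal strip. Strips don't overlap, so they
         can be processed independently. */
      static void brighten_strip(uint8_t *pixels, size_t width,
                                 size_t row_start, size_t row_end)
      {
          for (size_t y = row_start; y < row_end; y++)
              for (size_t x = 0; x < width; x++) {
                  size_t i = y * width + x;
                  pixels[i] = (uint8_t)(pixels[i] > 235 ? 255 : pixels[i] + 20);
              }
      }

      /* Split the image into strips; OpenMP farms the strip loop out
         across however many cores are available. */
      void brighten(uint8_t *pixels, size_t width, size_t height, int nstrips)
      {
          #pragma omp parallel for
          for (int s = 0; s < nstrips; s++)
              brighten_strip(pixels, width,
                             height * (size_t)s / nstrips,
                             height * (size_t)(s + 1) / nstrips);
      }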

    2. John Smith 19 Gold badge
      Boffin

      Re: auto-parallelise code?

      "20 years ago, I worked for a company which made image processing boards based around Transputers. You could have as many chips as your wallet could stand, and the code would indeed auto-parrallelise, and scale nicely. If the principles were well-known then, why can't this still be done?"

      Obvious question. Was it written in Occam?

      A language designed for fine-grained parallelism (IIRC each statement was conceptually a process, so it looked like a regular computer language but had some subtle features).

      1. AceRimmer1980
        Pint

        Was it written in Occam?

        Good guess :-) but no, we had a parallel-C compiler (I'm guessing this linked to a similar RTL)

        Obviously not suited to every problem, but still, if you could express your algorithm using what must've been some sort of Map/Reduce paradigm, you could do your development on a single TRAM and then run it on the departmental beast when ready.

    3. Nick Ryan Silver badge

      Re: auto-parallelise code?

      I know I'm not exactly an "average" developer, but I was working on multi-CPU x86 code in 2002 (on Athlon MP CPUs if it matters).

      It's not hard, or at least I didn't find it so, when you are aware of concurrency issues and know how to code parallel tasks and in particular what can be easily run or is appropriate for concurrent processing.

      The hardest part was dealing with the utter ball ache that was (and still is in some ways) concurrent access to the Windows GDI, let alone the complete train wreck often involved in running anything ActiveX related concurrently.

      The Intel C extensions for parallel code also make it a doddle but, again, you need to know what you are doing. IMHO the historically ghastly native support in Visual Studio for concurrency was a big problem.
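
      Assuming the extensions meant are Intel's Cilk Plus (a guess on my part, not something Nick names), the "doddle" bit looks like this - a sketch, not his code:

      #include <cilk/cilk.h>

      /* Divide-and-conquer parallel sum. cilk_spawn lets the runtime
         steal the left half onto another core; cilk_sync joins. */
      long psum(const int *a, long n)
      {
          if (n < 4096) {                /* small chunk: plain loop */
              long s = 0;
              for (long i = 0; i < n; i++)
                  s += a[i];
              return s;
          }
          long left = cilk_spawn psum(a, n / 2);    /* may run in parallel */
          long right = psum(a + n / 2, n - n / 2);  /* with this half */
          cilk_sync;                                /* wait for the spawn */
          return left + right;
      }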

      As for Intel vs ARM, yes, the x86 instruction set sucks balls compared to the ARM instruction set and this demands a lot more (very) clever optimisation from Intel, but even setting that aside, it's just depressing how, in much Windows application code, 95% of the time nothing productive is being done with the CPU cycles.

      1. John Smith 19 Gold badge
        Unhappy

        Re: auto-parallelise code?

        "It's not hard, or at least I didn't find it so, when you are aware of concurrency issues and know how to code parallel tasks and in particular what can be easily run or is appropriate for concurrent processing."

        I think this depends on how many processors you're talking about. <8: probably fairly easy. Over 8, you're looking at Amdahl's law starting to bite.
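
        To put rough numbers on that (my arithmetic, using the Amdahl formula quoted earlier in the thread): with 90% of the work parallelisable you get about a 4.7x speedup on 8 cores, and never more than 10x however many cores you add, which is about where the biting starts.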

        "The hardest part was dealing with the utter ball ache that was (and still is in some ways) concurrent access to the Windows GDI, let alone the complete train wreck often involved in running anything ActiveX related concurrently."

        Not really sure why you'd do this. I'd have guessed Windows expects multiple apps to write to different windows on the screen as they run. I could see the trouble starting when multiple apps on a remote server want in as well.

        1. Nick Ryan Silver badge

          Re: auto-parallelise code?

          "Not really sure why you'd do this. I'd have guessed Windows expects multiple apps to write to different windows on the screen as they run. I could see the trouble starting when multiple apps on a remote server want in as well."

          The pain with the Windows GDI is one app with multiple concurrent threads of execution, where it makes sense for them to update the interface independently. In theory it shouldn't be a problem, because Windows deals pretty well with multiple applications, with varying processor affinities, updating user interfaces simultaneously; however, as soon as you try to put this all into one application, the deficiencies in the GDI start to come through. It's not unexpected, of course, as Windows was designed as a single-user, single-processor shell rather than anything more sophisticated, and the multi-processor and multi-user support was bolted on later as a virtual afterthought.

          In case you're wondering why GDI is/was being used: many of the newer Windows APIs are sometimes little more than translation or management layers for the underlying GDI layer, so not only do you suffer from the hidden GDI problems but you also have another layer of abstraction and inefficiency on top to deal with. The aim was to fix this in WPF; however, WPF was practically unusable for a long time and brings its own problems to the game.

  6. This post has been deleted by its author

  7. All names Taken
    Paris Hilton

    Shame on intel

    Of course most software will be written for x86 ways of doing things - why should it bother Intel if the chip-hardware-software ecology blossoms into doing things differently?

  8. Mtech25
    Devil

    Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

    But it is an AMD chip

    1. larokus

      Re: Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

      Now if only it had 12 cores or even double the power consumption, it would be as fast as that Intel chip

      Unless you have a product from the future where AMD is once again competitive

    2. Howverydare

      Re: Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

      Intel will be even happier to note that I have 12 beefy 3.46GHz cores in an old machine, and 20 beefy 2.8GHz cores in the new one. And they're all Intel.

      AMD always lose at Billy Big Balls. You just have to pay for them - or find someone else happy to do so for you.

  9. Ant Evans

    Power law

    Whatever delivers the most things per wall watt wins.

    1. Charlie Clark Silver badge

      Re: Power law

      Maybe per Watt per dollar? Intel keeps going on about performance mainly because of cost. Even though Atoms and ARMs are converging on performance per Watt, you can still get an ARMful of ARMs for the price of one Atom. That means more memory, networking, etc., or even margin for the system developer.

    2. stucs201

      Re: Power law

      Things per wall watt only matter if things and things per second are sufficient. While that is often the case, sometimes it's not.

      1. stucs201

        Re: Power law

        I don't normally grumble about downvotes, but on this occasion I hope those responsible aren't specifying processors for real-time systems where a response is required in a certain timeframe to prevent something physically crashing/exploding/otherwise killing people. That isn't the time to decide that a processor which can't deliver results in time but uses less power is appropriate.

  10. hammarbtyp

    Intel need a good diet

    Of course what they don't mention is that the reason Intel cores are so meaty is that they have to contend with a legacy x86 instruction set.

    Unfortunately that isn't meat, it's fat.

  11. Robert E A Harvey

    These diesel engines will never be as powerful as a steam locomotive! - says steam locomotive designer.

    1. larokus

      And that steam engineer would be right, for the same reason we don't couple diesel engines to generators for anything greater than local emergency power. When you want to drive an 880MW turbine you use steam. The world's electricity is powered by it for a reason. Energy per kg at 538°C is not something to be taken lightly. Steam locomotives had to die more due to combustion efficiency and fuel capacity limitations, and of course that pesky risk of BLEVE [boiling liquid expanding vapour explosion], but not power ;)

  12. jason 7

    Reminds me of the great Linn Sondek press conference.

    When Ivor was questioned by the press as to why the Sondek only did 33rpm, he stated dismissively and confidently that 33rpm sounded better than 45rpm.

    "In that case why aren't records spinning at 1rpm then?" was the reply.

  13. Luke McCarthy

    Rubbish

    Everyone is writing parallel code these days. It has been mainstream for a long time now.

    1. Steve Davies 3 Silver badge

      Re: Rubbish

      citations please?

      1. Ian Michael Gumby

        Re: Rubbish

        Can you say Hadoop? (Cloudera, Hortonworks, MapR, Intel, IBM and Pivotal (EMC) all have distros.)

        Can you say Cassandra?

        Can you say Accumulo?

        Can you say Mesos?

        The list goes on...

        1. Anonymous Coward
          Anonymous Coward

          Re: Rubbish

          Hah, so I guess if you can list a few things that are multithreaded, that means everything is?

          Let's say I offer you a choice between a modern dual-core laptop and a slightly cheaper quad-core laptop where each of the cores runs half as fast. Which would you buy? The quad core laptop will take twice as long to render web pages, probably won't be able to play streaming HD Flash/Silverlight video without stuttering, and will get much lower framerates when playing any modern game... but hey, Hadoop jobs will run at the same speed!

      2. Howverydare

        Re: Rubbish

        Yes, citations. CAD applications (the most likely to be multithreaded) are barely able to drag their sorry backsides across two cores, Microsoft Office isn't particularly multithreaded, and the only reason browsing the internet manages to be multithreaded is by spreading tabs out across threads and because everything requires a damned plugin.

        That stuff just doesn't count.

  14. John Smith 19 Gold badge
    Meh

    Wasn't this why people are looking for people with Hadoop skills?

    It is meant to be a parallel DB engine, is it not?

    1. Anonymous Coward
      Anonymous Coward

      @John Smith ... Re: Wasn't this why people are looking for people with Hadoop skills?

      Hadoop is not a database. It's a parallel processing framework where a NoSQL database can be part of it. (Accumulo and HBase are two examples that sit on the cluster.) On MapR, you can add Vertica to that list.
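
      To make the framework-not-database point concrete (my sketch, using Hadoop Streaming, which runs any executable as a mapper - not something the poster mentions): a word-count mapper just reads stdin and emits key/value lines, and the framework parallelises it by running copies over the input splits.

      #include <stdio.h>
      #include <ctype.h>

      /* Hadoop Streaming word-count mapper: emit "word<TAB>1" per word.
         The framework shuffles by key; a reducer sums the 1s. */
      int main(void)
      {
          int c, in_word = 0;
          while ((c = getchar()) != EOF) {
              if (isalnum(c)) {
                  putchar(tolower(c));
                  in_word = 1;
              } else if (in_word) {
                  printf("\t1\n");
                  in_word = 0;
              }
          }
          if (in_word)
              printf("\t1\n");
          return 0;
      }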

      While the Intel talking head is a prat, there are advantages to using E7 chips which are multi-core.

      Looking at emerging technologies: pairing E7s with SanDisk's Ultradimm, among other tech, means that you can do more with less overall power consumption in a smaller footprint.

      As long as you can maintain a ratio of 1 disk per virtual core (for Intel that's a 2x ratio to physical cores), you will not be I/O bound. Of course you can use the extra cores to virtualize the box too. ;-)

      Posted Anon for the obvious reasons.

  15. Ken Hagan Gold badge

    Obvious troll is obvious

    "“The world has a big issue around vectorisation and parallelisation of code,” Graylish said. “99% of code isn't written that way.” Graylish also feels “defining a workload that can run in 1000 cores is hard.”"

    A 1000-core chip is 1.5 orders of magnitude more than anything actually on sale to mainstream customers. The fact that software doesn't currently target such a beast tells you nothing about what programmers might do if they could lay their hands on one. Right now, programmers know that dividing their logic into several hundred separate strands will provide zero benefit, possibly less. It would be rather odd if anything actually did it.

    Meanwhile, in the rather small but costly world of Google, Amazon and the like, we *do* find embarrassingly parallel workloads looking for hardware that maximises performance per watt.

    1. Ant Evans

      Re: Obvious troll is obvious

      “The world has a big issue around vectorisation and parallelisation of code,” Graylish said. “99% of code isn't written that way.” Graylish also feels “defining a workload that can run in 1000 cores is hard.”

      With the greatest respect to Intel and Graylish, this conflates processes and threads. Since it's coming from Intel, whose business it is to know this stuff, I can only assume it's a deliberate obfuscation.

    2. theblackhand

      Re: Obvious troll is obvious

      I think Intel are trying to address the proposed ARM servers where a lot of separate CPUs are bundled together (i.e. Calxeda and AMD/SeaMicro). VMs already provide an easy way to utilise this setup, and the large data centre operators know how to manage large node-count environments.

      SeaMicro are in a particularly interesting position as they have a product that scales and supports x86 and ARM on a cheap interconnect.

      The real question about ARM servers is whether providing more performance hurts their power consumption significantly. A big chunk of x86 power consumption is down to cache and IO - most ARM cores reduce both to keep power down, and neither is hard for Intel to reproduce if required.

  16. Bartholomew

    way way way way more cores please

    I personally think that things will start to get interesting when you have a million cores on a piece of silicon, clocked at a low frequency of about 2 MHz. A bit of RAM each, a few registers, a not-so-complex ALU, maybe even integer-only.

    If the million cores each time-sliced their share of a human brain's 85,000,000,000 neurons - that's 85,000 neurons per core, so each neuron gets updated at about 24Hz on a 2 MHz core - that would be comparable to the real thing, allowing for the speed at which a signal pulses about the brain. OK, you are missing all the interconnections between the neurons, but I still think that there would be interesting things to see. Me, I don't care if it is Intel or ARM who gets there first, I just want to be around to see what happens next.
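
    For flavour, a minimal integer-only neuron of the kind such a core might tick (my illustration - the constants are made up, not biological):

    #include <stdint.h>

    /* Leaky integrate-and-fire neuron, integer-only as suggested. */
    typedef struct {
        int32_t potential;            /* fixed-point membrane potential */
    } neuron;

    enum { LEAK_SHIFT = 4,            /* leak 1/16 of potential per tick */
           THRESHOLD  = 1000 };       /* fire when potential crosses this */

    /* One ~24Hz tick: leak, integrate input, fire on threshold.
       Returns 1 if the neuron spiked. */
    int neuron_tick(neuron *n, int32_t input)
    {
        n->potential -= n->potential >> LEAK_SHIFT;
        n->potential += input;
        if (n->potential >= THRESHOLD) {
            n->potential = 0;         /* reset after a spike */
            return 1;
        }
        return 0;
    }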

  17. Tom 7

    Why does this statement give me the image

    of a naked Intel legs akimbo sitting backwards on a chair?

  18. Watashi

    Two models of computing

    This is not about numbers of cores or architecture per se, but about basic physics. There is a direct relationship between the work done by a software process and the electrical power required. There is also a direct relationship between the work done and the heat generated. That means that heavy processing tasks will always run better on processors that are not designed with power-consumption efficiency in mind. The bottleneck is how much heat the system can manage. You wouldn't dream of trying to dissipate anywhere near as much heat through a tablet's case as you would through an MT form factor CAD machine's case.

    A big give-away is the battery life of smartphones. Very few mobile devices last a full day of medium-level use without needing a bit of a charge-up. It would be quite easy to put a bigger battery in smartphones like the new Nexus 5 so that they ran either faster for the same time or for longer. Why don't they? Because a) users of smartphones don't expect a full day's charge, and b) the limit to processing power in smartphones is the heat generated, not the electrical power available. I predict that we are not far off the ceiling for mobile device processing power until we develop much cooler ways of carrying out computer calculations.
