
Software needs meaty cores, not thin, stringy ARMs, says Intel

Intel has offered another reason it doesn't think ARM processors pose an enormous threat to its high-end chip business: software isn't written to run across multiple CPUs. That's the opinion Gordon Graylish, Chipzilla's general manager for enterprise solution sales, shared with The Reg yesterday at the Australian launch of the …

COMMENTS

This topic is closed for new posts.
Anonymous Coward

FUD FUD FUD, can't Intel devote their time to making their products better instead of trying to just kill competition off?

12
3
Anonymous Coward

Because...

They would rather milk their customers for as much as they can at each level ...

... before incrementing the core speed

... before incrementing the number of cores

5
1
Silver badge

FUD, certainly

Intel have actually made their products remarkably better, but there are inherent limitations in the x86 that limit how far they can push things. It is absolutely amazing what they have been able to achieve, but it is not enough to compete.

That cleverness comes at a huge price though:

Intel needs to add extra layers of processing and pipelining to get good performance. All these extra layers make the chips far more expensive to produce. The most cracking ARMs use only half the transistors of an Atom.

Those extra layers have extra transistors that must be toggled. Toggling transistors eats power. Hence, the Intel CPUs chomp through batteries.

Intel need to use the most gee-whizz manufacturing processes to remain competitive. That makes the manufacturing equipment very expensive. The ARMs, on the other hand, can still do great things on more mature processes, which means they can be made more cheaply.

It would be interesting to see what Intel could achieve if they threw their manufacturing might and technology into the ARM ring. However, Intel probably could not survive in the low-margin game where ARM parts thrive.

All that is left is to keep spreading the FUD and try to keep some of their existing high-margin business.

2
2
Silver badge
Pint

Re: Because...

Real men use just a single NAND gate and a handful of delay lines.

1
0
Silver badge

Wasn't it a few years ago that Intel were investing in research and software tools to auto-parallelise code, because they realised that CPUs were reaching a limit in terms of clock speed?

If a (few) meaty cores are really required, why do Intel sell processors with 10 cores?

8
1
Bronze badge

Parallelising code

Intel did invest in this and still are to my knowledge, but it doesn't change the nature of the problems being solved. Parallel programming is harder than single thread programming.

Note that this is looking at doing an existing task faster rather than doing more tasks (that may have no dependencies) in the same time. More cores don't necessarily make the first faster if they sit idle.

This answers it far better than I could:

http://www.futurechips.org/tips-for-power-coders/parallel-programming.html
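The idle-cores point can be sketched in a toy Python example (hypothetical timings; `time.sleep` stands in for real work, and genuinely CPU-bound Python would need processes rather than threads because of the GIL). A chain of dependent steps gains nothing from a pool of workers, while the same work split into independent tasks does:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def step(x):
    time.sleep(0.05)          # stand-in for real work
    return x + 1

def chained(n):
    # Each step needs the previous result, so extra workers
    # would just sit idle: the steps cannot overlap.
    v = 0
    for _ in range(n):
        v = step(v)
    return v

def independent(n):
    # Same amount of work, but no dependencies, so a pool
    # can run all the steps at once.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(step, range(n)))

for f in (chained, independent):
    t0 = time.perf_counter()
    f(8)
    print(f.__name__, round(time.perf_counter() - t0, 2), "s")
```

On a typical machine the chained version takes roughly eight times as long, despite performing the same number of steps.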

3
0
Gold badge

The world needs Xeons, eh?

Then why do my 6-year-old AMD Shanghai servers sit below 50% CPU utilization even when their VM capacity is completely maxed out? CPU isn't the bottleneck, Intel. RAM and IOPS are.

3
2

Entirely dependent on the workload. We have customers brutally murdering some older quad-10-core Xeon servers with less than maximum memory installed. Even VDI installations can become massively processor-limited.

But then, as you say, with a set of generic servers in a virtualisation environment, certainly from six years ago, you couldn't afford the memory to use all of the processor available.

2
1
Silver badge
Boffin

@Trevor Pott

Maybe it's because your workload is lame and not CPU-intensive? :-P

To your point, when the Intel talking head says:

"Intel has offered another reason it doesn't think ARM processors pose an enormous threat to its high-end chip business: software isn't written to run across multiple CPUs."

I guess he hasn't heard about this thing called Hadoop? (There's more, but if you don't grok Hadoop then you probably don't know anything about Mesos and Spark.)

CPUs can be the bottleneck depending on what you are doing.

And yes, memory and disk are the major bottlenecks today...

1
0
Silver badge

Maybe if you had modern RAM that ran at modern speeds instead of 6-year-old RAM then it wouldn't be so much of a bottleneck. Hmmm.

0
0
Gold badge

Okay. I also have nodes with 2x Intel Xeon 8 Core E5-2680 CPUs, 128GB RAM/host, 2x Intel 520 480GB SSDs and 2x 10GbE Links. Across the average day the cumulative CPU usage is less than 10%. In fact, it only ever hits 80% for about 30 minutes a day.

Methinks the bottleneck be not the CPU. Not for me, and - quite frankly - not for most folks.

1
0

So you've not got maximum memory installed, and complain that you're memory limited? Do we have another contender for a Darwin award?

I've got a server with 2x E5-2680v2 processors in it (like yours, but newer, faster and with more cores) and 256GB of RAM (with spare slots). 8 VDIs and it's struggling for processor, and with four 1GbE links and eight sTec SSDs the I/O isn't troubling it. The memory isn't even fast memory, it's only 1600MHz stuff.

Ignorance of what happens outside of your (small) server cupboard doesn't mean that it doesn't exist. There are dozens of deployments where fast, multi-core processors are the system bottleneck. I work with many.

0
0
Gold badge

Actually, I do have the maximum memory installed for the motherboards in question. Nor am I saying that everyone is the same OMFGWTFBBQ!!!111!!11oneoneone.

I do, however, have this tendency to pay attention to the world around me, and I have noticed that people who find the CPU a bottleneck are the exception, not the rule. What's more, of those who do find the CPU the bottleneck, the overwhelming majority rewrite their code for a GPU or custom ASIC, or otherwise move to non-CPU silicon.

This is the era of the custom chip, bub. Big, fat, meaty CPUs are just not needed by the majority... and for the kinds of reasons I stated above.

But hey, get your panties in a bunch because you are incapable of parsing things excepting as absolutes and extremes. You must be a blast at parties.

0
0
Silver badge

8 VDIs

Your VDIs are GPU limited. Congratulations on pouring a ton of money into solving every problem but the one you have.

0
0

Re: @Trevor Pott

Two points.

Hadoop (and many other parallel/hybrid processing frameworks) cannot change the fact that some problems simply aren't very suitable for parallel processing. I believe Intel employs a good number of people who know their way around Hadoop, but they've also noticed not everyone is doing big data in their own garden.

That said, if you've attempted parallel processing on any larger scale, you would notice that getting the system to run efficiently, given a limited memory bandwidth, is a major task and often crucial for deployment on any cloudy distributed platform.

0
0
Gold badge

Re: @Trevor Pott

" if you've attempted parallel processing on any larger scale, you would notice that getting the system to run efficiently, given a limited memory bandwidth, is a major task and often crucial for deployment on any cloudy distributed platform."

Absolutely true, which is exactly why I don't think "fat cores" is the answer. On my HPC-like applications I run into real issues with memory bandwidth on the local node, let alone bandwidth for message-passing between nodes. Now, the new A3Cube PCI-E fabric might help a little on the inter-node stuff, but local to the host? We still need a hell of a lot more memory bandwidth per core.

Even in "standard" virtualisation loads I hit the wall on memory bandwidth. Things like Atlantis ILIO using RAM as a cache for VDI will wreck the memory bandwidth available, leaving those big, meaty cores gasping.

Give me stringy cores with fat RAM pipes any day. All the CPU muscle in the world is worth exactly nothing if I can't feed the damned things. That means RAM, it means storage IOPS and it means network fabric. CPU oomph just doesn't appear on my radar, excepting for the most carefully-tuned (and hence exceptionally rare) applications. There are just too many other bottlenecks that need addressing first.

0
0

Keeping with horse-drawn carriages

While Intel is correct in saying that most software these days is written for single-core, sequential processors, and that it is, indeed, easier to do so, there is little doubt that the future belongs to massive parallelism from many small cores rather than speedy single cores. On a given process technology, doubling the clock speed will roughly quadruple power use, both because you need more power to make the transistors switch faster and because every switch costs power. For the same power budget, you can instead get four cores, which gives you twice the total compute power. It is true that there are inherently sequential problems, but these are fairly rare, and the jobs that require the most compute power (cryptanalysis, data mining, image processing, graphics, ...) are typically easy to make parallel.
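The arithmetic in that argument can be written out as a toy model (arbitrary units, using the rough rule above that power grows with the square of clock speed while throughput grows linearly):

```python
# Toy model (arbitrary units): power ~ f^2 per core, throughput ~ f.
def power(f, cores=1):
    return cores * f ** 2

def throughput(f, cores=1):
    return cores * f

# One core at double clock burns the same power as four cores at base clock...
assert power(2.0) == power(1.0, cores=4)

# ...but the four slower cores deliver twice the aggregate throughput,
# provided the workload actually parallelises.
print(throughput(1.0, cores=4) / throughput(2.0))  # -> 2.0
```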

Intel's strength is in making fast cores, but they are also power-hungry and expensive. The former because power didn't matter much for desktop PCs and servers up through the 90s and 00s, and the latter because Intel had a de facto monopoly on x86 processors for PCs and servers, and most third-party software was written exclusively for x86. These days, power matters more: desktop PCs are more or less a thing of the past (except among a few hardcore gamers) and power use is increasingly an issue in data centres. Intel is trying to adapt, but it fears losing its dominant position before the adaptation is complete. Hence these bombastic claims.

5
2
Boffin

auto-parallelise code?

20 years ago, I worked for a company which made image processing boards based around Transputers. You could have as many chips as your wallet could stand, and the code would indeed auto-parallelise, and scale nicely. If the principles were well known then, why can't this still be done?

6
1
Silver badge

Re: auto-parallelise code?

Surely image processing is relatively easy to (auto)parallelise - just split it up into multiple smaller images. Not all processing parallelises that easily.
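A minimal sketch of that split-it-up idea (hypothetical per-pixel operation, with rows standing in for tiles): each piece needs no coordination with the others, so it farms out cleanly.

```python
# Per-pixel ops have no cross-pixel dependencies, so an image can be
# carved into rows (or tiles) and handed to a process pool.
from concurrent.futures import ProcessPoolExecutor

def invert_row(row):
    return [255 - p for p in row]      # independent per-pixel work

def invert_image(image):
    with ProcessPoolExecutor() as pool:
        return list(pool.map(invert_row, image))

if __name__ == "__main__":
    img = [[0, 128, 255], [10, 20, 30]]    # toy 2x3 greyscale "image"
    print(invert_image(img))               # -> [[255, 127, 0], [245, 235, 225]]
```

Not every algorithm decomposes this cleanly: anything with cross-tile dependencies (a global histogram, say) needs a merge step afterwards.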

5
0
Gold badge
Boffin

Re: auto-parallelise code?

"20 years ago, I worked for a company which made image processing boards based around Transputers. You could have as many chips as your wallet could stand, and the code would indeed auto-parallelise, and scale nicely. If the principles were well known then, why can't this still be done?"

Obvious question. Was it written in Occam?

A language designed for fine grained parallelism (IIRC each statement was conceptually a process, so it looked like a regular computer language but had some subtle features).

2
0
Pint

Was it written in Occam?

Good guess :-) but no, we had a parallel-C compiler (I'm guessing this linked to a similar RTL)

Obviously not suited to every problem, but still, if you could express your algorithm using what must've been some sort of Map/Reduce paradigm, you could do your development on a single TRAM and then run it on the departmental beast when ready.

1
0
Silver badge

Re: auto-parallelise code?

I know I'm not exactly an "average" developer, but I was working on multi-CPU x86 code in 2002 (on Athlon MP CPUs if it matters).

It's not hard, or at least I didn't find it so, when you are aware of concurrency issues and know how to code parallel tasks and in particular what can be easily run or is appropriate for concurrent processing.

The hardest part was dealing with the utter ball ache that was (and still is in some ways) concurrent access to the Windows GDI, let alone the complete train wreck often involved in running anything ActiveX related concurrently.

The Intel C extensions for parallel code also make it a doddle but, again, you need to know what you are doing. IMHO the historical ghastly native support in Visual Studio for concurrency was a big problem.

As for Intel vs ARM, yes, the x86 instruction set sucks balls compared to the ARM instruction set and this requires a lot more (very) clever optimisations from Intel, but even setting that aside, it's just depressing how, for Windows applications, 95% of the time nothing productive is being done with the CPU cycles.

1
0
Gold badge
Unhappy

Re: auto-parallelise code?

"It's not hard, or at least I didn't find it so, when you are aware of concurrency issues and know how to code parallel tasks and in particular what can be easily run or is appropriate for concurrent processing."

I think this depends on how many processors you're talking about. Below 8, probably fairly easy. Over 8, you're looking at Amdahl's law starting to bite.
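The Amdahl's law point is easy to tabulate (a quick sketch; the 10% serial fraction is an illustrative number):

```python
# Amdahl's law: with serial fraction s, speedup on n cores is
# 1 / (s + (1 - s) / n).
def amdahl(s, n):
    return 1.0 / (s + (1.0 - s) / n)

for n in (2, 4, 8, 16, 64, 1024):
    print(n, round(amdahl(0.1, n), 2))
# -> 1.82, 3.08, 4.71, 6.4, 8.77, 9.91
# With 10% serial code the ceiling is 10x, however many cores you add:
# 1024 cores manage only ~9.9x.
```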

"The hardest part was dealing with the utter ball ache that was (and still is in some ways) concurrent access to the Windows GDI, let alone the complete train wreck often involved in running anything ActiveX related concurrently."

Not really sure why you'd do this. I'd have guessed Windows expects multiple apps to write to different windows on the screen as they run. I could see the trouble starting when multiple apps on a remote server want in as well.

0
0
Silver badge

Re: auto-parrallelise code?

"Not really sure why you'd do this. I'd have guessed Windows expects multiple apps to write to different windows on the screen as they run. I could see the trouble starting when multiple apps on a remote server want in as well."

The pain with Windows GDI is one app with multiple concurrent threads of execution where it makes sense for them to update the interface independently. In theory it shouldn't be a problem, because Windows deals pretty well with multiple applications, with varying processor affinities, updating user interfaces simultaneously. However, as soon as you try to put this all into one application, the deficiencies in the GDI start to come through. It's not unexpected, of course, as Windows was designed as a single-user, single-processor shell rather than anything more sophisticated, with multi-processor and multi-user support bolted on later as a virtual afterthought.

In case you're wondering why GDI is/was being used, many of the newer Windows APIs are little more than translation or management layers over the underlying GDI layer, so not only do you suffer from the hidden GDI problems, you also have another layer of abstraction and inefficiency on top to deal with. The aim was to fix this in WPF; however, WPF was practically unusable for a long time and brings its own problems to the game.
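The usual escape hatch for this sort of pain (sketched here generically in Python rather than Win32, since the pattern is the same) is to stop worker threads touching the UI at all: they post updates onto a queue, and one dedicated thread drains it, so all "drawing" is serialised through a single place.

```python
# Worker threads never touch the "UI" directly; they post messages to a
# queue that a single dedicated thread drains, serialising all updates.
import queue, threading

updates = queue.Queue()

def worker(name):
    for i in range(3):
        updates.put(f"{name}: progress {i}")   # cheap and thread-safe
    updates.put(None)                          # done marker

def ui_thread(n_workers):
    done, log = 0, []
    while done < n_workers:
        msg = updates.get()
        if msg is None:
            done += 1
        else:
            log.append(msg)                    # the only "drawing" site
    return log

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(4)]
for t in threads:
    t.start()
log = ui_thread(4)
for t in threads:
    t.join()
print(len(log))                                # -> 12
```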

0
0

This post has been deleted by its author

Bronze badge
Paris Hilton

Shame on intel

Of course most software will be written for x86 ways of doing things - but why should it bother Intel if the chip-hardware-software ecology blossoms into doing things differently?

0
0
Devil

Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

But it is an AMD chip

2
1

Re: Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

Now if only it had 12 cores, or even double the power consumption, it would be as fast as that Intel chip

Unless you have a product from the future where AMD is once again competitive

0
0

Re: Intel will be happy to know I have 6 beefy cores all rated at 3.5GHz

Intel will be even happier to note that I have 12 beefy 3.46GHz cores in an old machine, and 20 beefy 2.8GHz cores in the new one. And they're all Intel.

AMD always lose at Billy Big Balls. You just have to pay for them - or find someone else happy to do so for you.

1
0

Power law

Whatever delivers most things per wall watt, wins.

2
1
Silver badge

Re: Power law

Maybe per watt per dollar? Intel keeps going on about performance mainly for reasons of cost. Even as Atoms and ARMs converge on performance per watt, you can still get an ARMful of ARMs for the price of one Atom. That means more memory, networking, etc., or even margin for the system developer.

0
0
Silver badge

Re: Power law

Things per wall watt only matters if the things, and things per second, are sufficient. While that is often the case, sometimes it's not.

0
2
Silver badge

Re: Power law

I don't normally grumble about downvotes, but on this occasion I hope those responsible aren't specifying processors for real-time systems, where a response is required within a certain timeframe to prevent something physically crashing/exploding/otherwise killing people. That isn't the time to decide that a processor which can't deliver results in time but uses less power is appropriate.

0
0
Silver badge

Intel need a good diet

Of course what they don't mention is that the reason Intel cores are so meaty is that they have to contend with a legacy x86 instruction set.

Unfortunately that isn't meat, it's fat.

5
2
Silver badge

These diesel engines will never be as powerful as a steam locomotive! - says steam locomotive designer.

4
1

And that steam engineer would be right, for the same reason we don't couple diesel engines to generators for anything greater than local emergency power. When you want to drive an 880MW turbine you use steam; the world's electricity is powered by it for a reason. Energy per kg at 538°C is not something to be taken lightly. Steam locomotives had to die due more to combustion efficiency and fuel capacity limitations, and of course that pesky risk of BLEVE [boiling liquid expanding vapour explosion], but not power ;)

4
2
Silver badge

Reminds me of the great Linn Sondek press conference.

When Ivor was questioned by the press as to why the Sondek only had 33rpm he stated dismissively and confidently that 33rpm sounded better than 45rpm.

"In that case why aren't records spinning at 1rpm then?" was the reply.

2
0

Rubbish

Everyone is writing parallel code these days. It has been mainstream for a long time now.

2
4
Silver badge

Re: Rubbish

citations please?

4
0
Silver badge

Re: Rubbish

Can you say Hadoop? (Cloudera,Hortonworks, MapR, Intel, IBM , Pivotal(EMC)) all have distros

Can you say Cassandra?

Can you say Accumulo?

Can you say Mesos?

The list goes on...

2
1

Re: Rubbish

Yes, citations. CAD applications (the most likely to be multithreaded) are barely able to drag their sorry backsides across two cores, Microsoft Office isn't particularly multithreaded, and the only reason browsing the internet manages to be multithreaded is by spreading tabs out across threads and because everything requires a damned plugin.

That stuff just doesn't count.

0
0
Anonymous Coward

Re: Rubbish

Hah, so I guess if you can list a few things that are multithreaded, that means everything is?

Let's say I offer you a choice between a modern dual-core laptop and a slightly cheaper quad-core laptop where each of the cores runs half as fast. Which would you buy? The quad core laptop will take twice as long to render web pages, probably won't be able to play streaming HD Flash/Silverlight video without stuttering, and will get much lower framerates when playing any modern game... but hey, Hadoop jobs will run at the same speed!

1
1
Gold badge
Meh

Wasn't this why people are looking for people with Hadoop skills?

It is meant to be a parallel DB engine, is it not?

0
0
Anonymous Coward

@John Smith ... Re: Wasn't this why people are looking for people with Hadoop skills?

Hadoop is not a database. It's a parallel processing framework of which a NoSQL database can be one part. (Accumulo and HBase are two examples that sit on the cluster.) On MapR, you can add Vertica to that list.

While the Intel talking head is a prat, there are advantages to using E7 chips which are multi-core.

Looking at emerging technologies: pairing E7s with SanDisk's Ultradimm, among other tech, means that you can do more with less overall power consumption in a smaller footprint.

As long as you can maintain a ratio of 1 disk per virtual core (for Intel that's a 2x ratio to physical cores), you will not be I/O bound. Of course you can use the extra cores to virtualize the box too. ;-)

Posted Anon for the obvious reasons.

0
0
Gold badge

Obvious troll is obvious

“The world has a big issue around vectorisation and parallelisation of code,” Graylish said. “99% of code isn't written that way.” Graylish also feels “defining a workload that can run in 1000 cores is hard.”

A 1000-core chip is 1.5 orders of magnitude more than anything actually on sale to mainstream customers. The fact that software doesn't currently target such a beast tells you nothing about what programmers might do if they could lay their hands on one. Right now, programmers know that dividing their logic into several hundred separate strands will provide zero benefit, possibly less. It would be rather odd if anything actually did it.

Meanwhile, in the rather small but costly world of Google, Amazon and the like, we *do* find embarrassingly parallel workloads looking for hardware that maximises performance per watt.

2
1

Re: Obvious troll is obvious

“The world has a big issue around vectorisation and parallelisation of code,” Graylish said. “99% of code isn't written that way.” Graylish also feels “defining a workload that can run in 1000 cores is hard.”

With the greatest respect to Intel and Graylish, this conflates processes and threads. Since it's coming from Intel, whose business it is to know this stuff, I can only assume it's a deliberate obfuscation.

0
0
Bronze badge

Re: Obvious troll is obvious

I think Intel are trying to address the proposed ARM servers where a lot of separate CPUs are bundled together (i.e. Calxeda and AMD/SeaMicro). VMs already provide an easy way to utilise this setup, and the large data centre operators know how to manage large node-count environments.

SeaMicro are in a particularly interesting position as they have a product that scales and supports x86 and ARM on a cheap interconnect.

The real question about ARM servers is whether providing more performance hurts their power consumption significantly. A big chunk of x86 power consumption is down to cache and IO - most ARM cores reduce both to keep power down, and neither is hard for Intel to reproduce if required.

0
0

way way way way more cores please

I personally think that things will start to get interesting when you have a million cores on a piece of silicon, clocked at a low frequency of about 2MHz, with a bit of RAM each, a few registers, a not-so-complex ALU, maybe even integer-only.

If each core could simulate a share of the neurons, this would be comparable to a human brain's 85,000,000,000 neurons, which, allowing for the speed at which a signal pulses about the brain, are effectively clocked at about 24Hz. OK, you are missing all the interconnections between the neurons, but I still think that there would be interesting things to see. Me, I don't care if it is Intel or ARM who get there first, I just want to be around to see what happens next.
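Interestingly, the numbers roughly balance. A back-of-envelope check, taking the figures above at face value:

```python
# Back-of-envelope check: a million 2MHz cores deliver about the same
# raw event rate as 85 billion neurons firing at ~24Hz, so each core
# would time-slice roughly 85,000 neurons.
cores, clock = 1_000_000, 2_000_000        # 1M cores at 2MHz
neurons, rate = 85_000_000_000, 24         # 85e9 neurons at ~24Hz

chip_ops = cores * clock                   # 2e12 cycles/s
brain_events = neurons * rate              # ~2.04e12 spikes/s
print(neurons // cores)                    # -> 85000 neurons per core
print(brain_events / chip_ops)             # -> 1.02: same ballpark
```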

2
0
Silver badge

Why does this statement give me the image

of a naked Intel legs akimbo sitting backwards on a chair?

0
0

Two models of computing

This is not about numbers of cores or architecture per se, but about basic physics. There is a direct relationship between the work done by a software process and the electrical power required. There is also a direct relationship between the work done and the heat generated. That means that heavy processing tasks will always run better on processors that are not designed with power-consumption efficiency in mind. The bottleneck is how much heat the system can manage: you wouldn't dream of trying to dissipate anywhere near as much heat through a tablet's case as you would through an MT form-factor CAD machine's case.

A big give-away is the battery life of smartphones. Very few mobile devices last a full day of medium-level use without needing a bit of a charge-up. It would be quite easy to put a bigger battery in smartphones like the new Nexus 5 so that they ran either faster for the same time or for longer. Why don't they do this? Because a) users of smartphones don't expect a full day's charge and b) the limit to processing power in smartphones is the heat generated, not the processing power available. I predict that we are not far off the ceiling for mobile device processing power until we develop much cooler ways of carrying out computer calculations.

0
1