Parallel code easily transferred to a very different architecture?
Let me guess: they can easily parallelize adding two arrays together, or do matrix-vector operations optimally. This covers some very important bases, but some parallel code needs to be rethought, rather than just recompiled, when porting to a very different architecture.
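To be concrete about what the "easy" case looks like, here is a minimal sketch of my own (not taken from any particular tool) of array addition in C with OpenMP. One pragma and the runtime does the rest: no communication, no data-layout decisions, no load balancing worth the name.

```c
#include <stddef.h>

/* Embarrassingly parallel: every iteration is independent, so a
 * single pragma splits the loop across cores. Compile with -fopenmp. */
void add_arrays(const double *a, const double *b, double *c, size_t n)
{
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```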
We have code that does not use matrix-vector operations, and it works best (40x speedup on 64 cores) on fairly coarse-grained, shared-memory parallel architectures. We still have not managed to produce a distributed-memory version (working on it), and we are struggling with an OpenCL version for GPUs (working on it with GPU gurus).
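For contrast, here is a hedged sketch (my own illustration, not our actual code) of why even a trivial 1D stencil has to be restructured for distributed memory: the arithmetic is unchanged, but the data distribution, ghost cells, and neighbour communication all have to be designed in, and none of that exists in the shared-memory version. The chunk size and stencil are assumptions for the example.

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int local_n = 1000;                    /* assumed chunk size */
    double *u = calloc(local_n + 2, sizeof *u);  /* +2 ghost cells     */
    double *v = calloc(local_n + 2, sizeof *v);

    /* Edge ranks talk to MPI_PROC_NULL, which turns those
     * sends/receives into no-ops. */
    int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    /* Halo exchange before each sweep: this step simply does not
     * exist in the shared-memory version of the same loop. */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[local_n + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[local_n], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* The local sweep itself is the same arithmetic as before. */
    for (int i = 1; i <= local_n; i++)
        v[i] = 0.5 * (u[i - 1] + u[i + 1]);

    free(u);
    free(v);
    MPI_Finalize();
    return 0;
}
```

And that is the simple, regular case. Irregular, coarse-grained code like ours is harder still, which is why the distributed-memory port is taking real design work rather than a recompile.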
Every time I have heard people claim to have tools that take all the hard work out of parallel programming, they show me examples like "add these 10^9 numbers to another bunch of 10^9 numbers". These tools can indeed take a lot of the hard work out of parallel computing, but not all of it, not by a long way.