Core Wars: Inside Intel's power struggle with NVIDIA

Intel and NVIDIA are battling for the hearts and minds of developers in massively parallel computing. Intel has been saying for years that concurrency rather than clock speed is the future of high performance computing, yet it has been slow to provide the mass of low-power, high-efficiency CPU cores needed to take full …

COMMENTS

This topic is closed for new posts.
  1. Michael H.F. Wilkinson Silver badge
    Boffin

    Parallel code easily transferred to a very different architecture?

    Let me guess: they can easily parallelize adding two arrays together, or doing matrix-vector stuff optimally. This covers some very important bases, but some parallel code needs to be rethought rather than just recompiled when porting to a very different architecture.

    We have code which does not use matrix-vector stuff, and works best (a 40x speed-up on 64 cores) on fairly coarse-grained, shared-memory, parallel architectures. We still have not managed to make a distributed-memory version (working on it), and are struggling with an OpenCL version for GPUs (working on it with GPU gurus).

    Every time I have heard people claim to have tools that take all the hard work out of parallel programming, they show me examples like "add these 10^9 numbers to another bunch of 10^9 numbers". These tools can indeed take a lot of the hard work out of parallel computing, but not all, by quite a long way.
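
    To make that concrete, here is a minimal sketch of the trivially parallel case such tools do handle well, written in C with OpenMP purely for illustration (the function and array names are made up): every iteration is independent of every other, so a single pragma is enough.

        #include <stddef.h>

        /* Element-wise addition of two large arrays: the "add these 10^9
           numbers to another 10^9 numbers" case. Each iteration touches
           only its own elements, so the runtime can split the loop across
           cores with one annotation (compile with -fopenmp). */
        void add_arrays(const double *a, const double *b, double *out, size_t n)
        {
            #pragma omp parallel for
            for (size_t i = 0; i < n; i++)
                out[i] = a[i] + b[i];
        }

    Code that is not shaped like this (irregular data structures, data-dependent control flow, heavy sharing) is exactly the kind that needs rethinking rather than recompiling.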

    1. asdf

      Re: Parallel code easily transferred to a very different architecture?

      And of course some basic algorithms naturally do not lend themselves to being very parallel. Not to mention that, for many general-purpose programs, taking advantage of massive parallelism requires a massive increase in programming effort and testing, which does not always translate into much better real-world performance. The ultimate example of this is the Cell BE in the PS3 versus the more standard multicore in the Xbox 360.

      1. Randy Hudson

        Cell fail?

        The top four places on the Green500 list are occupied by derivatives of the Cell BE.

        1. asdf
          FAIL

          Re: Cell fail?

          Only because IBM is of course going to choose Cell over other options (GPUs, FPGAs, etc.), and IBM builds most of the top supercomputers. If it was so in demand by the market, why has IBM decided to quit upgrading the platform?

          1. dlc.usa

            Re: Cell fail?

            I thought IBM incorporated the needed Cell capabilities into their own processor architectures a while back.

        2. Kiralexi

          Re: Cell fail?

          Those are PPC, but not Cell-based. No SPEs involved.

        3. Roo
          Boffin

          Re: Cell fail?

          IIRC BlueGene predates Cell.

          BlueGene/L and /P's processors are embedded cores with chunky FP units attached to them. The aim was to balance memory bandwidth with compute power, something that Cell failed to do. Perhaps if they had used BG cores they might have been able to get away without adding a GPU to the PlayStation. Columbia's QCDOC is closer in design to BlueGene/L/P than to a network of PS3s... :)

          The last time I checked, the most efficient machine in *delivered* FLOPS/W was a Chinese box based on a derivative of the venerable Alpha; no GPUs there. AFAIK that Chinese box was a one-off, whereas you can actually buy BlueGenes and they are very nearly as good. Again, no GPUs there either.

          Looks like NVidia are doing something about GPU I/O, so maybe the equation will change again. Personally, I suspect it won't change much.

    2. Pperson

      Re: Parallel code easily transferred to a very different architecture?

      Right on, Michael! I do get tired of hearing/reading these people (and to some extent the Reg itself) talk as if parallel computing is GREAT when in fact it's just forced on us by the dead-ending of CPU speeds. Then there is the insinuation that it's the coders' fault for not being able to take advantage of it - because of course everything boils down to a matrix operation, right? Never mind problems due to parallel memory-access bottlenecks, de-synching, etc., etc. But you only find this stuff out the hard way; it's like a dirty secret of the industry.

      Hey Reg, how about an article on the *other* side of the parallel coin? You know, the one that says "honestly, this parallel thing isn't nearly what they hype it to be".

      1. Michael H.F. Wilkinson Silver badge
        Boffin

        @Pperson

        I have to disagree a bit here. Parallel computing is great, but at the same time it is hard work, and it is only useful for particular data- and compute-intensive tasks. Memory-access bottlenecks have been reduced greatly by getting rid of the front-side bus (guess why Opterons are so popular in HPC), but they are still very much present in GPUs, in particular in communication between the GPU and main memory. There are improvements in tooling, but they are too often over-hyped. Besides, as with all optimization, you need an understanding of the hardware.

        Parallel computing is at the forefront of computer science research, and new (wait-free) algorithms are being published in scientific journals, as are improvements in compilers, languages and other tools.

        Throughout its early history, physics simulation, with its emphasis on matrix-vector work, dominated the HPC field. Now a much larger variety of code is being parallelized. People are finding out the hard way that parallel algorithm design is a lot harder than sequential programming.

        As I like to tell my students: parallel computing provides much faster ways of making your program crash.

  2. localzuk Silver badge

    'Battling'...

    You mean 'waiting until the right moment' and then Intel will buy them?

    1. P. Lee

      Re: 'Battling'...

      > You mean 'waiting until the right moment' and then Intel will buy them?

      Just before the market realises 64bit ARM is arriving...

  3. Torben Mogensen

    New languages

    The idea of using existing languages (in particular C and C++) for massively parallel computing is doomed: These languages are inherently sequential and rely on a flat shared memory, which is very far from what massively parallel machines look like. Sure, you can use libraries called from C or C++, and you can even program these libraries in something that superficially resembles C, but the fact is that C and related languages are hopelessly inadequate for the task.

    So we need to move away from languages with implicit sequential dependencies through updates to a shared state, towards languages that have no shared state and where the only sequential dependencies are producer-consumer dependencies. This means that you don't have traditional for-loops, as these over-specify an order on the iterations of the loop. Instead, you have for-all constructs that allow the "iterations" to be done in any order or even at the same time. And to replace a for loop that, say, adds up the elements in an array or other collection, you have "reduce" constructs that do this in parallel.

    You might think of map-reduce, but it goes further than that. The proper reference is NESL.
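
    For concreteness, here is a rough sketch of that sequential-loop versus reduce distinction. It is written in C with OpenMP pragmas only because the rest of this thread talks C; NESL and its relatives express the same thing natively in the language rather than as annotations bolted onto a sequential loop.

        #include <stddef.h>

        /* A traditional for loop: the code states that element i is added
           before element i+1, an ordering the algorithm does not need. */
        double sum_sequential(const double *a, size_t n)
        {
            double s = 0.0;
            for (size_t i = 0; i < n; i++)
                s += a[i];
            return s;
        }

        /* The same sum written as a reduction: partial sums may be
           combined in any order, or in parallel, because the ordering is
           no longer part of the program's meaning. */
        double sum_reduce(const double *a, size_t n)
        {
            double s = 0.0;
            #pragma omp parallel for reduction(+:s)
            for (size_t i = 0; i < n; i++)
                s += a[i];
            return s;
        }

    (The parallel version may combine the floating-point additions in a different order, which can change rounding; that is part of the price of giving up the sequential specification.)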

    1. Michael H.F. Wilkinson Silver badge

      Re: New languages

      I agree up to a point. Languages do need to change, and they are in fact changing. OpenMP is a sort of "bolt-on" solution for C(++) which allows the compiler to treat for loops as for-all statements, and provides various other mechanisms for syncing. A functional approach such as in Erlang is often proposed. I do have some doubts that we can solve all sorts of problems merely with new languages. We need to learn new ways of thinking about these problems. A good language can inspire new ways of thinking, of course.
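
      As a small illustration of that bolt-on idea, here is a sketch in C with OpenMP (the function and threshold test are invented for the example): the parallel-for pragma turns an ordinary loop into a for-all, and the atomic directive is one of those extra syncing mechanisms. Leave the atomic out and the shared increment becomes a data race that silently produces wrong counts.

          #include <stddef.h>

          /* Count elements above a threshold across threads. The parallel
             for splits the iterations; the atomic directive serialises the
             shared increment so updates to count are not lost. */
          long count_above(const double *a, size_t n, double threshold)
          {
              long count = 0;
              #pragma omp parallel for
              for (size_t i = 0; i < n; i++) {
                  if (a[i] > threshold) {
                      #pragma omp atomic
                      count++;
                  }
              }
              return count;
          }

      (In practice a reduction(+:count) clause would be simpler and faster; the atomic form is shown only because it makes the explicit syncing visible.)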

  4. Adam Nealis

    Not much will change.

    In the grand scheme of things.

    The first parallel programming language I can remember is/was OCCAM, introduced in 1983. That's nearly 30 years for parallel programming to take over, and it hasn't.

    I am not aware of any cross-compiler that is generic enough to optimise for an arbitrary architecture. So to optimise, people are needed. People are the most expensive part of the bang-for-the-buck equation. There's a fine line between parallelise and paralyse.

    Most software is "optimised" by upgrading the hardware it runs on.

    Intel will win market share with MIC because it is a big company, and because of the ease of porting code for the class of performance problems that can be addressed this way: no expensive rewrites, but nice performance multiples.

    It will be interesting to see how Nvidia penetrates the various scales of the supercomputing market. How big is that market in terms of shipped units? To an investor in the relatively small Nvidia, though, it might mean higher earnings per share than MIC does for Intel.

