Nvidia's new CUDA 6 has the 'most significant new functionality in the history of CUDA'

Nvidia has released CUDA 6, an upgrade to its proprietary GPU programming language that it says "includes some of the most significant new functionality in the history of CUDA." For our money, the most important aspect of CUDA 6 is its unified memory scheme, which The Reg described in some detail when the CUDA Toolkit 6.0 was …

COMMENTS

This topic is closed for new posts.
  1. asdf

    Hmm

    I am sure some expert on here will correct me, but personally, when I hear a system is abstracting memory allocation I get concerned, especially for HPC code (again, I've only dabbled in that world). Explicitly requiring a copy at least makes the developer aware he is about to request a potentially expensive operation. I can see how it would improve code readability, but in the HPC world that isn't usually the highest priority compared to performance, correct?

    1. asdf

      Re: Hmm

      It's also been my experience that, in general, when a feature that can be abused is introduced to a language, it's almost guaranteed it will be, and it will be in code I have to maintain.

    2. asdf

      Re: Hmm

      Looks like I just had to RTM (see the links) to find the answer:

      "An important point is that a carefully tuned CUDA program that uses streams and cudaMemcpyAsync to efficiently overlap execution with data transfers may very well perform better than a CUDA program that only uses Unified Memory. Understandably so: the CUDA runtime never has as much information as the programmer does about where data is needed and when! CUDA programmers still have access to explicit device memory allocation and asynchronous memory copies to optimize data management and CPU-GPU concurrency. Unified Memory is first and foremost a productivity feature that provides a smoother on-ramp to parallel computing, without taking away any of CUDA’s features for power users."

      Just what I thought. This convenient great new feature is going to allow hacks who don't really understand parallel programming to think they do, and possibly negate much of the benefit of computing with CUDA on GPUs. Think .NET for CUDA. Managed-language features have no business in production HPC environments IMHO.
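      To make the contrast concrete, here is a minimal sketch of the "smoother on-ramp" path the quote describes, using cudaMallocManaged from CUDA 6. The scale kernel and the sizes are made up for illustration:

        #include <cuda_runtime.h>

        // Hypothetical kernel: double every element in place.
        __global__ void scale(float *data, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) data[i] *= 2.0f;
        }

        void run_managed(int n)
        {
            float *data;
            cudaMallocManaged(&data, n * sizeof(float));    // one allocation, visible to CPU and GPU
            for (int i = 0; i < n; ++i) data[i] = (float)i; // CPU writes directly, no cudaMemcpy
            scale<<<(n + 255) / 256, 256>>>(data, n);       // runtime migrates the data to the GPU
            cudaDeviceSynchronize();                        // must finish before the CPU touches data again
            cudaFree(data);
        }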

      1. Anonymous Coward
        Anonymous Coward

        Re: Hmm

        I can't address "modern hybrid HPC," but I know that 6.0 will help on the software engineering end, as I'll be using the coming Tegra K1 for sensor processing. I have a ton of experience twiddling bits, and for that unified memory is actually a plus.

      2. Michael H.F. Wilkinson Silver badge

        Re: Hmm

        In my experience, hiding memory complexity is a mixed blessing. To gain maximum performance I need to understand the architecture the code runs on, to avoid costly operations, as others have said. I will get my code up and running sooner, but it will not necessarily be as fast as hand-tuned code. Where I do see a use is in getting more-or-less machine-independent code up and running quickly. If it does not run fast enough, you can then tune it to get the most out of the hardware and reduce costly copying from one part of memory to another, and so avoid bandwidth and latency problems.

        Nothing beats better bandwidth and lower latency, of course, but that is something the hardware guys must do for us software guys (and I know they are working on it). If the GPU and CPU truly share memory (i.e. the memory is physically unified), many difficulties will drop away, but that is something to dream about, for now.
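        The hand tuning being traded away looks something like this rough sketch: pinned host memory, explicit async copies, and streams so that one chunk's transfer overlaps with another chunk's kernel. It reuses the hypothetical scale kernel from the sketch further up, and the chunk count is arbitrary:

          void run_tuned(int n)
          {
              const int chunks = 4;
              const int chunk  = n / chunks;              // assume n divides evenly
              float *h_data, *d_data;
              cudaMallocHost(&h_data, n * sizeof(float)); // pinned memory so copies can be truly async
              cudaMalloc(&d_data, n * sizeof(float));
              for (int i = 0; i < n; ++i) h_data[i] = (float)i;

              cudaStream_t streams[chunks];
              for (int c = 0; c < chunks; ++c) cudaStreamCreate(&streams[c]);

              // Each chunk gets its own stream: copy in, compute, copy out,
              // so transfers for one chunk overlap kernels for another.
              for (int c = 0; c < chunks; ++c) {
                  int off = c * chunk;
                  cudaMemcpyAsync(d_data + off, h_data + off, chunk * sizeof(float),
                                  cudaMemcpyHostToDevice, streams[c]);
                  scale<<<(chunk + 255) / 256, 256, 0, streams[c]>>>(d_data + off, chunk);
                  cudaMemcpyAsync(h_data + off, d_data + off, chunk * sizeof(float),
                                  cudaMemcpyDeviceToHost, streams[c]);
              }
              cudaDeviceSynchronize();                    // wait for all streams to drain

              for (int c = 0; c < chunks; ++c) cudaStreamDestroy(streams[c]);
              cudaFree(d_data);
              cudaFreeHost(h_data);
          }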

        1. ThomH

          Re: Hmm

          Could Unified Memory not be a step intended to buy Nvidia better heterogeneous computing options in the future, especially on smaller systems? I'm thinking of things like smartphones where you've got GPU + CPU + RAM in a single module, with the memory actually physically addressable equally by the different components. It'd be good to have naturally parallel things scale automatically between appropriate cores on those, wouldn't it? GPGPU isn't going to be exclusively for the HPC niche forever.

          1. asdf

            Re: Hmm

            >It'd be good to have naturally parallel things scale automatically between appropriate cores on those, wouldn't it? GPGPU isn't going to be exclusively for the HPC niche forever.

            Great point. Yeah that makes sense.

        2. gizmo23

          Re: Hmm

          Ryan Smith went into this (http://www.anandtech.com/show/7515/nvidia-announces-cuda-6-unified-memory-for-cuda) and points out that the performance overheads are currently undefined. I think the idea is that when Nvidia introduce real unified memory on some product down the line, code written in CUDA 6 won't need any rewrite. In the meantime, you're right, it hides the architecture behind an abstraction which has an (as yet) unknown performance cost, which is not beneficial for tuned HPC code.

      3. NFA_2_DFA_0FFF

        Re: Hmm

        Why would it "possibly negate much of the benefits of computing with CUDA on GPUs"?

        This smells like the same argument I heard the old farts at work making when we moved from assembler to C many years ago: "These hacks who don't really understand how the registers are used will suddenly think they understand programming, and I'm going to have to clean up their crap code!"

        Which is exactly the kind of rubbish I hear C++ developers saying when they talk about .NET.

        Different tools/approaches for different needs, otherwise hammers/crow-bars/screwdrivers would become illegal because they've been used in many a murder case!

        1. asdf

          Re: Hmm

          >Which is exactly the kind of rubbish I hear C++ developers saying when they talk about .NET.

          .NET was so great that Microsoft used it for all their own products, huh (the one time they used WPF, for example in VS10, it was an epic buggy, sluggish fail)? It's so great for system programming that they built Win8 and Metro on it as well, huh (hint: no, they abandoned it)? .NET is OK for the kind of things VB6 was being used for in the past, but it has nowhere near the utility of standard C/C++ in different situations on different platforms.

  2. Eddy Ito

    First, how does this compare with AMD's hUMA? Is it the same thing? If it is, will they play nice together?

    Second, cluster pool! When will the first suitcase cluster be built with these boards? Six node minimum to qualify. I'll take 3 days after Mother's Day so 14th May.

  3. Fizzle
    Headmaster

    English?

    Is everyone writing English here or is this another new hybrid language?

    Confoosed (not difficult)

    Whacko, 'cos I need a good caning.

    1. Anonymous Coward
      Anonymous Coward

      Re: English?

      I am afraid parallel computing has a language of its own, a bit like quantum physics is to standard physics.

      All the rules change: what you thought you knew has to be discarded and relearned, and there is a new set of terminology that overlaps with the old but does not carry the same meaning.

      Simplifying this is a bit like asking to simplify a foreign (spoken) language...

    2. John Tappin

      Re: English?

      However, I think asdf is saying that where you need performance you need to keep your data in the fast on-board GPU memory rather than in mainboard memory, and that using a system which makes it all look like one big memory pool will mean you hit the slower mainboard memory more than a tuned application that has already synced its data into GPU space ready for processing.

      Updated to add analogy

      A bit like single-level disk storage: it hides all your expensive, highly architected performance tiers behind a generic storage management interface that just dumps data everywhere.

      1. asdf

        Re: English?

        >I think asdf is saying that where you need performance you need to keep your data in the fast on-board GPU memory rather than in mainboard memory, and that using a system which makes it all look like one big memory pool will mean you hit the slower mainboard memory more than a tuned application that has already synced its data into GPU space ready for processing.

        Basically, yes. If a developer is not aware of the performance characteristics of the different memories and buses (the architecture in general), they may sloppily end up throwing data around needlessly, negating much of the massive parallelism available because the system spends a lot of time waiting for data to become available.
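        As a contrived sketch of that trap (names made up, using a kernel like the scale one sketched earlier): touching a managed buffer on the CPU between every kernel launch forces the runtime to migrate pages back and forth each iteration, something an explicit-copy version would never do by accident.

          // Contrived anti-pattern: pages ping-pong between host and device every iteration.
          void ping_pong(int n)
          {
              float *data;
              cudaMallocManaged(&data, n * sizeof(float));
              for (int iter = 0; iter < 1000; ++iter) {
                  scale<<<(n + 255) / 256, 256>>>(data, n); // managed data migrates to the GPU
                  cudaDeviceSynchronize();
                  data[0] += 1.0f;                          // CPU touch drags the page back to the host
              }
              cudaFree(data);
          }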

    3. deadlockvictim

      Re: English?

      Now you know what your poor parents feel like when you speak to them in computer. All the phrases like 'just clear the cache', 'it's those bloody cookies, you know' and 'you need more memory' left them simply agreeing with you while not having a bloody clue what you were on about.

      And as for your daughter crying because some nasty girls at school wrote horrible things on her wall and then defriended her, well, I just give up.

  4. Anonymous Coward
    Anonymous Coward

    CUDA playing catch up

    The allocation abstraction will actually benefit performance on hybrid (CPU/GPU) systems, since main and device memory are shared between the two processor groups. In those situations there is no need for the copy. Looking at hybrid chip systems (tablets/mobile phones), this would be of huge benefit.

    Another poster is correct with regard to supercomputers or other "big" systems with separate memories for GPU and CPU, where hand optimization matters. My impression (as an outsider to this field) is that CUDA is primarily used in, and dominates, scientific applications which execute on big systems rather than mobile devices. Lay coders optimizing their C code would primarily be concerned with OpenCL, which can run on everything (not just Nvidia GPUs).

    That said, the new generation of C++ GPU programming APIs (C++ AMP, etc) actually abstracts much of the problem of explicit memory management by providing C++ wrapper classes over the memory arrays. I think the Nvidia Thrust API (which is a C++ wrapper over CUDA) does something similar - although I have no experience with that API.
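    From its documentation, the Thrust pattern looks roughly like this (an untested sketch): the copies happen inside the vector constructor and assignment rather than as explicit cudaMemcpy calls.

      #include <thrust/host_vector.h>
      #include <thrust/device_vector.h>
      #include <thrust/transform.h>
      #include <thrust/functional.h>

      void thrust_example()
      {
          thrust::host_vector<float> h(1 << 20, 1.0f);
          thrust::device_vector<float> d = h;         // host-to-device copy hidden in the constructor
          thrust::transform(d.begin(), d.end(), d.begin(),
                            thrust::negate<float>()); // runs on the GPU
          h = d;                                      // device-to-host copy hidden in the assignment
      }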

    My opinion - having used the C++ GPU programming APIs on some toy projects - is that one should be getting out of the business (as a developer) of writing explicit memory copies. The C++-level abstractions are much more elegant to work with and (in the long term) provide better opportunities to easily optimize applications based on the hardware the program is running on.

  5. Paul Shirley

    heading off AMD?

    Feels more like they're laying the groundwork for a move to a physically unified memory architecture before AMD can make any gains. Or should that be "any more gains"? AMD already grabbed the PS4 and XB1 business with a unified memory design.

    1. phil dude
      Linux

      Re: heading off AMD?

      yes, but their software support has not exactly set the (scientific computing) world on fire...

      P.

      1. Paul Shirley

        Re: heading off AMD?

        The announcement further up the page of heterogeneous Opterons with unified memory might be what's spooked Nvidia and a sign AMD are about to get much more serious...
