AMD reveals potent parallel processing breakthrough

AMD has released details on its implementation of The Next Big Thing in processor evolution, and in the process has unleashed the TNBT of acronyms: the AMD APU (CPU+GPU) HSA hUMA. Before your eyes glaze over and you click away from this page, know that if this scheme is widely adopted, it could be of great benefit to both …

COMMENTS

This topic is closed for new posts.

Wednesday 1st May 2013 02:02 GMT Anonymous Coward

This is pleasing

More good news technology stories like this and I shall renew my subscription.

10 0
1. Wednesday 1st May 2013 08:38 GMT Fred Flintstone
  
  Re: This is pleasing
  
  Be warned, you need a sense of hUMA.
  
  No, the *dirty* Mac, thanks.
  
  7 0
Wednesday 1st May 2013 02:50 GMT James Wheeler

About time

This is going to make GPU coprocessing useful in many instances where it isn't now.

2 0
Wednesday 1st May 2013 03:51 GMT ARP2

No more worrying about Graphic card memory

If I understand correctly, if they were to use this architecture on a "traditional" PC, would we need to worry about memory on a graphics card anymore? Or would the CPU/GPU simply split the use of my RAM as they need it? Could that also mean that more RAM = much better graphics performance?

0 1
1. Wednesday 1st May 2013 08:15 GMT DF118
  
  Re: No more worrying about Graphic card memory
  
  I think the point of this is in situations where the CPU and GPU reside on the same die, not where you have a discrete GPU elsewhere on the assembly or even on a daughter board, which is I think what you're talking about.
  
  2 0
  1. Wednesday 1st May 2013 21:19 GMT JEDIDIAH
    
    Re: No more worrying about Graphic card memory
    
    A co-processor is a co-processor. If it can act in place of the CPU with less nonsense then that's useful regardless of whether or not the co-processor is on the same die. This is just turning a GPU into a fancier math co-processor.
    
    Surprised it hasn't been done yet actually.
    
    4 0
2. Wednesday 1st May 2013 12:31 GMT Matt Bryant
  
  Re: No more worrying about Graphic card memory
  
  "....if they were to use this architecture on a "traditional" PC, do we need to worry about memory on a graphics card anymore?...." The article specifically mentions tablets and handsets, which implies this is more a better "system-on-a-chip" than a replacement for traditional gaming PC architecture.
  
  One reason it is unlikely to upset the PC applecart is that plug-in graphics card vendors like having complete control over the discrete memory on their cards. If a new memory type that works best for graphics comes out, they don't have to wait for the CPU manufacturers to redesign the memory controllers on their CPUs or the mobo designers to issue a new mobo with new RAM slots; they simply add it to their own cards. That is the advantage of discrete graphics memory in PCs. If you had no memory on your graphics card and had to go out over the bus to main memory then performance would suck. And graphics card makers will want to use the latest and greatest, as they will want to maintain a performance lead over combined designs like this or they will go out of business.
  
  I would suggest this is more aimed at tablets and possibly virtual desktop environments, the latter seeing greater efficiencies in memory if they can pool it for all tasks.
  
  0 0
  1. Wednesday 1st May 2013 19:23 GMT Craig 2
    
    Re: No more worrying about Graphic card memory
    
    What would be nice for gaming with a discrete 3D graphics card is if the new on-processor APU could be re-tasked to help with something like physics or AI.
    
    0 0
    1. Friday 3rd May 2013 16:30 GMT Palf
      
      Re: No more worrying about Graphic card memory
      
      Yes - also it's unclear what degree of control the programmer has over CPU/GPU usage decisions. It would be nice to specify that compute-intensive inner loops be executed mandatorily on the GPU, for example. I get the impression that the GPU is auto-assigned only when the CPU runs out of puff, but that's just a SWAG.
      
      0 0
3. Thursday 2nd May 2013 05:00 GMT ThomH
  
  Re: No more worrying about Graphic card memory
  
  By putting the GPU behind the MMU it does technically reduce one of the video memory concerns — you could have a single graphic however many gigabytes in size, memory map the file and call that the texture. Attempts by the GPU to read sections not currently paged would simply raise the usual exception, which would be caught by the OS in the usual way and handled by the existing paging mechanisms. You no longer have to treat texture caching as a separate application-level task.
  
  That said, as others have noted the main point of the design is that when you write a parallel for loop in your language of choice to perform some vector operation — especially if it involves no branching — the GPUs can be factored into the workload just as easily as any traditional CPU cores, but so as to perform the work much more efficiently. So writing programs that take advantage of all the available processing becomes a lot easier. Collapsing virtual memory to a single mechanism that your OS vendor has already supplied is just one example.
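As a rough illustration of the "parallel for loop over a branch-free vector operation" described above, here is a minimal Python sketch. It only spreads the work over CPU threads; nothing here dispatches to a GPU, and the names `saxpy_chunk` and `parallel_saxpy` are made up for the example. The point is just the shape of the code: the loop body has no branches and each chunk is independent, which is the kind of workload a runtime could hand to whichever cores are available.

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy_chunk(a, xs, ys):
    # Branch-free element-wise operation: a*x + y for each pair.
    return [a * x + y for x, y in zip(xs, ys)]

def parallel_saxpy(a, xs, ys, workers=4):
    # Split the index range into contiguous chunks, roughly one per worker.
    n = len(xs)
    step = max(1, (n + workers - 1) // workers)
    bounds = [(i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so the chunks can run in any order.
        parts = pool.map(
            lambda b: saxpy_chunk(a, xs[b[0]:b[1]], ys[b[0]:b[1]]), bounds
        )
        # pool.map yields results in submission order, so the output is ordered.
        return [v for part in parts for v in part]

result = parallel_saxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0],
                        [10.0, 20.0, 30.0, 40.0, 50.0])
```

Note that the caller never says where each chunk runs, only that the chunks are independent.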
  
  0 0
Wednesday 1st May 2013 04:14 GMT Anonymous Coward

What was UMA architecture then?

I am not an expert in this field, but I used to hear about the distinction between Unified Memory Architectures (UMA) and Non-Unified Memory Architectures (NUMA) when it came to GPUs. So how is the present offering from AMD different from UMA?

0 0
1. Wednesday 1st May 2013 05:42 GMT bazza
  
  Re: What was UMA architecture then?
  
  In their current APUs the GPU doesn't interact with memory in the same way as the CPU does. That's in spite of the fact that they're on the same die and ultimately share the same DDR3 memory bus. In that sense the arrangements are slightly Non Uniform, and you have to copy data in order to get it from one realm to another.
  
  This new idea means that the GPU and CPU interact with memory in exactly the same way, and that makes a big difference. Software is simpler because a pointer in a program in the CPU doesn't need to be converted for the GPU to be able to use it. That helps developers. More importantly the "GPU job setup time" is effectively zero because no data has to be copied in or out first. That speeds up the overall job time.
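The difference in "job setup time" can be sketched in plain Python, with the caveat that this is only an analogy: there is no GPU involved, and `run_with_staging` and `run_shared` are hypothetical names. The first function copies the data into a private buffer and back again, as a stand-in for the copy-in/copy-out model; the second works through a view of the caller's own buffer, as a stand-in for sharing a pointer.

```python
def run_with_staging(host_buf):
    # Copy-based model: copy in, compute on the private copy, copy out.
    device_buf = bytearray(host_buf)             # copy in
    for i in range(len(device_buf)):
        device_buf[i] = (device_buf[i] * 2) % 256
    host_buf[:] = device_buf                     # copy out
    return 2 * len(host_buf)                     # bytes moved purely as overhead

def run_shared(host_buf):
    # Shared model: the worker gets a view of the same memory, so no copies.
    view = memoryview(host_buf)
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    return 0                                     # nothing copied

a = bytearray([1, 2, 3])
b = bytearray([1, 2, 3])
copied_a = run_with_staging(a)
copied_b = run_shared(b)
```

Both calls leave the same result in the caller's buffer; only the amount of data shuffled around differs.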
  
  I like it!
  
  4 0
  1. Wednesday 1st May 2013 19:49 GMT Anonymous Coward
    
    Re: What was UMA architecture then?
    
    But aren't you actually describing NUMA v/s UMA rather than UMA v/s hUMA?
    
    0 0
  2. Thursday 2nd May 2013 04:25 GMT @Yehuda Ben Yehudin.
    
    Re: What was UMA architecture then?
    
    Isn't this exactly what the Cell architecture had 10 years ago, a single system memory that is accessed by the SPE accelerators and the CPUs? What's the news here?
    
    Plus ça change...
    
    1 0
2. Wednesday 1st May 2013 06:13 GMT Flocke Kroes
  
  Re: What was UMA architecture then?
  
  The key difference is not on the diagram. When a process on a CPU tries to access some memory, the address that the process selects is a virtual address (back then: a 32-bit number, now often a 64-bit number). The CPU tries to convert the virtual address into a physical address (a different number, sometimes a different size). There are several uses for this rather expensive conversion:
  
  Each process gets its own mapping from virtual to physical addresses - this makes it very difficult for one process to scribble all over the memory that belongs to a different process.
  
  The total amount of virtual memory can exceed the amount of physical memory. (Some virtual addresses get marked as a problem. When a process tries to access such a virtual address, the CPU signals this as a problem to the operating system. The operating system suspends the process, assigns a physical address for the virtual address, gets the required data from disk into that physical memory then restarts the process.)
  
  Sometimes it is just convenient - the mmap function makes a file on a disk look like some memory. If a process tries to read some of the mapped memory, the operating system ensures data from the file is there before the read instruction completes. If a process modifies the contents of mapped memory, the operating system ensures the changes occur to the file on the disk.
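For anyone who has not used it, the mmap behaviour described above can be tried from Python's standard `mmap` module. This is a minimal sketch using a throwaway temporary file; the function name `demo_mmap` and the file contents are invented for the example.

```python
import mmap
import os
import tempfile

def demo_mmap():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello texture data")
        # Length 0 maps the whole file; it now looks like a block of memory.
        with mmap.mmap(fd, 0) as mem:
            first = bytes(mem[:5])      # reading memory reads the file
            mem[0:5] = b"HELLO"         # writing memory modifies the file
            mem.flush()                 # ask the OS to write the change back
        # Read the file through the ordinary file API to confirm the change.
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 5)
    finally:
        os.close(fd)
        os.remove(path)
    return first, on_disk

first, on_disk = demo_mmap()
```

The process never issues an explicit read or write on the mapped region; the operating system's paging machinery does that work behind ordinary memory accesses.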
  
  In UMA, the CPU and the GPU access the same physical memory, but the GPU only understands physical addresses. When a process wants some work done by the GPU, it must ask the operating system to convert all the virtual addresses to physical addresses. This can go badly wrong because a neat block of virtual addresses could get mapped to a bunch of physical addresses scattered all over the memory map. Worse still, some of the virtual memory could map to files on a disk and not have a physical address at all. The two solutions are to have the operating system copy the scattered data into a neat block of contiguous physical addresses or for the process on the CPU to anticipate the problem and request that some virtual addresses map to a neat contiguous block of physical addresses before creating the data to go there.
  
  Plan B looks really good until you spot that the operating system might not have such a large block of physical memory unassigned. It would have to create one by suspending the processes that use a block of memory, copying the contents elsewhere, updating the virtual to physical maps and then resuming the suspended processes. It gets worse. That huge block of memory cannot be paged out if it is not being used, and the required contents might already be somewhere else in memory so it will have to be copied into place instead of being mapped.
  
  All this hassle could be avoided if the GPU understood virtual addresses. That would cut down on the expensive copying (memory bandwidth limits the speed of many graphics intensive tasks). The downside is that it adds to the burden of the address translation hardware, which already does a huge and complicated task so fast that many programmers do not even know it is there.
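A toy model of the translation step may make the "neat virtual block, scattered physical pages" problem concrete. This is a sketch under simplifying assumptions: a single-level page table held in a dict, a 4 KiB page size, and made-up frame numbers. Real hardware uses multi-level tables and a TLB, and the names `translate` and `physically_contiguous` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def translate(page_table, vaddr):
    # Split the virtual address into a virtual page number and an offset.
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table.get(vpn)
    if frame is None:
        # Not resident: real hardware would raise a page fault for the OS.
        raise LookupError(f"page fault at virtual page {vpn}")
    return frame * PAGE_SIZE + offset

def physically_contiguous(page_table, first_vpn, count):
    # True only if the pages are all resident and in consecutive frames.
    frames = [page_table.get(first_vpn + i) for i in range(count)]
    return None not in frames and all(
        b == a + 1 for a, b in zip(frames, frames[1:])
    )

# Virtual pages 0..3 form a neat block; the frames behind them are scattered,
# and page 2 is not in physical memory at all (paged out or file-backed).
page_table = {0: 7, 1: 2, 2: None, 3: 9}

phys = translate(page_table, 4100)                 # page 1, offset 4
contiguous = physically_contiguous(page_table, 0, 2)
```

A device that only understands physical addresses cannot be handed virtual pages 0 to 3 as one buffer, which is why the data ends up being copied into a contiguous block first.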
  
  18 1
  1. Wednesday 1st May 2013 08:23 GMT oldcoder
    
    And don't forget the MASSIVE security failure.
    
    Having a user application being run by the GPU that bypasses any memory constraints...
    
    0 5
    1. Wednesday 1st May 2013 10:08 GMT Bronek Kozicki
      
      Re: And don't forget the MASSIVE security failure.
      
      and the "bypass" part came from ... ?
      
      3 0
      1. Wednesday 1st May 2013 13:43 GMT Destroy All Monsters
        
        Re: And don't forget the MASSIVE security failure.
        
        Anyone who remembers the Guru Meditation, Press Left Mouse Button To Continue?
        
        1 0
        
        Wednesday 1st May 2013 20:03 GMT Trib
        
        Re: And don't forget the MASSIVE security failure.
        
        I read this article and first thing that came to mind was 'Fat Agnus'!
        
        1 0
        
        Friday 3rd May 2013 22:19 GMT Destroy All Monsters
        
        Re: And don't forget the MASSIVE security failure.
        
        "I read this article and first thing that came to mind was 'Fat Agnus'!"
        
        This has nothing to do with Gabe Newell!
        
        0 0
  2. Wednesday 1st May 2013 19:47 GMT Anonymous Coward
    
    Re: What was UMA architecture then?
    
    Thanks @Flocke Kroes for answering my question! Don't know why you were given a down vote for your answer!
    
    0 1
  3. Thursday 2nd May 2013 04:25 GMT @Yehuda Ben Yehudin.
    
    Re: What was UMA architecture then?
    
    Right, so that's why the Cell SPE did understand virtual system memory addresses and used those to access system memory. How else can you chase pointers, ensure security, etc., in a reasonable, portable and high-performance way?
    
    Cell showed that that architecture could be used even for things such as intensive pointer chasing in garbage collection. (Check out their cool Cell GC work in the VEE conference!)
    
    0 0
  4. Thursday 2nd May 2013 08:17 GMT borje
    
    Re: What was UMA architecture then?
    
    To work well and with good performance, the correct page size as well as TLB structure is vital.
    
    x86 systems today work mostly with 4kB pages (there might be a handful of TLB entries that can be used for huge (2MB) pages). Dividing a main memory of maybe 100+ GB into 4kB pages will be a huge overhead. It will be even worse with a combination of CPUs and GPUs.
    
    A 4kB page size and 1024 TLB entries mean that you can only access 4MB of virtual memory before you need to start replacing TLB entries (reading the virtual-to-physical translation from memory before you can access the memory itself, i.e. you double the number of memory transactions).
    
    SPARC and POWER today support much larger page sizes (1GB and up) and this is something that needs to be done in x86 too.
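The arithmetic behind the 4MB figure above is simply page size times number of TLB entries. A small Python sketch makes it easy to try other combinations; the 1024-entry count is taken from the comment, while applying the same count to 2MB pages is purely an assumption for comparison, and `tlb_reach` and `pages_needed` are hypothetical helper names.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3

def tlb_reach(page_size_bytes, entries):
    # Memory addressable without a TLB miss: one page per entry.
    return page_size_bytes * entries

def pages_needed(memory_bytes, page_size_bytes):
    # Number of pages required to cover a given amount of memory.
    return -(-memory_bytes // page_size_bytes)  # ceiling division

reach_4k = tlb_reach(4 * KIB, 1024)           # 4 MiB, as stated above
reach_2m = tlb_reach(2 * MIB, 1024)           # 2 GiB, if all entries held 2MB pages
pages_100g = pages_needed(100 * GIB, 4 * KIB)
```

Covering 100 GiB with 4 KiB pages takes 26,214,400 pages, which gives a feel for the overhead being described.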
    
    0 1
    1. Thursday 2nd May 2013 10:48 GMT Anonymous Coward
      
      Re: What was UMA architecture then?
      
      x86 supports 2MB (huge pages), 4MB (huge pages + PAE) and even 1GB pages (large pages)
      
      0 0
Wednesday 1st May 2013 04:19 GMT Eddy Ito

Interesting

So who is interested in the tie in with SeaMicro tech? Oh it's going to be in a cluster alright and even though I don't know what I'd do with it, I still want one.

0 0
Wednesday 1st May 2013 04:41 GMT Rampant Spaniel

So what I really want to know is: will this actually result in a decent improvement in performance using Photoshop \ Lightroom \ Premiere Pro? The Mercury engine in PP was a nice improvement, and SSDs helped with the other two a little, but these year-on-year ~10% performance jumps from Intel, with AMD struggling to keep up, are getting old. If you want me to drop a few thousand on a new workstation, cough up a decent performance jump. If they turn round with a 16-core APU with 200% of the performance of a 4770 that doesn't require its own power station or AC unit, I shall be suitably humbled and reach for my wallet.

If this comes out with a mild performance bump, on lithography 2 steps behind Intel, then honestly I will be disappointed. I would love for AMD to knock it out of the park; they are the only thing that keeps Intel vaguely awake in the desktop space.

3 0
1. Wednesday 1st May 2013 05:31 GMT Anonymous Coward
  
  It could have an impact if the software is written to take advantage of it.
  
  What concerns me though, I recall some time back (a couple of years ago) there being a WebGL exploit that could extract pieces of video RAM. Admittedly, the exact problem occurred nearly 2 years ago, and a lot has changed since then, however this isn't to say the same vulnerability can't exist in future software.
  
  What makes this kind of vulnerability dangerous here though is that this sort of architecture potentially opens up your entire system memory to attack via the same vector, since video RAM is essentially your main system RAM.
  
  The idea is not new though... the SGI O2 has a similar design, as did a lot of late 90's era desktop boards which had integrated video devices.
  
  1 0
  1. Wednesday 1st May 2013 14:35 GMT h4rm0ny
    
    "What concerns me though, I recall some time back (a couple of years ago) there being a WebGL exploit that could extract pieces of video RAM. Admittedly, the exact problem occurred nearly 2 years ago, and a lot has changed since then, however this isn't to say the same vulnerability can't exist in future software."
    
    Issues of this nature were given by Microsoft as the reason they hadn't implemented WebGL in IE for such a long time.
    
    1 0
  2. Wednesday 1st May 2013 16:25 GMT Lennart Sorensen
    
    I believe most of the SGI machines had all of the video card mapped in the CPU memory space so everything could access everything else.
    
    Of course it used to be video cards had their memory mapped into the memory space of the PC, although there wasn't as much acceleration then, so allowing the CPU a fast way to write updates to the video card made sense. Once we got 3D chips with hundreds of MB of ram, the 32bit memory space started getting a bit tight and they stopped doing that for all the memory. No reason a 64bit machine couldn't allow everything to be mapped into one memory space though, unless you want to support running 32bit software still.
    
    0 0
Wednesday 1st May 2013 08:41 GMT Crisp

Things that make you go hUMA...

This is interesting stuff. It'll be more interesting to see it in practice. Especially as more and more developers have started using the GPU for parallel tasks.

0 0
Wednesday 1st May 2013 09:48 GMT Mips

Acronyms

AMD need to take care, they could finish up with an ultra religious faction somewhere in there. Homeland Security will be watching.

0 0
1. Wednesday 1st May 2013 13:42 GMT Destroy All Monsters
  
  Anticitizen#1
  
  DHS is too busy burning through all the beeellions of dosh setting up random Overwatch Checkpoints and training at the range with up to 2 bullets fired for each US citizen yearly.
  
  1 0
  1. Friday 3rd May 2013 16:38 GMT Palf
    
    Re: Anticitizen#1
    
    American logic: learning to shoot things is cheaper than providing its citizenry with health care.
    
    0 1
Wednesday 1st May 2013 12:47 GMT Matt Bryant

Bittiness?

OK, not to get too down on this, but what about memory bandwidth? Maybe an extreme example, but the nVidia Quadro 6000 in my workstation uses GDDR5 memory with a bus width of 384 bits and has a 128-bit graphics engine, whereas its Xeon CPU is 64-bit and uses 64-bit-wide DDR3 SDRAM. Two sets of completely different code have to run on each because the architectures are so different. Now, unless AMD are saying they're going to bump up their CPU cores to 128-bit designs with much wider and faster built-in memory controllers, that means the graphics engine actually has to accept CPU-specified memory that will be painfully slow compared to the discrete memory on the stand-alone graphics card. All in all, it may be great for tablets or handhelds, but not for PCs.

0 0
1. Wednesday 1st May 2013 21:49 GMT Richard 12
  
  Re: Bittiness?
  
  It's for their "APU"s, which are CPU with on-chip (possibly on die?) GPU.
  
  So their GPU is already using the same physical memory bus and memory hardware as the CPU.
  
  This isn't for discrete GPUs.
  
  Looking at the list of partners, seeing ARM is very, very interesting - GPGPU in a Cortex A* is already very cool, and this would not only add go-faster stripes but severely reduce the CPU needed.
  
  Anybody for 2-big.2-little.loads-of-titchy?
  
  0 0
Wednesday 1st May 2013 13:16 GMT Anonymous Coward

If both the CPU and GPU can address a 64-bit virtual address range, then why limit the slow store to SSD? Very large data objects could be paged out to the cloud, as cloud storage is good.

0 1
1. Wednesday 1st May 2013 17:38 GMT asdf
  
  >Very large data objects could be paged out to the cloud, as cloud storage is good.
  
  Looks like you picked the wrong icon. You really should have picked the Joke Alert icon for that post.
  
  1 0
  1. Wednesday 1st May 2013 17:44 GMT Rampant Spaniel
    
    No, the cloud would be ideal, as long as you have a decent LTE connection! You could even use azure to piss off Eadon.
    
    1 1
    1. Wednesday 1st May 2013 18:53 GMT Eddy Ito
      
      No, I'm afraid that would actually make Eadon happy, perhaps to the point of climax. I mean just the thought of copy-pasting "FAIL" fifty times might do it for him. Unless he's actually Steve Ballmer doing a parody of a barking mad penguinista in which case climax is a given.
      
      1 0
    2. Wednesday 1st May 2013 22:05 GMT asdf
      
      wow
      
      I thought it was just somebody trolling, but where to begin with the cloud idea? Umm, performance issues, security issues, availability issues, etc. Very large data objects for most employers contain valuable proprietary data that is best kept in house if for no other reason.
      
      0 1
      1. Friday 3rd May 2013 01:28 GMT Rampant Spaniel
        
        Re: wow
        
        Seriously? I had thought the LTE might have made it clear to all but the most uphill thinkers. :-)
        
        0 0
Wednesday 1st May 2013 13:47 GMT BornToWin

AMD is definitely forward thinking

This development has been in process since 2006 and it's finally starting to come to fruition. It's great to see AMD leading the next big PC performance improvement. They may have had their troubles over the years, no thanks to Intel's illegal practices, but at least AMD continues to deliver the best value products for consumers.

I personally will be buying a Kaveri powered laptop as soon as they are available for purchase. Kaveri should offer a dramatic improvement in APU performance allowing AMD to then offer mid-range and high-end desktop solutions that equal current and future discrete CPU/GPU systems, but at a lower cost with lower power consumption. It's all good for consumers.

1 1
Wednesday 1st May 2013 14:14 GMT FSM

APU

I wish they'd cut out the 'it must have a cool name' bullshit and just call it a Heterogeneous Processing Unit. Sounds good and is based on fact.

Very encouraging news nonetheless.

0 0
Wednesday 1st May 2013 23:32 GMT 0_Flybert_0

All I'm wondering is ..

what do Intel and nVidia have up their sleeves for CPU/GPU shared memory that will beat AMD out of the gate by a year to 18 months on a smaller process .. say 14nm ..

* HSA Foundation, of which AMD is merely one of many members along with fellow cofounders ARM, Imagination Technologies, Samsung, Texas Instruments, Qualcomm, and MediaTek.*

Notable lack of Intel .. nVidia .. you'd think Google would be interested in the tech as well ..

down votes expected .. <sigh>

1 0
Thursday 2nd May 2013 06:33 GMT Shagbag

Does this mark the death of x86?

I don't know much about the difference between x86 and ARM but will this stuff level the playing field between x86 and ARM around computational efficiency?

0 0
Thursday 2nd May 2013 07:52 GMT CastorAcer

Everyone is focussed on the fact that hUMA allows GPU access to DDR3...

What I'm intrigued to know is whether this would support (using virtual addressing) CPU / GPU access to shared GDDR5 and DDR3 memory pools. That capability would get very interesting...

0 0
Thursday 2nd May 2013 10:48 GMT Sir_bobbyuk

Replacement for the x86 architecture altogether

Aren't any of the big chip companies doing a new architecture to replace x86, or will that cause a slight problem?

0 0
Thursday 2nd May 2013 19:34 GMT NIghtFlight

The Commodore Amiga line of computers had unified memory way back in 1985. Such hardware was way ahead of its time.

I sometimes wonder how Amiga came to fruition, because there was practically nothing else that could touch it:

. 16/32-bit CPU (Motorola 68xxx series and later IBM PowerPC accelerators)

. Preemptive multi-tasking built into the hardware

. Custom chipsets for video, audio and input/output

. WIMP interface.

. Programs running in their own screens, each with their own resolution.

. Video-friendly timing, hence no RGB monitor required. Hugely popular with cable networks.

. Massive array of public-domain software

I owned several Amigas in addition to x86 hardware, and from a hardware viewpoint the Amiga was in a class of its own, even compared to a Mac or Atari ST. The thing could even run Mac software through emulation, at a fraction of the cost of a real Mac. The biggest problems I faced were the cost of upgrades, along with the public perception (due to marketing) of the Amiga being a kid's game machine rather than a creative powerhouse.

Sorry if this post was a tad off-topic. I have fond memories of this machine and the things it could do, as well as the great community who stood by it through thick and thin. Seeing innovative products like this from AMD reminds me a little of those days.

1 1
1. Friday 3rd May 2013 16:34 GMT Destroy All Monsters
  
  Less "unified memory" than 1 CPU (with no memory-management unit, hence the need to save often, save early and the kludge of patching up executables when loading them into memory) plus off-CPU video hardware that could access the lower 512K with DMA.
  
  All very nice, but not exactly on the level with 2013.
  
  The problem is that backward-looking enhances things. Do you remember the beautiful workbench? Then you look at it for real and you know ... it was nice then but it sure ain't now.
  
  0 0
