Hmm...
Like Samsung was doing with ARM about 8 years ago?
:-D
Intel said it was working on stacking a layer of memory on its Xeon processors to run memory-bound workloads faster. It said this in a pitch at the Denver-based Supercomputing Conference (SC13) which is running from 17 to 22 Nov. According to an EE Times report, Intel's Rajeeb Hazra, a VP and general manager of its data …
"Why on earth make a difference between internal and external memory ?"
So that you can make your memory-bound workloads run faster by telling your application to use the internal rather than the external memory. It's quite likely that the memory mapping will be taken care of by the compiler so that it will all work like magic for programmers who don't like complexity.
Work harder, little engine.
Someone is dangling gigabytes of monstrous-bandwidth, low-latency memory in front of you, and your first reaction is "it's tooo haaarrd"??
One day, it will be one big contiguous memory space. Until then, programmers will have to earn their pay packets. Perhaps that will sort out the cans from the can'ts.
(I speak as a programmer, not a hardware guy).
Chances are it'll be well-hidden down in the O/S's virtual to physical page translation, so it'll look like one big contiguous space if your program isn't the sort to need every speed optimisation it can find. You'll probably have access to an extended malloc with a flag to request near memory, and be able to request that pages be locked in near or in far memory. The paging system will probably have algorithms to move busy pages of far memory into near memory and to move idle pages of near memory outwards.
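For a feel of how that extended malloc might look, here's a rough C sketch that borrows today's libnuma calls as a stand-in; the node numbers are pure assumption (the stacked memory would presumably just show up as another NUMA node):

#include <numa.h>      /* libnuma: link with -lnuma */
#include <stdio.h>
#include <string.h>

#define NEAR_NODE 0    /* assumption: node 0 is the stacked, on-package memory */
#define FAR_NODE  1    /* assumption: node 1 is the ordinary DIMMs */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this kernel\n");
        return 1;
    }

    /* Ask for 64 MB backed by the "near" node - today's analogue of
       an extended malloc with a near-memory flag. */
    size_t len = 64UL << 20;
    void *hot  = numa_alloc_onnode(len, NEAR_NODE);
    void *cold = numa_alloc_onnode(len, FAR_NODE);
    if (!hot || !cold) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(hot, 0, len);    /* touch the pages so they actually get placed */
    memset(cold, 0, len);

    numa_free(hot, len);
    numa_free(cold, len);
    return 0;
}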
There's already a sort of near/far distinction on some multi-CPU systems, in that memory is local to a CPU or local to a different CPU. On the Quad CPU AMD system I once looked at in detail, 1/4 of the memory was local, 1/2 was one CPU hop away, and 1/4 was two hops away (so effectively three levels).
Nigel 11, you have it right regarding NUMA systems.
And as you say, quite run-of-the-mill multi-CPU motherboards are already NUMA systems.
And there are much bigger NUMA systems out there!
Install the absolutely great tool 'hwloc' from the OpenMPI project
http://www.open-mpi.org/projects/hwloc/
You can get a graphical display of how your system is laid out.
Assuming you are running Linux, install the 'numactl' package and use
numactl --hardware
numastat
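And if you'd rather poke at it from C, a tiny sketch with libnuma's numa_distance() prints the same relative-cost table that numactl reports (purely illustrative; the node count and distances depend entirely on your box):

#include <numa.h>   /* libnuma: compile with -lnuma */
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported here\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    printf("Relative access cost between nodes (10 = local):\n");
    for (int from = 0; from < nodes; from++) {
        for (int to = 0; to < nodes; to++)
            printf("%4d", numa_distance(from, to));
        printf("\n");
    }
    return 0;
}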
When Intel pulled out of the Hybrid Memory Cube Consortium, the reckoning was that they intended to roll their own version, and this is it! It's probably put together with Micron as a partner, and is close to, but not exactly the same as, HMC.
The high end will be interesting. NVidia is into HMC, and AMD is likely working up the idea too.
And the idea of memory stacking fits the mobe market as well, so ARM is in the game!
Roll on persistent carbon nanotube memory. That will upset the applecart again!
HMC talks of terabyte-per-second speeds, so it will have a big impact on performance.
"There would also need to be data moving or tiering software to transfer data from Far Memory into Near Memory and vice versa."
Like the OS... :)
With a *nix you could set up the far memory as a swap device, no application tweaking necessary.
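Something like this minimal sketch, using the Linux swapon(2) call - /dev/far_mem0 is a made-up name for whatever block device the far memory got exposed as, and in practice you'd just mkswap it and add a line to fstab:

#include <sys/swap.h>   /* swapon(), SWAP_FLAG_* (Linux-specific) */
#include <stdio.h>

int main(void)
{
    /* Hypothetical block device backed by the far-memory tier.
       Give it a high priority so it's preferred over disk swap. */
    int prio  = 100;
    int flags = SWAP_FLAG_PREFER |
                ((prio << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);

    /* Needs root, and the device must have been mkswap'd first. */
    if (swapon("/dev/far_mem0", flags) != 0) {
        perror("swapon");
        return 1;
    }
    return 0;
}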
A long time ago I was infatuated with the idea that you could simply ditch the L2->N caches and slap in a chunk of fast, wide local memory instead, and let the MMU and the kernel handle the caching of stuff in that local memory. The idea behind it was to let folks get more deterministic behaviour from their code by removing the async caching logic from the equation. It wasn't my finest idea; the benefits would have been small and the downsides pretty huge, I think. :)
Funny to watch Intel scrabble around looking for USPs that other folks have already done. :)
Are you sure?
The main issue with RAM on CPUs is the area required (which is also why it is statistically more likely to have a fault). Redundancy is easy (create a block of X units where the product requires X-1, and disable one unit). Intel released papers in the '90s on how they do this.
Putting more RAM on as a second layer (i.e. stacked) lets you get very high bandwidth (a wide, short bus) without the complexity required to achieve the same thing from an off-die memory subsystem, where trace path lengths can result in timing issues.
All systems with virtual memory have the O/S managing the physical pages of memory (and the backing storage). The hardware does the virtual-to-physical address translation when the data has a current physical address, and throws a "page fault" when the data has to be moved from backing storage into a free physical page; the O/S handles the page faults. With different classes of RAM, the O/S will also be managing the movement of data between near and far physical pages when necessary, while the virtual address of the data that the programmer uses won't change.
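On Linux you can already sketch that movement with move_pages(2), which migrates the physical pages between nodes while the virtual addresses stay put; node 0 standing in for "near" here is just an assumption:

#include <numaif.h>   /* move_pages(), MPOL_MF_MOVE; link with -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t npages = 16;

    /* Allocate a few pages and touch them so they get physical frames. */
    char *buf = aligned_alloc(pagesize, npages * pagesize);
    memset(buf, 1, npages * pagesize);

    void *pages[16];
    int nodes[16], status[16];
    for (size_t i = 0; i < npages; i++) {
        pages[i] = buf + i * pagesize;
        nodes[i] = 0;              /* assumption: node 0 plays "near memory" */
    }

    /* Migrate the physical pages; the pointers in 'pages' stay valid. */
    if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
    else
        printf("page 0 now on node %d, same virtual address %p\n",
               status[0], pages[0]);

    free(buf);
    return 0;
}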
NUMA = non-uniform memory access.
My first exposure to NUMA was in DEC's "Wildfire" product family (aka AlphaServer GS80/GS160/GS320), which was an Alpha-based high-end (for those days) server with up to 32 processors, in chunks of four per box. Each box (known as a "quad building block") had a certain amount of local memory, accessed relatively quickly, and could also access (transparently, within the address space) the memory in other boxes. Although the software didn't directly know whether memory accesses were remote or local, there were performance penalties for remote access. Depending on the application and OS, those penalties might be small or significant. Wildfire was not initially marketed as a NUMA system.
Now that Intel have invented NUMA again, everything will be correspondingly faster. However, there will still be penalties for accessing non-local memory...
One of the big challenges for Intel, just as it was for DEC's Wildfire, will be maintaining cache coherence across the processor/chip/box boundaries. Alpha systems should have had it relatively easy there, because the architectural model said it was a Bad Idea to assume full-time memory coherence across multiple processors. Wildfire's performance still suffered because of system design issues - there was a noticeable latency in remote access (factor of 3?). The next generation of Alpha boxes, based on "EV7" Alpha chips with lots of interconnect stuff built into the CPU chip rather than in the external support chipset, significantly reduced the remote access penalty, to the extent that NUMAness was barely relevant, except perhaps at the level of the OS scheduler (don't move stuff from one CPU to another unless you have to - bit of context in the Suse/UKUUG snippet below).
On the other hand, x86 systems and applications have, for the last few decades, typically been designed around the legacy x86 concept of all memory being coherent all of the time (or has AMD64 finally fixed that, given that Opteron introduced ccNUMA on x86-64?). If that hasn't been properly fixed, it's going to be a right pain making this work effectively on anything other than a Powerpoint slide for the journalists.
http://www.compaq.com/alphaserver/archive/gs320/
http://www.ukuug.org/events/linux2001/papers/html/AArcangeli-numa.html
etc
Currently one of the big problems in the embedded world is that you have extremely little memory. Adding external memory typically means having to go to BGA packages and 8-layer PCBs, which is rather expensive.
Now if you had a significant amount of memory (>16 megabytes) inside your CPU, this might be a competitive edge for Intel over ARM in the embedded market.