I seem to recall that the reason PCI Express was invented was that it is very hard to keep all the lines in a parallel connection in sync at high speed. So how are they getting around this high speed sync issue on the memory interfaces? And how on earth will they expand this to 1024 bits?
Getting data into and out of memory is like getting books into and out of bookshelves. The more books you can hold at once, the faster you can do the job. Someone able to hold only one book at a time is going to be far slower than another person capable of holding a stack of eight at a time or, with high-bandwidth memory (HBM) …
The issue is that it's very hard to keep all the lines in a parallel connection in sync at high speed *over long distances and through complicated connections*. HBM solves this by making the distances really short, and permanently attaching both the RAM and CPU to the Interposer.
The Interposer is a big silicon chip that doesn't actually have any logic on it; it's just used as a really tiny circuit board to connect the RAM and CPU together. Big silicon chips are normally expensive, but the Interposer is built using old processes in existing fabs, which is cheap because the machinery has already been paid for. The RAM and CPU are attached to the Interposer, and the Interposer then connects to the circuit board in the normal way (usually soldered on, but it could be a socketed chip).
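To put rough numbers on why a very wide bus is attractive once the wiring is short enough to keep in sync, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width times per-pin data rate, divided by eight to get bytes. The function name and the interface figures are my own illustrative assumptions (commonly quoted headline numbers), not anything taken from the article or the posts above, and real sustained throughput will be lower.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical peak transfer rate of a parallel bus, in gigabytes per second."""
    # Every data line moves gbit_per_s_per_pin gigabits each second;
    # dividing by 8 converts gigabits to gigabytes.
    return bus_width_bits * gbit_per_s_per_pin / 8


# Illustrative (assumed) figures: (bus width in bits, Gbit/s per pin).
interfaces = {
    "DDR4-3200 channel, 64-bit": (64, 3.2),
    "HBM stack, 1024-bit at 1 Gbit/s per pin": (1024, 1.0),
    "HBM2 stack, 1024-bit at 2 Gbit/s per pin": (1024, 2.0),
}

for name, (width, rate) in interfaces.items():
    print(f"{name}: {peak_bandwidth_gb_per_s(width, rate):.1f} GB/s")
```

The point of the comparison is that the 1024-bit stack runs each pin slower than the DIMM does, yet still comes out well ahead on total bandwidth because there are sixteen times as many lines.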
That should take care of some of the Spectre/Meltdown performance problems.
No mention of the ATI Fury?
The first time I encountered HBM architecture was in my ATI Fury graphics card. It's surprising to see so many companies mentioned without this mass-market example being among them.
Hi Chris, good article. I wanted to clarify your last remarks "If a supplier manages to develop a stacked DRAM DIMM then memory capacity generally could shoot up. However the expense of this could be so great that customers prefer to bulk out DRAM with 3D XPoint or some other SCMs that is more affordable than pure DRAM and still boosts host server performance substantially."
3D XPoint won't be cheap any time soon. And most other emerging memories can only help with incremental improvement, though granted that can still be welcome. It would seem that a stacked DDR5 DIMM with 3DS components would provide less bandwidth, and higher power and latency, than HBM. So cost aside, it's not clear that this would be ideal either.
Is the interposer higher manufacturing cost than stacking the DIMM?
Maybe 3DS RDIMM/LRDIMM would be better candidates over the above?
Maybe manufacturing support and AI/DL/HPC demand will drive HBM cost down, and with its reduced footprint allowing for higher IC integration (despite the higher cost of targeting a smaller node), maybe DDR is more likely to be replaced by HBM?
HMC is dead
When DRAM prices started going up about a year or two ago Micron finally killed HMC completely. HMC had always been an insurance policy, intended to support exascale computing, in the hope that if the Korean manufacturers tried to bankrupt them in a price war, Micron could play the National Security card and get a government bailout. The whole thing fell apart around 2010, and in order to save face the developers managed to bamboozle the company into pursuing commercial customers. It was one gigantic money-losing cluster that Micron sank many tens of millions of dollars into developing.