They have a lot of room to tune.
NVLink is a beast, albeit a challenge to use effectively if you don't have massive chunks of data.
GPU database-botherer SQream has said its DB runs up to 150 per cent faster when it uses IBM's POWER9 CPUs linked to the GPUs rather than x86 processors. SQream's big idea is to run repetitive data warehouse routines on GPUs rather than on limited-in-number CPU cores and so bring GPU acceleration benefits from AI and machine …
You'd be surprised. The nature of database workloads is that you need a _LOT_ of fast memory. A GPU has loads of compute, but VRAM is far more constrained than system RAM when you can buy a server with 1TB of RAM.
So provided your data chunks reasonably cleanly into VRAM-sized portions, you can gain some advantage there, but then there's the overhead of loading that data from system RAM into VRAM, where you take a significant hit.
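To make the chunking point concrete, here's a rough sketch in plain Python. All the sizes and the `vram_chunks` helper are hypothetical; a real query planner would also have to reserve VRAM for intermediate results, not just the input batch:

```python
def vram_chunks(total_rows, row_bytes, vram_bytes):
    """Split a table into the largest row batches that fit in VRAM.

    Hypothetical illustration only: real engines also budget VRAM for
    working space, and overlap transfers with compute to hide latency.
    """
    rows_per_chunk = max(1, vram_bytes // row_bytes)
    return [(start, min(start + rows_per_chunk, total_rows))
            for start in range(0, total_rows, rows_per_chunk)]

# 1 TB of 128-byte rows against a 32 GB card:
chunks = vram_chunks(total_rows=2**40 // 128, row_bytes=128,
                     vram_bytes=32 * 2**30)
print(len(chunks))  # 32 separate host-to-device transfers, each a latency hit
```

Every one of those transfers crosses the system-RAM-to-VRAM boundary, which is exactly where the overhead mentioned above lands.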
The reason databases aren't commonly GPU accelerated is because it would only be faster for a small fraction of atypical workloads, and massively slower for most typical workloads.
The argument goes that if you can break your work into smaller chunks and attack the problem in parallel, you might not need as much fast memory, especially if what you have is much faster than generic DDR4. NVLink interconnects at up to 200GB/sec can, in theory, flush through all the RAM on even the largest multi-GPU setup in a few seconds; compare that with legacy multi-socket boards, which struggle to share anywhere near that amount of data between NUMA nodes in the same time, or with older SLI configurations that linked GPUs over their faster GDDR5. If the GPU is doing all the work and the CPU is no longer constrained by the overall number of PCIe lanes available, then this may differ widely between CPU architectures; it certainly looks like it might between Intel Xeon or AMD EPYC x86 boards and the way POWER does it, which is more like the custom accelerator coprocessors on mainframes. I wonder where the next I/O bottleneck will be attacked if multiple TB of fast graphics RAM for in-memory databases becomes commonplace.
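The bandwidth arithmetic above is easy to check back-of-envelope. The link figures below are peak numbers (the 200GB/sec quoted above, and a nominal ~16GB/s per direction for PCIe 3.0 x16); sustained real-world rates will be lower:

```python
# Back-of-envelope: seconds to stream 1 TB through each interconnect.
# Peak figures only; real transfers see lower sustained throughput.
TB = 10**12

links_gb_s = {
    "PCIe 3.0 x16": 16,             # ~16 GB/s per direction, nominal
    "NVLink 2.0 (aggregate)": 200,  # figure quoted in the comment above
}

for name, gb_s in links_gb_s.items():
    seconds = TB / (gb_s * 10**9)
    print(f"{name}: {seconds:.1f} s per TB")
# PCIe 3.0 x16: 62.5 s per TB
# NVLink 2.0 (aggregate): 5.0 s per TB
```

An order of magnitude either way, which is roughly the gap the "few seconds versus more than a few seconds" comparison is gesturing at.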
The entire article skips over the fact that POWER9 is a clean 64-bit design, whereas x86 chips are still hauling around decades of 32-bit legacy baggage and an archaic instruction set. I could make many analogies, but it's like saying your modern-day BMW M3 is faster than one from 2003. Of course it is, and it should be. What's more surprising is that it's only 150 per cent faster.
Biting the hand that feeds IT © 1998–2019