Let's start with a very simple block diagram:
CPU -----> chipset -----> storage subsystem (i.e. 3D XPoint).
Try to visualize the CPU as a radio frequency transmitter:
4 cores x 64 bits per register @ 4 GHz is a lot of binary data.
On the right is 3D XPoint.
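
To put a rough number on "a lot of binary data" on the CPU side, here is a back-of-envelope sketch. It assumes each core moves one full 64-bit register per clock cycle, which is my simplification and not a measured figure; the function and variable names are made up for illustration.

```python
# Back-of-envelope only: assumes every core moves one 64-bit register
# per clock cycle. Real CPUs do not behave this simply.
CORES = 4
BITS_PER_REGISTER = 64
CLOCK_HZ = 4_000_000_000  # 4 GHz


def raw_bits_per_second(cores, bits_per_register, clock_hz):
    """Upper-bound bit rate under the one-register-per-cycle assumption."""
    return cores * bits_per_register * clock_hz


bits_per_s = raw_bits_per_second(CORES, BITS_PER_REGISTER, CLOCK_HZ)
gigabytes_per_s = bits_per_s / 8 / 1e9

print(f"{bits_per_s / 1e9:.0f} Gbit/s = {gigabytes_per_s:.0f} GB/s")
```

Under that assumption the "transmitter" on the left side of the diagram is good for roughly 128 GB/s of raw register traffic.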
As their measurements show,
Micron achieved "900" w/ PCIe 3.0 x4 lanes; and,
Micron achieved "1800" w/ PCIe 3.0 x8 lanes.
Read: almost perfect scaling.
And, the flat lines speak volumes:
in both cases, the storage subsystem
saturated the PCIe 3.0 bus.
Now, extrapolate to PCIe 3.0 x16 lanes:
wanna bet "3600"? My money says, "YES!"
Now, extrapolate to PCIe 4.0 x16 lanes:
my money says ~ "7200" -- flat line
(maybe not perfect scaling,
but you get the idea :)
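
The extrapolation above is just linear scaling, and can be sketched like this. It assumes throughput scales perfectly with lane count (as the x4 and x8 numbers suggest) and that PCIe 4.0 doubles the per-lane rate of PCIe 3.0. The "900" baseline is Micron's figure in whatever units their chart uses; the function name is hypothetical.

```python
# Linear extrapolation from Micron's PCIe 3.0 x4 figure.
# Assumes perfect scaling with lane count, and 2x per lane for PCIe 4.0.
MEASURED_X4 = 900   # Micron's number at PCIe 3.0 x4 (units as in their chart)
BASE_LANES = 4
GEN_FACTOR = {"3.0": 1, "4.0": 2}  # per-lane rate relative to PCIe 3.0


def extrapolate(lanes, gen="3.0"):
    """Projected figure if the storage keeps saturating the bus."""
    return MEASURED_X4 * lanes // BASE_LANES * GEN_FACTOR[gen]


for gen, lanes in [("3.0", 4), ("3.0", 8), ("3.0", 16), ("4.0", 16)]:
    print(f"PCIe {gen} x{lanes}: {extrapolate(lanes, gen)}")
```

Only the x4 and x8 rows are measured; the x16 rows are projections that hold only if the flat line persists.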
Conclusion: 3D XPoint is FAAAST, and
Micron's measurements show that
the chipset is now the bottleneck --
all cynicism aside.