Servers are going virtual these days, so maybe it is time for server chipsets and interconnects to do the same. With Advanced Micro Devices not building any chipsets that go beyond four Opteron processor sockets in a single system image – and no one else interested in doing chipsets, either – there is an opportunity, it would …
More than 100% efficient?
I tried running a nicely parallel shared-memory workload (75% efficiency on 24 cores in a 4-socket Opteron box) on a 64-core ScaleMP box with eight 2-socket boards linked by InfiniBand. Result: horrible. It might look like shared memory, but access to off-board bits has huge latency.
BTW, I once got 110% efficiency on two cores, because the problem did not fit into the memory band of one socket, so spreading the workload over two or more cores reduced the latency nicely. Weird but true.
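For anyone wondering how the percentages translate: a rough sketch, assuming the usual definition of parallel efficiency as speedup divided by core count, with speedup being single-core time over parallel time. The function names are made up for illustration, and no timings are implied beyond the two efficiency figures quoted above.

```python
def parallel_efficiency(t_serial, t_parallel, cores):
    """Efficiency = speedup / cores, where speedup = t_serial / t_parallel."""
    if t_parallel <= 0 or cores <= 0:
        raise ValueError("t_parallel and cores must be positive")
    return (t_serial / t_parallel) / cores


def speedup_from_efficiency(efficiency, cores):
    """Invert the definition: speedup = efficiency * cores."""
    return efficiency * cores


# The two figures quoted in the comment above:
print(speedup_from_efficiency(0.75, 24))  # 18.0 (75% on 24 cores)
print(speedup_from_efficiency(1.10, 2))   # 2.2  (110% on 2 cores)
```

So 75% on 24 cores means the job ran about 18 times faster than on one core, and anything over 100% means the speedup exceeded the core count, which can happen when the extra sockets bring extra memory resources along with them.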