Nice...
will I be able to get a couple at PCWorld or Argos?
It looks like China is getting ready to scare the wits out of – or maybe some life into – the US and European supercomputing establishments and their sugardaddy governments once again by taking the top slot in the June rankings of the Top500 supercomputers in the world. And this time, it will be with an all-Intel ceepie-phibie …
"(Maybe they can borrow the money from the Chinese government, which has something on the order of $2 trillion sitting around?)"
NOPE! Because, guess what - the US government has ALREADY borrowed that money from the Chinese government in exchange for treasury bills, to be financed by future taxpayers.
In the year 2525, if the dollar should survive....
I can't really grasp the size and complexity of those monsters, but I'd really like to know how they sysadmin them. How do they provision and load balance applications and users? How long does it take from initial load until execution when you request 10,000 cores? What kind of checkpoint/restart software do they use?
I once wrote an MPI ping which I frequently ran on all the large clusters and shared-memory machines I was able to get my hands on (I used to work with SGI HPC 10 years ago). As the number of cores and nodes increased, bandwidth and latency did suffer. On large systems it was very easy to congest the I/O unless there were strict limits on my user. Broadcast pings were out of the question; instead I had to carefully build up a 1-to-any-core ping and build latency and bandwidth maps from each core to every other core in the system. Then I would try 1-to-2, 1-to-3, 1-to-4 and so on until peak bandwidth was reached, then 2-to-2, 2-to-3 and so on, and at the end look up the combinations with the lowest overall latency and highest bandwidth to find the system bandwidth.
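For anyone who wants to picture the map-building step, here is a rough sketch in plain Python. It is not the original MPI ping: the `ping` callback, `fake_ping` and the 4-cores-per-node layout are all made up for illustration, and a real run would replace `fake_ping` with an actual timed ping-pong between two ranks.

```python
from itertools import permutations

def build_maps(cores, ping):
    """Measure every ordered (src, dst) pair once.

    `ping` is any callable returning (latency_seconds, bandwidth_bytes_per_s).
    Pairs are measured one at a time, never as a broadcast.
    """
    latency, bandwidth = {}, {}
    for src, dst in permutations(cores, 2):
        lat, bw = ping(src, dst)
        latency[(src, dst)] = lat
        bandwidth[(src, dst)] = bw
    return latency, bandwidth

def best_pairs(latency, bandwidth, top=3):
    """Rank pairs by lowest latency, breaking ties by highest bandwidth."""
    return sorted(latency, key=lambda p: (latency[p], -bandwidth[p]))[:top]

def fake_ping(src, dst, cores_per_node=4):
    """Hypothetical stand-in for a real ping-pong: same-node pairs are faster."""
    same_node = src // cores_per_node == dst // cores_per_node
    return (1e-6, 5e9) if same_node else (5e-6, 1e9)

if __name__ == "__main__":
    lat, bw = build_maps(range(8), fake_ping)
    for pair in best_pairs(lat, bw):
        print(pair, lat[pair], bw[pair])
```

The 1-to-2, 1-to-3 and so on steps would sit on top of this, with several pings in flight at once and the aggregate bandwidth summed, which is not shown here.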
Morten, I agree with you.
I work in HPC on current SGI systems. I really don't think systems like that ever run a single job on massive numbers of cores, except for those hero runs to get the HPL number. But I wouldn't know.
And yes - checkpoint and restart. The more blades you have in the cluster, the more likely one will fail during a computation.
Whilst the holy grail might be to get your 'code' to scale to 1000's (10's of thousands?) of processor cores to enable the fastest time to solution, sometimes it's just simpler to make multiple job submissions using a lower core count - therefore you need capacity (lots of processing resources available) as well as capability (tightly coupled infrastructure supporting massively parallel blah blah blah). I suspect the Tianhe-2 system will support both types of processing requirement, with a bit of HTC thrown in.
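To put a rough number on that, here is a back-of-envelope sketch. It assumes every blade fails independently with the same probability during the run, which is a simplification, and the figures in the example are invented, not taken from any real machine.

```python
def p_any_failure(blades, p_blade):
    """Probability that at least one of `blades` fails during a run,
    assuming independent failures with identical per-blade probability."""
    return 1.0 - (1.0 - p_blade) ** blades

if __name__ == "__main__":
    # Hypothetical per-blade failure chance of 0.01% per run.
    for n in (100, 1000, 16000):
        print(n, round(p_any_failure(n, 1e-4), 3))
```

With those made-up inputs, 16,000 blades gives roughly an 80% chance that something dies mid-run, which is why long jobs at that scale need checkpointing.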