Server maker Silicon Graphics doesn't think it can take on the entire Windows server market, but the company – which is best known for supercomputers and hyperscale rack servers – does think it can chase and win deals in the evolving HPC market for Windows. That is why it has certified Microsoft's Windows Server 2008 R2 on its …
256 processors is cool, but performance, as the article says, doesn't scale linearly. Given that the processors will be sharing main memory, they will get in each other's way to an extent (softened by local memory, if they have any, and by cache memory). So, with Windows Server 2008, how quickly does the graph of a suitable performance measure tail off as the number of processors increases?
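For a rough feel of how such a curve tails off, Amdahl's law is the usual starting point. The sketch below is generic and is not a measurement of Windows Server 2008 R2 or of any SGI machine. The 5% serial fraction is an assumed value picked only for illustration, and the model ignores memory contention, so real results would be worse.

```python
def amdahl_speedup(processors: int, serial_fraction: float) -> float:
    """Ideal speedup under Amdahl's law for a given serial fraction."""
    if processors < 1:
        raise ValueError("processors must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    SERIAL_FRACTION = 0.05  # assumption for illustration, not a measured value
    for p in (1, 2, 4, 8, 16, 32, 64, 128, 256):
        s = amdahl_speedup(p, SERIAL_FRACTION)
        print(f"{p:4d} processors: speedup {s:6.2f}, efficiency {s / p:6.1%}")
```

With a 5% serial part the speedup can never exceed 20x, however many processors are added, which is why the interesting question is how small the serial and contended parts of a workload really are.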
NUMA aware OS
This will be using a NUMA aware version of the Windows kernel. That is, it has a strategy for keeping threads running on cores local to the physical RAM in which their data is allocated, to mitigate the hit of data transactions over the switch. For processors with on-board RAM controllers, e.g. anything from AMD with HyperTransport and Intel Xeon, threads tend to stick to cores within the package whose controller serves the bank holding the thread's data.
AFAIK Microsoft have been working on this for some years and have a bit of a lead over Linux. There are a few papers floating about which describe how it works.
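The "keep the thread next to its RAM" idea can be sketched as a toy placement policy. This is only an illustration of the principle and not how the Windows or Linux scheduler is actually implemented; `NumaNode` and `place_thread` are hypothetical names made up for the example.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    node_id: int
    free_cores: int


def place_thread(home_node_id, nodes):
    """Pick a node for a thread whose data lives on home_node_id.

    nodes maps node id -> NumaNode. The home node is preferred; if it has
    no free core, fall back to the node with the most free cores, which
    means the thread will pay for remote memory access.
    """
    home = nodes[home_node_id]
    if home.free_cores > 0:
        chosen = home
    else:
        chosen = max(nodes.values(), key=lambda n: n.free_cores)
        if chosen.free_cores == 0:
            raise RuntimeError("no free cores on any node")
    chosen.free_cores -= 1
    return chosen.node_id


if __name__ == "__main__":
    nodes = {0: NumaNode(0, 1), 1: NumaNode(1, 2)}
    print(place_thread(0, nodes))  # home node has a free core, stays local
    print(place_thread(0, nodes))  # home node is full, spills to node 1
```

A real kernel also has to weigh load balancing, cache warmth and memory migration, none of which this toy model attempts.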
I'm typing this now on a twin core, 32-bit machine with 4 gigs of RAM, and yet Windows 7 manages to produce more race conditions than the Cheltenham Gold Cup. This is software that has assumptions like '256Megs is a big disc drive' hard-wired deep into its guts. Half the time I think it must be using all its resources sending error messages to the null device because it was the only way they could find to make it shut up.
Putting Windows on 256 processors is like sitting an Agoraphobic on the top of Mount Everest: it's not only futile, it's cruel. I make no doubt that, some day, carrying that much computing around in your *pocket* may become a reality (look where we've come, in only fifty years) but let's hope what we run on it by then won't be Windows - because even Raymond Chen couldn't justify supporting that much 'legacy'.
"and have a bit of a lead over Linux"
Nonsense. SUSE have put a lot of effort into this and it's all been stable for a long time. Read the article and SGI's web-site. Windows is the also-ran.
My OS is better than your OS
If you're going to have a "Linux is better than Windows", "no, actually, it's Windows that is better than Linux" knob waving competition, please post some sources or elaborate just a little, say with a single technical comment.
I'm interested that Windows can run on such a big machine, and I'm also interested that Linux can too. I'd like to know the various advantages and disadvantages of each OS; this is supposed to be a technical site.
That's why ..
I suggested reading more about it on SGI's site. This system has been developed on Linux - there'd be no point developing a complex system without an OS
Read about Pittsburgh SC's Blacklight
Blacklight, the World’s Largest Coherent Shared-Memory Computing System, is Up and Running at the Pittsburgh Supercomputing Center
Featuring 512 eight-core Intel Xeon 7500 (Nehalem) processors (4,096 cores) with 32 terabytes of memory, Blacklight is partitioned into two connected 16-terabyte coherent shared-memory systems — creating the two largest coherent shared-memory systems in the world.
Running 2 Linux images on 2048 cores each
Operating systems start to level out in performance growth due to shared resources. For Linux, this starts to happen around 4 cores, for Windows at about 8. Microsoft has been doing multiprocessor server software long before Linux even was modified to do multiprocessor. NT learned its tricks on expensive server hardware built by UNISYS, back in the early 1990s, well before modern multicore processors were common.
Sophisticated applications like SQL Server scale up to 128 cores, because they are designed to use the OS in special ways and because there is a special scalable layer called SQLOS that handles a lot of systems activity without even calling on NT.
Dave Cutler's new "Red Dog" operating system has been rumored to scale up to 128 cores due to extensive kernel redesign.
Running 2 Linux images on 2048 cores each !
Never has a single word been so over used..
... as the word 'iron' was in that article. There is reinforcement and then there is just repetitive. I get it. HPC's are 'bigger' than tin. Now please stop!
Nobody could need more than 2,000,000,000Kb
Task-manager is going to struggle showing 256 CPU graphs.
Just post us the screenshot of 256 CPU threads in Task Manager!
Speaking of odious comparisons...
Belluzzo <-> Elop
Running Windows on this machine..
.. is like buying a Porsche and only using first gear ..
The bullet proof one, please..
ALTIX server is just a cluster
That SGI Linux server is just a bunch of nodes on a fast switch. A cluster, that is.
IBM's largest server, the P795, has 32 cpus. IBM's largest mainframe has 24 cpus. The biggest Solaris server has 64 cpus. Everything greater than that is a cluster. Especially with such an immature OS as Linux. There is no way in hell Linux handles a server with 32 cpus, or even 64 cpus - you have to go to Big Unix Servers or IBM Mainframes to handle such a high number of cpus as 32 or 64.
"just a bunch of nodes on a fast switch"
ONE system image !!
If you run truly parallel, shared-memory code (as we do on gigapixel-range satellite image analysis), an SMP box which is just a set of 4-core or 8-core blades held together by fast switches to appear as a single shared memory box, is pathetically slow, and scales horribly, compared to a 12 core or 24 core box sitting beside it. On the latter I get a speed up of 17.5 on 24 cores, which is kind of neat. On the 64 core SMP I get a speed-up of 1 to 3 on 8 cores, and just dismal results (0.2x "speed up" on 32 cores).
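One way to read speed-up figures like these is the Karp-Flatt metric, which backs an "experimentally determined serial fraction" out of a measured speed-up. A minimal sketch using only the numbers quoted above (the function name is mine and not from any library):

```python
def karp_flatt(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    if processors < 2:
        raise ValueError("need at least 2 processors")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # (label, measured speed-up, cores) as quoted in the comment above
    cases = (
        ("24-core box", 17.5, 24),
        ("64-core SMP, best case on 8 cores", 3.0, 8),
        ("64-core SMP on 32 cores", 0.2, 32),
    )
    for label, speedup, cores in cases:
        e = karp_flatt(speedup, cores)
        print(f"{label}: efficiency {speedup / cores:.1%}, serial fraction {e:.3f}")
```

For 17.5x on 24 cores this gives roughly 0.016, i.e. the code behaves as if under 2% of it were serial. For the 0.2x result the value comes out above 1, which no serial fraction can explain: it means the run is dominated by overhead such as remote memory traffic rather than by unparallelised code.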
Pittsburgh SC were silly to spend a great deal of money on Blacklight ?
There are plenty of workloads for which the Altix UV is fine
What does a UV run that you can't run on a cluster?
It's not about what will run efficiently on a UV; it's about what will run efficiently on a UV that will not run efficiently on an IB-connected cluster. Otherwise the UV is a bloody expensive cluster.
It's not a cluster
No, it's a cache-coherent NUMA machine, so any CPU core can directly access any byte of memory, as though it were local to that CPU.
It is not a cluster, which is a distributed-memory machine.
Michael HF Wilkinson
Oh yes, there are many workloads for which this does well. All of them are clustered workloads.
Can I ask you a question? Why is that all mature, enterprise computer vendors with Big Iron, such as Enterprise Unix AIX, Solaris, HP-UX, Mainframes, etc - why do they only have 32cpus or 64 cpus? Why are there no larger servers? Can you explain that? No? You can not explain that? Is it because them OSes scale bad?
On the other hand, there are lots of systems with 1024 cpus or more, for instance Blue Gene - but they are all CLUSTERS. That supercomputer called "Blue Gene" is basically a large cluster.
There are no Big Iron or heavy Unix servers with more than 64 cpus for a reason - those are not clusters. As soon as you go above 64 cpus, most probably it is a cluster. Otherwise IBM would have released a Unix server with 1024 cpus long time ago. But guess what? IBM had recently to rewrite AIX for it to be able to handle 256 threads. The mature Enterprise AIX did not scale. If those mature Enterprise Unix can not handle more than 32/64 cpus, do you really think that a toy OS such as Linux scales above 32 cpus? Or 64 cpus?
Ted Tso, Linux hacker and creator of ext4, wrote last year that Linux hackers consider 32 cores as exotic hardware, and as such, most probably Linux does not scale to 32 cores.
"...Ext4 was always designed for the “common case Linux workloads/hardware”, and for a long time, 48 cores/CPU’s and large RAID arrays were in the category of “exotic, expensive hardware”, and indeed, for much of the ext2/3 development time, most of the ext2/3 developers didn’t even have access to such hardware. One of the main reasons why I am working on scalability to 32-64 nodes is because such 32 cores/socket will become available Real Soon Now..."
Does anyone agree with you about this ?
Since you peddle this twaddle all over the internet without any support
"On the other hand, there are lots of systems with 1024 cpus or more, for instance Blue Gene - but they are all CLUSTERS. That supercomputer called "Blue Gene" is basically a large cluster."
Yup, Blue Gene is a cluster, much like Oracle's Exadata products.
"There are no Big Iron or heavy Unix servers with more than 64 cpus for a reason - those are not clusters. As soon as you go above 64 cpus, most probably it is a cluster. Otherwise IBM would have released a Unix server with 1024 cpus long time ago. But guess what? IBM had recently to rewrite AIX for it to be able to handle 256 threads. The mature Enterprise AIX did not scale. If those mature Enterprise Unix can not handle more than 32/64 cpus,"
Actually, coherence is maintained at the core level, so talking about CPUs (chips) doesn't make much sense. Sure, some of the more advanced coherence protocols, like the one used in the 1024 thread 256 Core 64 Chip/CPU POWER 795, are more complex and have local and global "awareness".
Note btw that this is x2 the threads used in the SPARC M9000, so saying that AIX doesn't scale is.. well.. not true.
" do you really think that a toy OS such as Linux scales above 32 cpus? Or 64 cpus?"
Well I ran LINUX on a 64 threaded POWER machine, what 4-5 years ago, it ran like a dream. And the scalability tests that we ran performed without any problems. I have no doubts that Linux scales fairly well.
Blacklight, the World’s Largest Coherent Shared-Memory Computing System, is Up and Running at the Pittsburgh Supercomputing Center
Single point of failure => what about fault tolerance?
Okay, suppose the software can cope with NUMA on this scale... the next question is, how does RHEL or SLES or Windows Server cope with individual component failures? Is RAM and CPU hot-swap there already? What if a whole multi-socket blade goes down?
(Small hardware = small problems. It's so obviously soothing. Unless you sell too many of them, and there's a systematic design flaw... Have to pet the smartphone in my pocket just to feel basically sane again...)
There is fault tolerance for memory on some of the larger ProLiant servers (it's RAIDed). I know that Windows Datacenter (and I think other versions) can handle having processors dynamically installed and removed. I don't know if it can deal with processors failing.
I'd say it's fair enough to assume that this system can do that, but it's only an assumption based on technical feasibility.
UV hardware fault tolerance
Re. UV coping with RAM failures, it has the rather neat trick of detecting DIMM errors then marking those pages as unusable.
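The "mark the page unusable" trick can be sketched as a toy page pool. This is purely illustrative and is not SGI's firmware or any operating system's real allocator; `PagePool` and its methods are hypothetical names.

```python
class PagePool:
    """Toy physical-page allocator that retires pages hit by memory errors."""

    def __init__(self, page_count: int):
        self.page_count = page_count
        self.free = set(range(page_count))
        self.in_use = set()
        self.retired = set()

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no usable pages left")
        page = min(self.free)  # lowest free page, to keep the demo deterministic
        self.free.remove(page)
        self.in_use.add(page)
        return page

    def release(self, page: int) -> None:
        if page not in self.in_use:
            raise ValueError(f"page {page} is not allocated")
        self.in_use.remove(page)
        if page not in self.retired:
            self.free.add(page)  # retired pages never go back on the free list

    def report_error(self, page: int) -> None:
        """Called when an error is detected on the DIMM backing this page."""
        if not 0 <= page < self.page_count:
            raise ValueError(f"page {page} does not exist")
        self.retired.add(page)
        self.free.discard(page)


if __name__ == "__main__":
    pool = PagePool(4)
    pool.report_error(0)    # page 0 is retired before anyone uses it
    print(pool.allocate())  # page 0 is skipped
    print(sorted(pool.retired))
```

The machine keeps running with slightly less usable memory, and the faulty DIMM can be swapped at the next maintenance window.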
What about flash?
All well and good, but does it run Flash video fullscreen? (http://xkcd.com/619/)
One system image, or lots of flexible VMs on one big server?
OK, I suppose there are probably a dozen institutes which would like really large CPU instances of Windows with special NUMA kernels. But there are probably many more commercial businesses that would like a very large x64 box that they can chop into flexible partitions, maybe of 64-96 CPUs each, and then run either VMs or stack SQL instances on them with bog standard Windows.
We are talking HPC use, not general commercial use
HPC works on a few BIG problems per box. Business software usually has a myriad of computationally small problems. Horses for courses.
HPC isn't just one big problem
Most HPC is in finance where many small single threaded Monte Carlo simulations are running. This would be followed by semiconductor guys who also have mostly serial codes, Oil&Gas, the same, followed by Gov labs and Universities who, yes run the Grand Challenge benchmarks like Linpack, (which is an utterly pointless exercise), but most University work is probably serial or maybe 4 way parallel.
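To make the "many small single threaded Monte Carlo simulations" point concrete, here is a minimal sketch of independent, seeded runs that share no state. Estimating pi stands in for a real pricing model, and the function names are made up for the example.

```python
import random


def estimate_pi(seed: int, samples: int) -> float:
    """One single-threaded Monte Carlo run with its own private generator."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_batch(seeds, samples, map_fn=map):
    """Run one simulation per seed.

    map_fn defaults to the built-in serial map. Because the runs share
    nothing, any map-like callable that farms work out to other cores or
    machines could be passed in instead.
    """
    seeds = list(seeds)
    return list(map_fn(estimate_pi, seeds, [samples] * len(seeds)))


if __name__ == "__main__":
    results = run_batch(range(8), 100_000)
    print(sum(results) / len(results))
```

Nothing here needs shared memory or a fast interconnect, which is why this kind of work fits an ordinary cluster or job scheduler just as well as one large system image.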
RE: We are talking HPC use, not general commercial use
All fine and true, but I suspect there is a lot more money in meeting the commercial requirements than the HPC institute ones. Server vendors are not charities. I just hope some of what goes into the HPC pile (and is probably more than partly funded out of government grants), gets ploughed back into developments that everyday businesses can use.
Will there be enough RAM to play minesweeper on the server?
Paris, because she always beats me at Minesweeper and makes the aircon fail...
I'm surprised at the claim that most University work is serial or 4 way parallel.
Note for instance:
"This massively parallel processing (MPP) system will be targeted for running large parallel jobs using more than 64 processors. "
As someone who has installed HPC clusters on many sites in the UK, I'd be very surprised if they were constantly running 4-way parallel jobs, and even more surprised at the money being spent on Infiniband interconnects which won't be then used to run jobs like this (four and more core processors being common these days).
The title is required, and must contain letters and/or digits.
Yeah, too bad Windows as an app server is good for two apps both made by M$!
Why even bother?
Man, finally that's some decent hardware
to run a good Windows screensaver.
Just a thought
and an interesting technical blog:
There's been a lot of articles written about their Windows based server farm over the several years the game has been around. It was something like three years ago when they broke the 32-bit item ID by having more than 4 billion items in a table. Last I heard I think they said several million DB transactions an hour was common.
All that has been running on Windows + MS SQL for many years now. Just sayin' :)
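For anyone wondering what "breaking" a 32-bit ID looks like, here is a small sketch. It assumes an unsigned 32-bit counter, which is my guess from the "more than 4 billion" figure; the post does not say how the IDs were actually stored, and the function names are made up.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295


def next_id_wrapping(current: int) -> int:
    """Increment the way a raw unsigned 32-bit counter does: wraps to 0."""
    return (current + 1) & 0xFFFFFFFF


def next_id_checked(current: int, max_id: int = UINT32_MAX) -> int:
    """Increment, but fail loudly instead of silently reusing old IDs."""
    if current >= max_id:
        raise OverflowError("ID space exhausted, widen the column")
    return current + 1


if __name__ == "__main__":
    print(next_id_wrapping(UINT32_MAX))  # wraps around and collides with ID 0
    print(next_id_checked(41))
```

The usual fix is to widen the column to 64 bits, which moves the limit far beyond anything a table of items is likely to reach.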