NEC and Stratus promised you new, beefy fault-tolerant servers would ship in June. But the boxes ended up shipping in July. We'll forgive the vendors a wee, one-month lapse, especially since they've tossed out some new, low-end additions as well. As of this week, customers will find a two-socket box centered on Intel's quad- …
Fault-tolerant communication through shared memory?
If I read the article correctly, you described a fault-tolerant system in which duplicated processing elements communicate via a single shared memory. The shared-memory architecture sounds like SMP to me, and it is a single point of failure that leaves the whole system not fault-tolerant. (Please correct me if I am wrong here.)
Also, I found the following sentence especially interesting:
The fault-tolerant gear usually makes its way to banks, stock exchanges, emergency call centers and the like.
Large banks, stock exchanges and emergency call centers already use real fault-tolerant gear: Tandem systems. Tandem is now a division of HP, and has always specialized in "shared-nothing", real fault tolerance. As for "and the like", AOL uses Tandems and the NSA uses Tandems, along with numerous other organizations that would rather buy a 747 than glue a million pigeons together.
HP NonStop is higher end
I can't remember the details, but I did speak to a few financial types about the Stratus FT servers a while back, and they said the systems did not have a SPOF (Single Point Of Failure, or "single point of f***up" if you're the user), including the memory. Having said that, I haven't played with the Stratus kit myself. The NonStop is in a whole different league, though: it's like comparing an articulated lorry to a pickup truck. NonStop really is aimed at replacing IBM mainframes. Most of the Stratus stuff I've been told about was serving less important business processes, whereas NonStop was bought to run the business-critical ("if it fails, we all get pink slips") processes such as stock exchanges. Blades and VMware's software (especially VMotion) will eat a big hole in the Stratus market, as that technology offers almost the same level of reliability at a much lower cost, while HP are making new low-end NonStop servers to squeeze Stratus from above. Nice to see NEC still trying, though.
Memory is not shared
The article is incorrect in terms of shared memory. What you have is 2 CPU sections, which contain the North Bridge, memory and CPUs, and 2 I/O sections, which contain the rest. Each CPU section contains 1 or 2 procs (SMP in the case of 2) and up to 6GB, 12GB, 24GB etc. of memory. The 2 CPU modules are identical and send data to both I/O modules, which then compare the results. If the results differ, the I/O modules "shoot" one of the CPU modules and phone home for a replacement, which you then hot-plug. All of this is totally transparent, with no downtime or pause in operation while the new part is being installed. For the most part you can even patch the OS while the system is live; I believe only Microsoft's really big patches require a full reboot.
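The compare-and-isolate behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Stratus's actual firmware logic: the class names `CpuModule` and `IoModule`, the `self_check_failed` flag and the `service_calls` list are all hypothetical. In particular, comparing the outputs of only two modules tells you *that* something went wrong but not *which* module is at fault, so the sketch assumes each CPU module can flag its own hardware fault.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CpuModule:
    name: str
    self_check_failed: bool = False  # hypothetical per-module fault flag
    online: bool = True


@dataclass
class IoModule:
    service_calls: List[str] = field(default_factory=list)  # hypothetical "phone home" log

    def accept(self, a: CpuModule, out_a: bytes, b: CpuModule, out_b: bytes) -> bytes:
        """Compare the two lockstep CPU modules' outputs and return the one to commit."""
        live = [(m, out) for m, out in ((a, out_a), (b, out_b)) if m.online]
        if not live:
            raise RuntimeError("no CPU module online")
        if len(live) == 1:
            # One module has already been shot: run on the survivor alone
            # until the replacement is hot-plugged.
            return live[0][1]
        if out_a == out_b:
            return out_a  # normal case: the modules agree
        # Outputs diverged. A two-way comparison cannot say which side is wrong,
        # so this sketch assumes the faulty module flags itself.
        for module, _ in live:
            if module.self_check_failed:
                module.online = False  # "shoot" the faulty module
                self.service_calls.append(f"replace {module.name}")  # phone home
                return out_b if module is a else out_a
        raise RuntimeError("outputs diverged but no module reported a fault")
```

For example, if `cpu1` flags a fault and its output diverges, the I/O side commits `cpu0`'s result, takes `cpu1` offline and logs a replacement request; later writes pass straight through from `cpu0` until the new module is plugged in.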
The article should actually say that, in effect, you get 2 procs and 24GB of memory in terms of what the OS sees (MS or Linux), even though you actually buy 4 procs and 48GB of memory. There's a licensing agreement with Microsoft that allows what are in effect 2 servers to run on just 1 license.
Stratus have an equivalent to the NonStop server: the Continuum range, which runs most of the big banks and stock exchanges (see "About Stratus" in their press releases). Having said that, you don't get a NonStop server for the same price as an ftServer, so yes, we are comparing an articulated lorry and a pickup.
And no, I don't work for Stratus (anymore).