Hewlett-Packard has blades on the brain for both "industry standard" and "mission critical" servers, but IT managers in the United Kingdom seem to be more worried about the cost of their mission critical platforms, generally Unix boxes, according to a report released by Coleman Parkes Research. HP and its public relations firm, …
"unless customers really want x64 blades with Itanium-class reliability features."
And what meaningful RAS features would IA64 in Integrity be able to provide that AMD64 in Proliant hasn't already been able to for years?
Yes I completely believe that an IA64 system can deliver better RAS at the application level than an AMD64 system, but as far as I can tell the missing RAS features are in the OS (be it NonStop, VMS, or maybe even HP-UX), they do not appear to be (as far as I've been able to determine) in the chip or even in the board.
If there is any truth in the IA64 RAS-superiority claim, it'd be lovely to actually see it substantiated with hard evidence at least once before IA64 goes end of life, rather than the usual deal with PR spinners just regurgitating unsubstantiated politically-motivated (not technology-based) HP guff.
RAS on chipsets
AC, I would broadly agree; however, there are examples of RAS being implemented on the chipset, Himalaya for example. A more mainstream one would be AIR (Automatic Instruction Retry) on the SPARC 64 chipset, implemented to deal with the significant amount of 'transient' errors.
> Solaris is still the dominant operating system for mission
> critical workloads, with 29 per cent of those polled saying
> Solaris underpins their primary mission critical workloads,
> followed by 25 per cent using HP-UX, 21 per cent using
> Windows, and 18 per cent using AIX. Another 8 per cent use
> other platforms for mission critical jobs, such as OpenVMS
> boxes, Power-OS/400 machines, or mainframes.
Presumably the Linux systems are filtered out of these numbers. They must surely justify more than a share of the 8% "other" category.
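A quick tally of the quoted figures, as a rough sketch (the percentages are the ones quoted above; the dictionary and variable names are just mine for illustration). The categories add up to 101 per cent, presumably through rounding, so any Linux boxes can only be sitting inside the 8 per cent "other" bucket or have been left out altogether.

```python
# Shares quoted from the survey, in per cent of respondents.
shares = {
    "Solaris": 29,
    "HP-UX": 25,
    "Windows": 21,
    "AIX": 18,
    "Other": 8,
}

total = sum(shares.values())          # every category, including "Other"
named = total - shares["Other"]       # the four platforms named explicitly

print(f"named platforms: {named}%")
print(f"all categories:  {total}%")
```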
Lies, damned lies, and statistics.
RE: Linux missing.
If the survey covered more general systems rather than mission critical ones then I'm sure we'd see the much larger share of Linux showing. But Linux seems to have missed the boat in the MC area, us customers clinging to our commercial UNIX and expensive RISC/EPIC CPUs, even though Linux will run on those CPUs. IME, switching "edge" servers, such as webfarms and fileservers, to Linux is pretty easy, but getting the budget-holders to place their faith in Linux for those "it-runs-or-we-die" applications is a lot harder, even when you can demonstrate a good saving with Linux. But we used to have people telling us we could never have Wintel in our main datacenters because it was "too insecure and unresilient", and that rule has long since disappeared, so maybe Linux only has a while longer to wait.
Linux ain't mission critical
The reason Linux doesn't feature in a poll of mission critical servers is because in the real world it isn't up to the job. It's all well and good talking about the perceived savings - which is a myth anyway, but if you want servers that stay up, you need something designed for the job. Solaris on SPARC, AIX on PPC etc... Have you catually tried running big apps on LINUX on blades?? My advice? forget it. It does not scale.
RE: Linux ain't mission critical
So, what's it like to be stuck in the Eighties? Since you obviously haven't looked at Linux in the last thirty years, let me assure you it is certainly up to the mission critical roles. And Dell, hp and IBM have been offering MC support for Linux for years, and that is using their resources rather than just doing pass-through support to companies like Red Hat (not that RH's own support is in any way lacking). In hp's case, they even went to the extent of creating a Linux version of their ServiceGuard hp-ux clustering tech.
".....Have you catually (sic) tried running big apps on LINUX on blades??...." No, not really big apps on blades, but I have run production Oracle RAC instances on RedHat on hp Superdome servers, which isn't exactly a peanuts role. Oh, and those Linux instances were replacing the SPARC-Slowaris instances which you seem to think did such a better job.
Just a thought
I dread the day the x86 boxes will go fully pointy-clicky, even in the BIOS-in-all-but-a-name. That's sure to be really useful in the server market. What are they thinking? Very little, but then that's what the boxes are for: eye candy and no thought, for that might get useful work done. Can't have that. So they're merely going full-service. Critically so.
I'm going to miss firmware that isn't stupid.
I'll especially miss Openboot PROM (and, to a lesser extent, its Macintosh and RS/6000 cousins that lack 'sifting' and 'help'). Whenever I sit down at a machine that:
requires a working graphics device for me to talk to the firmware;
cannot provide a diagnostic console on a serial port unless, perhaps, an OS is booted;
cannot offer an interactive command prompt where I can probe devices, run diagnostic routines, or write arbitrary Forth programs;
I feel like I've stepped into the third world of computing. Where I expected to find a toilet, I found a hole in the ground. Unfortunately, the third world won. I guess people think the hole is more user-friendly and intuitive; running water and toilet paper just confuse users, so why provide them?
This EFI crap is no improvement, though at least it can offer a serial console. What the IEEE 1275 stuff accomplished through simple flexibility, EFI accomplishes through bloat and sprawl. One little DIP package on your board that costs less than $20--that's all you need! Hell, I'd take Alpha SRM, SGI ARCS, VAX VMB, whatever that serial console for the HPPA machines was called, or even the old sunmon over EFI and this BIOS crap. Various serial consoles on various machines have saved me a heap of trouble on numerous occasions. On the other hand, I have repeatedly been screwed by the lack of such functionality in most x86 hardware; many a time have I thought, "Now, if this stupid thing had a serial firmware console, I might just be able to find out what the hell is wrong with it. Too bad it has BIOS, so I don't get to do that."
RE: I'm going to miss firmware that isn't stupid.
I hated EFI at first, gradually got used to it, but still don't want to go steady with it. But, seeing as I use it on the Integrity kit, that means I've usually got a proper management processor separate from the actual core of the server, and I can access that either serially or over the LAN. If I want to impress management or the Crayola brigade, I can even fire up a web GUI and get onto a serial console remotely. I know, not a very hairy-chested-BOFH thing to do, but it makes them think they are spending their money wisely! Big bonus is, seeing as the web GUI is common to the ProLiants as well, I can take a sysadmin trained in the pointy-click style of Windoze and get him up to speed on Integrity in no time, which would just be beyond him if I insisted he learned all the old console tricks (just try explaining the difference between VT52, VT100 and hp 700 console to a typical Windoze sysadmin!). And, to be honest, after a while I've stopped missing a "real serial console".
Would be glad to see that dinosaur called Solaris extinguished to the history cupboard. I have no idea why but solaris has always been my slowest and most useless environment. I don't know if it's the rubbish hardware specs that cost a million or the operating system itself. Being expensive doesn't mean it's good! But some people only used solaris and don't know otherwise. No comment about HP-UX never used it. I like windows servers, AIX reminds me of RPG but again no comment. Pls people get rid of solaris, at least make it linux.
I can only speak as a developer
But I'd rather develop on Solaris than HP-UX. HP-UX is like all the pitfalls of Unix with none of the benefits.
You hate Solaris...
pan2008 wrote: "Would be glad to see that dinosaur called Solaris extinguished to the history cupboard"
You'll get your way pretty soon, based on how Oracle are royally screwing up - it's actually kind of funny to see how utterly clueless they are. Shame, because there's a lot in Solaris - like ZFS - which is pretty good from a technical point of view.
then pan2008 wrote: "I have no idea why but solaris has always been my slowest and most useless environment. I don't know if it's the rubbish hardware specs that cost a million or the operating system itself. "
It's the hardware - I'm running a couple of Solaris on Intel installations (actually via VMware) and while it takes a while to start up and shut down, when it's running it's quite speedy.
finally, pan2008 wrote: "No comment about HP-UX never used it. I like windows servers, AIX reminds me of RPG but again no comment."
Windows servers - okay I suppose, but there's better options (like ANY version of Linux).
AIX is actually pretty good for sysadmin stuff - once you get into their vibe. The IBM hardware is ruinously expensive, but their big iron is very sweeeet.
HP/UX - so-so OS (imho), great for security, but the hardware is - to be polite - not the best. The sooner HP wise up and port from Itantic to Xeon, the better as far as I'm concerned. And yes, I realise that there's folks out there who really like HP/UX - fair enough says I, live and let live (but you're still wrong! <grin>)
Reliability over speed
Solaris may be slower, but at least it works, and you get meaningful support from Sun (or at least you do at present). Move to Linux / blades and move to a world of random meaningless error messages and no possibility of proper fault diagnosis. The only way you get hardware fixed on blades is to keep swapping bits until the problem stops re-occurring. That's hardly viable for a mission critical solution?
"Mission Critical" vs. "Industry Standard"?
The difference isn't as trivial as your parenthetical comment implies. It's true that many companies use "Industry Standard" equipment for "Mission Critical" applications, but that's the bean counters betting the stockholders' money that the company will survive a major "oops". I notice that the companies supplying "Mission Critical" equipment don't habitually put a buggy, insecure OS on that equipment, while the companies selling "Industry Standard" equipment do, even though, by the statistics in your article, that second rate OS (so widely used in consumer and peripheral workstations) is used on a small minority of the boxes responsible for what are considered bet-your-business applications.
Blades and Virtualization?
Surely it doesn't make sense to use blades, if you are seriously looking at virtualization (to get maximum usage) as you are slicing up a number of smaller containers (the blades) instead of a much larger machine (SuperDome or pSeries).
Can someone please explain how the two can live together, as I'm seeing more companies going down this route and it seems like a big mistake?
Himalaya was a long time ago
"there are examples of RAS being implemented on the chipset, Himalaya for example."
Indeed, but that was then, and IA64 is now (it certainly is not the future).
For example, contrary to some PR guff, there is no instruction-level lockstep on IA64, and NonStop software manages quite happily without, and has done for decades: check out this rather dated Tandem presentation (from 1990): http://www.hpl.hp.com/techreports/tandem/TR-90.5.pdf for how they manage without.
Wrt instruction retry and transients: transient memory errors (be they main memory or cache) are often (not always, but often) dealt with by the memory subsystem before the processor core sees them. Or before the DMA controller propagates them - there's no point having a transient memory error which is only correctable by retrying the instruction, if the memory reference is actually coming from a DMA access with no instruction as such to retry (apologies if I misunderstood, SPARC isn't an architecture I'm familiar with).
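To illustrate what "dealt with by the memory subsystem" can look like, here is a toy sketch. The assumption (mine, not stated above) is that the mechanism is an error-correcting code; real memory controllers do this in hardware with wider codes over whole memory words, whereas this is a plain Hamming(7,4) code in software, and the function names are hypothetical. The point is only that a single flipped bit is repaired on the way out of memory, so neither a processor load nor a DMA read ever sees the bad value.

```python
def encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(word):
    """Return (data bits, position of the repaired bit, or 0 if the word was clean)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]  # parity over positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]  # parity over positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]  # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # non-zero value is the position in error
    if syndrome:
        w[syndrome - 1] ^= 1         # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], syndrome


data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1                       # simulate a transient single-bit flip
recovered, repaired_position = correct(stored)
```

A code this small can only repair one flipped bit per word; it is meant to show the principle, not the strength of any real memory subsystem.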
So, thanks for the suggestion, it's at the kind of level I'm hoping for, preferably based on real current technologies.
£25 of my very own money to the RNLI if I get a real example backed up with pointers to supporting material.
When I wrote "presentation", I meant "paper"
the 1990 thing is rather detailed and rather dated.
Here's a much more recent (2008) and rather more readable (8 pages) writeup on the principles of how NonStop works on today's HP blades:
From page 8: "Because these microprocessors were not deterministic, memory lockstepping could no longer be used. Therefore, the microprocessors were lock-stepped at the I/O level (any packet delivered to the interconnecting ServerNet fabric)."
The lockstep is only at the IO level, not instruction, not memory. It could not be any other way.
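A toy model of what comparing "at the I/O level" means, purely as an illustration: this is not how NonStop is implemented, and every name here is hypothetical. Two replicas do the same work in different internal orders (standing in for non-deterministic processors), and the only thing checked is the stream of outgoing packets.

```python
import random


def replica(requests, seed, faulty=False):
    """Process requests in a replica-specific internal order; emit packets in request order."""
    rng = random.Random(seed)
    order = list(range(len(requests)))
    rng.shuffle(order)               # internal execution order differs per replica
    results = {}
    for i in order:
        results[i] = requests[i] * 2  # stand-in for the real work
    if faulty and results:
        results[0] ^= 1              # simulate one corrupted result
    # Packets leave in a defined order, whatever happened internally.
    return [(i, results[i]) for i in range(len(requests))]


def compare_at_io(packets_a, packets_b):
    """Return the index of the first mismatching outgoing packet, or None if all agree."""
    for n, (a, b) in enumerate(zip(packets_a, packets_b)):
        if a != b:
            return n
    return None


requests = [3, 5, 7]
healthy_a = replica(requests, seed=1)
healthy_b = replica(requests, seed=2)
broken = replica(requests, seed=3, faulty=True)
```

The two healthy replicas agree at the I/O boundary even though they never ran in step internally; the faulty one is caught on its first outgoing packet.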
As I said above, £25 to the RNLI if anyone can spot something that an x86-64 blade inherently couldn't do (given appropriate firmware and software).
re Blades and Virtualization?
"it doesn't make sense to use blades, if you are seriously looking at virtualization (to get maximum usage) as you are slicing up a number of smaller containers (the blades) instead of a much larger machine (SuperDome or pSeries).
Can someone please explain how the two can live together, as I'm seeing more companies going down this route and it seems like a big mistake?"
Quite possibly so. But the biggest Proliant available today already has 48 CPUs and 512GB of RAM; why would Windows-world people want SuperDome or pSeries instead of that, apart from trendiness of blades vs boxes? Obviously different considerations apply for folks with an investment in a real OS.
re Steve Button - Blades and Virtualisation?
In answer to your questions, lots of people do this because of the flexibility it gives them in terms of deployment and the granularity they can get; service providers in particular go down this route (and if you look at things like HP's BladeSystem Matrix, it's exactly what this is).
Also, using smaller machines, i.e. blades, means you can still scale (up and out) but you don't have to invest in a big box like a Superdome or P595 (or M9000 to keep the SUN folks happy). You get more of a pay-as-you-grow approach, and a standardised infrastructure with a common build and platform helps reduce indirect costs.
I've seen lots of organisations in public and private sectors deploy this kind of platform for shared service delivery as it means they can very easily move resource around at a good price point.
You could also look at it from the point of view that scale-out virtualisation helps reduce your risk a bit, as you don't put all your eggs in the one basket (albeit a pretty fault tolerant and resilient one), but maybe that's stretching it a bit! Of course the downside of this is the proliferation of device management points, but most of the blades come with pretty good management tools to alleviate this issue (well, HP and Dell do; IBM's are a bit iffy, so I'm told by one of my software consultants, who has to integrate them into enterprise management apps like CA, BMC and OpenView).
It's not perfect but it works well enough for most, and it also comes down to what kind of apps you are virtualising and how big your estate is.
Where it's not so great is for the large I/O intensive databases and ERP systems, but over time I'd expect them to also succumb as the server market gets continually commoditised and the virtualisation tech improves.
Motivation for a real RAS comparison between IA64 and AMD64
Well, obviously £25 to the RNLI (which I'll probably donate anyway) isn't enough to motivate anyone to publish a real examination of the RAS differences between IA64 and AMD64 (or Intel clone). Bear in mind that the two now share a common expansion bus, CSI/Quickpath or whatever it's now called, so the only relevant features must presumably be either on-chip in the processor socket, or in the memory subsystem.
I wonder what level of motivation it would take to get such a comparison out in public. When IA64 goes EOL and HP will be saying "Proliant's fine for RAS, use that", it'll have to come out anyway. 'Course in reality their former IA64 customers will likely be feeling "anyone but HP" but there isn't really an acceptable broad-range competitor to Proliant; buying Dell (or IBM) just to spite HP isn't really an ideal answer.