An investigation has been launched at Leeds' famous St James' hospital after a server room disastrously overheated, permanently frying a new computer system for storing patient x-rays. St James, known as Jimmy's locally, is run by Leeds Teaching Hospitals NHS Trust. It confirmed that early assessments indicate damage has …
Even my crappy 5 year old pc at home shuts down when it gets too hot preventing it from getting damaged. How the hell did they get the server room so hot it fried the equipment?
What kind of hardware doesn't come with on-board temperature sensors and self-protection systems that shut the system down (gracefully or otherwise) to prevent permanent damage?
Unless of course these geniuses disabled them because they were shutting down all the time.
I believe each rack was set up with liquid cooling. Unfortunately, they chose diesel...
I remember installing a Plessey digital PBX exchange in Croydon Council way back when they were 'new'. Someone decided to put the air-con on full as the room felt a bit warm. The next morning, all the air-cons were covered in snow, preventing any air cooling & the temperature was over 50C. The switch didn't crash though.
Yeah, are they speculating about inadequate air-con, or none whatsoever?!
A lot of the servers we use don't shut down when they overheat, but then we have monitoring and alerts if there are overheats as clearly there is something wrong when it does happen. Usually a case of broken aircon. It's quite possible that the incoming airflow was not being cooled (enough/at all) and when the warmer air was being passed through the drive arrays (usually quite hot babies) they weren't cooling them enough so they continued to heat further and further. A couple of disk arrays cooking themselves will do some severe harm to this kind of system I should imagine.
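For anyone wondering what "monitoring and alerts" amounts to at its simplest, here is a minimal sketch in Python. The sensor names and the two thresholds are made-up assumptions for illustration, not figures from any vendor or from the setup described above; a real installation would read values from its own hardware and pick limits from the equipment specs.

```python
from dataclasses import dataclass

# Hypothetical thresholds in degrees Celsius (assumptions, not vendor figures).
WARN_C = 27.0
CRITICAL_C = 35.0


@dataclass
class Reading:
    sensor: str      # hypothetical sensor name, e.g. "rack2-inlet"
    celsius: float   # measured temperature


def classify(reading, warn=WARN_C, critical=CRITICAL_C):
    """Map one temperature reading to 'ok', 'warning' or 'critical'."""
    if reading.celsius >= critical:
        return "critical"
    if reading.celsius >= warn:
        return "warning"
    return "ok"


def alerts(readings):
    """Return (sensor, level) pairs for every reading that is not ok."""
    return [(r.sensor, classify(r)) for r in readings if classify(r) != "ok"]


if __name__ == "__main__":
    sample = [
        Reading("rack1-inlet", 22.5),
        Reading("rack2-inlet", 29.0),
        Reading("array-exhaust", 41.0),
    ]
    for sensor, level in alerts(sample):
        print(f"{level.upper()}: {sensor}")
```

The point is only that a rising inlet temperature gets flagged well before anything cooks; who gets paged and how is a separate problem.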
Well, NHS IT is specced to have extra fans and cheap power supplies...
Now the whole lot will have to be scrapped.
I know of at least one make of mobo that has just the one fan controller to drive the CPU fans, even though it is advertised as dual CPU for redundancy! Anyway, the box gets hot, the fans spin faster, it draws too much power and piff paff poof, the magic smoke is released and neither CPU fan works. Only then do they hit overtemp and shut down... D'oh.
First you build the server room, and make sure it's structurally sound.
Then, you install the racks and servers. Turn them all on, make sure it's all working. Let it run for a little while to see that everything is OK.
Then, as the coup de grace, you install the air conditioning. This makes the servers more comfortable.
Especially in health care, you have to take pains to do everything in the above order, or else madness might ensue.
I just hope this does not happen any more; some of these medical records can help save people's lives. We need to move away from these hot servers with high-GHz CPUs and larger and larger disk stores that have become more prevalent in the last 5 years.
These power and cooling requirements are not sustainable. We could be heading for more than a meltdown as seen in Jimmy's.
We need to be responsible IT people and make balanced systems from a power and cooling perspective.
Our air con in our server room at Notts City Hospital packed up a few weeks ago. Our entire building was stripped of desk fans to keep the room cool whilst the air con was repaired.
Quite funny actually, for me anyway, being a low level tech... The system admin guys didn't find it quite as hilarious.
I remember a nice install of 48 v480's I did a few years back... After providing the power and thermal specs of the machines the company in question said "We'll look after that".
After replacing nearly all of the PDUs, which had blown while firing up the servers, they were finally in a position to fire up enough to make the aircon units literally "sweat" all over the racks... We had to throw plastic tarp sheeting over the racks just to keep the water off, and you can imagine what that did to the airflow :D
Laugh? We nearly cried!
Our local college built a new extension and put in lots of "nice" RM PCs, 24 of them in a horse-shoe. As the room was small, these PCs (desktop cases) were nearly touching. They all had a fan on one side & a vent on the other. See where I'm going here?
Yes the PC on the far right wasn't very stable...
"The catastrophe hit on 9 September"
But presumably it had to sit on a trolley in the corridor for two weeks before a janitor came along and noticed there was a problem.
I work for the NHS doing 2nd line desktop support based at The Causeway in West Sussex.
I visit the server rooms here on a daily basis to do various tasks: backups, patching ports etc.
All the server rooms I have had to visit so far have had very efficient cooling, most of them at least 3 air conditioning units. Although they do run a few servers with the covers off so air flow on those particular ones probably isn't too great, but it's within safe levels.
In our office we often joke about our colleagues being cowboys for whatever reason, but we do a job and we do it well. Do the guys at St. James's actually employ professional cowboys? It makes you wonder.
... to my boss... We have several racks hosted by a hosting company.
The server room gets so hot, due to bad aircon, that all us techies know to wear lightweight clothing, and the thinnest t-shirts we can find, when we go there.
We've told the boss... many, many times, but he didn't take any notice...
Not until the day 2 drives on the RAID 5 database server died at the same time. Would have had a chance if the hot spare had had time to rebuild, but they went literally minutes apart. That's what I call tight manufacturing tolerances!
So, it's alright because the system isn't live yet, rather than alright because it was all backed up?
Perhaps it was, but in the same room. You have to wonder...
They had offsite backups and a good business continuity plan right? Or is that ward going to be out of action for a couple of weeks while the servers are reinstalled...?
Or is there an anonymous contractor in this picture somewhere? Name names...
I worked at a huge Facilities Management centre which at one time allegedly had the biggest machine room in Europe. The air handling units were real monsters. They balanced the room by borrowing a thousand or so electric fires direct from the manufacturer.
The AHUs were pretty reliable but when they went titsup there wasn't much time to bring the machines down. Security was breached once when we had to open every door on a Winter's night and let the rain in. That was fun.
Working in IT in a university I've seen this in action a few times. We moved into a new build a couple of years ago with our office next door to a machine room. Not only had the bean counters refused an extra £50k to flood-wire the multi-million pound building (surprise surprise causing problems within months of the building opening) they also only forked out for two under-specced air-con units for the new machine room. So when one failed...portable air-con blowing hot air into the corridor making it kinda tropical for a couple of weeks while waiting for the repairs.
Of course they wouldn't do that in the new multi-million pound build that opened this summer. Surely. As if. No. Oh yes - another new machine room reliant on two air-con units, each of which cannot cope on its own. For an extra £7.5k we could have had three units, of which any two would have been capable of cooling the machine room in the event of a failure.
Who knows what the cause was in this case but the bean counters have to be a fair bet.
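The two-units-versus-three argument above is the usual N+1 check, and it fits in a few lines. This is a rough sketch only: the 10 kW heat load and 6 kW unit capacity are invented numbers chosen to mirror the situation described (each unit too small on its own), not the real figures for that machine room.

```python
def survives_single_failure(unit_capacities_kw, heat_load_kw):
    """True if the remaining units can still carry the load after any one unit fails.

    The worst case is losing the largest unit, so only that case is checked.
    """
    if not unit_capacities_kw:
        return False
    remaining = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining >= heat_load_kw


if __name__ == "__main__":
    HEAT_LOAD_KW = 10.0   # hypothetical room heat load
    UNIT_KW = 6.0         # hypothetical capacity of one air-con unit

    print("two units:  ", survives_single_failure([UNIT_KW] * 2, HEAT_LOAD_KW))
    print("three units:", survives_single_failure([UNIT_KW] * 3, HEAT_LOAD_KW))
```

With those assumed numbers, two units cope together but not after a failure, while three units still cope with one down.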
From the description, this was meant to be a data storage unit, therefore in a *sane* world it wouldn't require multi-gigahertz-multi-core CPUs just to get the damn "operating" system booted. The required time to retrieve an x-ray image isn't likely to be measured in milliseconds, so stupidly fast (and therefore hot) HDDs wouldn't be needed either.
Tape backup units, of course, now they would up the price considerably... wouldn't want to deal with the shopping trolley of backup tapes that would be required for a full backup of such a system.
It really depresses me that we're paying for this kind of dumb-ass improperly specced, badly implemented (doubtless late) "solution" with our taxes.
... hospitals too warm. Perhaps its because of the server rooms.
They should run cold water for the heating system through these rooms, warming it up to reduce the loading required from the main heating system.
I'm looking at you, BofH. Take back the AC now, they need it!
I was thinking the same thing myself. I regularly buy Dell MD1000s with 7.5TB of storage (15 x 500GB SATA drives). These are used to record surveillance cameras, so are reasonably speedy, and cost about £3250 apiece. The server to control 6 of these beasts weighs in at about £3k.
So we've got just short of 50TB of storage for £22,500. Lose a third for RAID 5 and you've still got storage for about £600 a terabyte.
So what exactly was in that room, a huge stack of gold plated SAS drives?
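For anyone who wants to redo those sums, here is a quick sketch using only the prices and drive counts quoted above. The usable fractions are assumptions: "a third lost" is the figure used in the post, and 14/15 is what a single parity drive per 15-drive shelf would give. The answer moves with whichever overhead you pick, roughly £500 per TB raw, about £536 with one parity drive per shelf and £750 with a full third lost, so "about £600" sits inside that range.

```python
SHELF_PRICE_GBP = 3250    # one MD1000 shelf, as quoted
DRIVES_PER_SHELF = 15
DRIVE_TB = 0.5            # 500 GB drives
SHELVES = 6
SERVER_PRICE_GBP = 3000   # controlling server, as quoted


def cost_per_usable_tb(usable_fraction):
    """Total hardware cost divided by usable capacity in TB."""
    raw_tb = SHELVES * DRIVES_PER_SHELF * DRIVE_TB
    total_gbp = SHELVES * SHELF_PRICE_GBP + SERVER_PRICE_GBP
    return total_gbp / (raw_tb * usable_fraction)


if __name__ == "__main__":
    cases = {
        "raw capacity": 1.0,
        "one parity drive per shelf (assumed)": 14 / 15,
        "a third lost (as in the post)": 2 / 3,
    }
    for label, fraction in cases.items():
        print(f"{label}: about £{cost_per_usable_tb(fraction):.0f} per TB")
```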
******* From the description, this was meant to be a data storage unit, therefore in a *sane* world it wouldn't require multi-gigahertz-multi-core CPUs just to get the damn "operating" system booted. The required time to retrieve an x-ray image isn't likely to be measured in milliseconds, so stupidly fast (and therefore hot) HDDs wouldn't be needed either. *********
Bit of background: Radiology images are stored in DICOM format, and these are large! A plain film single X-ray will be up to 10Mb in size; new CT and MR scanners are putting out image sets that can easily be 1Gb in size. While the radiologist producing a report has to see the full, uncompressed image, most clinicians in outpatient departments and wards don't require this level of detail, as they are viewing the image in conjunction with a clinical report. As the time taken to throw gigabyte files around the (in some areas) aging 10Mbit network is unacceptable, the images are converted to JPEG before viewing on a workstation. This is done in real time as the images are requested, and that does take multi-GHz, multi-core CPUs.
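To put a number on "unacceptable", here is a back-of-envelope sketch. It assumes the image sizes above are in bytes (10 megabytes and 1 gigabyte, decimal units) and that the link runs at its full nominal rate with no protocol overhead, so real transfers would be slower than this.

```python
def transfer_seconds(size_megabytes, link_megabits_per_s, efficiency=1.0):
    """Time to move a file over a link.

    size_megabytes is converted to megabits (x8) because link speeds are
    quoted in bits per second. efficiency below 1.0 models overhead.
    """
    size_megabits = size_megabytes * 8
    return size_megabits / (link_megabits_per_s * efficiency)


if __name__ == "__main__":
    LINK_MBIT = 10  # the aging 10Mbit network mentioned above
    for label, size_mb in [("plain film, 10 MB", 10), ("CT/MR set, 1 GB", 1000)]:
        secs = transfer_seconds(size_mb, LINK_MBIT)
        print(f"{label}: about {secs:.0f} s ({secs / 60:.1f} min)")
```

On those assumptions a single plain film takes around 8 seconds and a 1 GB study over 13 minutes, which is why sending a compressed JPEG to the ward makes sense.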
Also, these images, in our system, are stored on a shared SAN (cache) / CAS (archive) system that many other hospital systems use. Therefore the disks are specced to the highest requirement of those systems.
Having said that, if you install a fast hot server, whatever it's for, you must provide the correct environment for it! I would guess that the computer room in question also held Exchange servers, hospital information system servers etc, but that really doesn't make the headline as interesting, does it!
PS - While the implementation of PACS as an (eventually) nationwide system is new, the idea isn't. Leicester General Hospital has had a fully functioning PACS system since 1999 :-)
After working with government organisations, let me show you where these figures come from:
£ 900,000 - Meetings tea and cucumber sandwiches for clueless bureaucrats
£ 100,000 - Actual labour and equipment costs
...How long it would be before the BofH came under suspicion.
When the AC failed in my very modest server set-up (two or three RS6000s, a Novell server, firewall, switches, routers, PABX and stuff, but packed into a fairly small space) the chief bean counter wondered, for days, if the thing was really necessary!
In fiction, people resign. In real life we have to eat --- but I did write the MD to the effect that I refused to be held responsible for the company's data until this was fixed. It worked too!
but due to the Official Secrets Act I can't discuss what government agency I work for. Let me just say that they can't get the air conditioning working properly for people, let alone computers. The other week we spent days working in sweltering, boiling hot conditions while they "found a spare part" for the air con.
If the UK government can't keep the air con working within legal "comfortable" limits, I dread to think what their servers are like.
"Our local college built a new extension, and put in lots of "nice" RM PCs, ....."
Elonex 'all in ones' - eggs on top, bacon on the rim and sausages in the cd tray = instant breakfast.
Building a server room for a friend. Has 30 Dell 2U boxes. Rule of thumb: they draw 250 watts on power up, 100 to run.
250*30=7,500 watts to power up after an outage, divided by 110 (it's the US...) = 68 amps. Can probably do it on 3 20 amp circuits, but will use 4 for safety.
100*30=3,000 watts continuous power use. He bought an air con that removes 1000 watts. I asked him where the other 2,000 watts were supposed to go, got a blank look.
I suppose he'll figure it out, or I will have a nice, warm place to go in the winter, at least until the servers have shat themselves. ;-)
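The arithmetic above, as a small sketch for anyone who wants to plug in their own numbers. It uses the post's rule-of-thumb wattages and ignores breaker derating and power factor, which are simplifying assumptions. Note that three 20 A circuits give 60 A, just under the 68 A start-up figure, so the fourth circuit is more than a nicety.

```python
import math


def startup_amps(servers, startup_watts_each, volts):
    """Current drawn if every server powers up at once."""
    return servers * startup_watts_each / volts


def circuits_needed(amps, breaker_amps):
    """Smallest number of circuits whose combined rating covers the current."""
    return math.ceil(amps / breaker_amps)


def cooling_shortfall_watts(servers, running_watts_each, cooling_watts):
    """Heat the air con cannot remove (0 if it keeps up)."""
    return max(0, servers * running_watts_each - cooling_watts)


if __name__ == "__main__":
    amps = startup_amps(30, 250, 110)
    print(f"start-up draw: {amps:.0f} A")
    print(f"20 A circuits needed: {circuits_needed(amps, 20)}")
    print(f"uncooled heat: {cooling_shortfall_watts(30, 100, 1000)} W")
```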
All goes to show, it isn't rocket science here -- just too many managers and too little thinking.
From my experience the Trusts have their own server guys and get contractors to roll out the desktops + UBER LCDs for viewing the PACS-supplied x-rays....
So what's new? Government employees / dead wood who just coast and actually know nothing about what they're doing. They'll never get sacked unless they commit murder... even burning out a million quid's worth of equipment won't shift 'em.
NHS Trust IT staff are no better than the tit-of-life-sucking Civil Serpents, if you ask me.
Probably by the end of next year, over 80% of the images shot in the US will be digital. Films are going the way of the dinosaur. And the quality of the images is increasing at a very high rate. Which means that the bit density is increasing, which means that storage requirements are going up. So those 6 TB (with RAID5) boxes will rapidly become too small. Medical imaging systems are about to go through a very, very painful stage.
Then there are those billions of films still out there. HIPAA requires that they be kept for 7 years or, for minors, for 7 years after they reach the age of majority.
On reflection, I find it hilarious that a country that drinks warm coke (room temp, that kind of warm) finds the slightest bit of heat 'sweltering'. :)
With the heat these rooms are generating perhaps they could move some more hospital equipment in. Incubators, humidicribs, bio-culture growing racks, patient solarium, staff gym, etc.
...... is rocket science. I've lost count of the number of server rooms that I have built (the last for the largest single-occupancy building in Europe). The "science" behind a proper and secure environmental build is as basic as it can get. All that's needed is to take account of the heat output specs and design around that. It does not matter *what* it is, just how much heat it can generate. For example: account for, or don't create in the first place, any *hot spots* in the room; account for external forces (path of the sun on the building if glass walled); allow for redundancy; that kind of stuff, and Robert is your father's brother.
That's the technical bit done. Next up is to keep the Finance Director and his team in their cage. They are not the experts in this stuff and should not be allowed to dictate the design. Engage the MD or CEO at the outset and present artificial financial constraints as a Key Risk in the project. If he doesn't understand that, the project is pretty much in trouble from the off.
A properly run project would not allow stupid disasters like Jimmy's to occur. But then what Government project has ever been "properly" run? Ineptitude and corruption abound; the results are entirely predictable.
Does anyone else feel a slight tingle in their belly at the thought of £900K's worth of cucumber sandwiches?
I always knew I was destined for politics (or middle management)
Accidents happen, and this was an accident. We should stop bashing the organisations involved; the NHS is there to help us, and we should help them. I think we all owe the NHS a lot. I have had 2 broken arms fixed by them.
This is an opportunity to start thinking about data and cooling.
We can be smarter in the future and store the PACS data on low temperature storage, e.g. tape. We need a smarter storage hierarchy where data is on disk for x days, then goes to tape for months or years. No point paying to cool spinning disk that no-one uses.
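The "on disk for x days, then tape" rule can be sketched in a few lines. This is an illustration of the policy only, not the behaviour of any particular product; the 90-day window and the study dates are made-up assumptions.

```python
from datetime import date

DISK_DAYS = 90  # hypothetical value for "x days"


def tier_for(last_accessed, today, disk_days=DISK_DAYS):
    """Decide where a study should live based on how recently it was used."""
    age_days = (today - last_accessed).days
    return "disk" if age_days <= disk_days else "tape"


if __name__ == "__main__":
    today = date(2007, 10, 1)
    studies = {
        "recent chest film": date(2007, 9, 20),
        "last year's CT": date(2006, 8, 1),
    }
    for name, last_used in studies.items():
        print(f"{name}: {tier_for(last_used, today)}")
```

Keying the rule on last access rather than creation date means anything a clinician has pulled up recently stays on disk.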
I know this is a shameless plug, but we have a spanner/tool that fits the job; people may just not know about it. SAM makes the media transparent, and your applications think all data is on disk. You can take a copy and send it to tape on another site, which does not need the cooling of disk. Disk is good for short term storage or active data; tape is good for inactive data. SAM glues it all together so you never know.
The solution is here: http://www.sun.com/storagetek/management_software/data_management/sam/index.xml
An example is here:
...are usually fun to deal with, and as far as technology is concerned, are often responsible for costing a department more money than they think they're saving it.
A few years ago, when I was kitting out a server room with a backup solution, the resident bean counter refused to pay the £3k necessary for the backup tapes, despite the fact that we'd been *given* the tape robot free of charge to our department (but without tapes, of course). Didn't appreciate the business risk assessment I gave him, and just said "No, sorry. No budget."
So I wrote back to him (Cc:ing my manager and his manager, just for good measure), and said his answer was acceptable - provided that he was willing to *personally* assume all liability (and costs) in the event that company data was lost. We were an R&D centre, and that data was worth about £250k per week.
What do you know? I got the purchase order approved before I even left for the day! :)
The lesson I learned is simple: If you make bean counters accountable for the purchase orders they refuse (and you provide them with an accurate idea of what could go wrong), you'd have a lot of businesses run far better than they are today. As an engineer, there is no more frustrating phrase than "I told you so, but you didn't listen."
... that the page had an advert for IBM Cool Blue Blade servers on it??
We're setting up a new server room in our basement. It's partly naturally ventilated but tends to be cooler than the ambient temperature outside.
Our MD doesn't want to install *any* aircon, just have the systems in a room with the doors open and a fan blowing. Outside temperatures sometimes reach 40C here.
Another server room is in an air-conditioned office without dedicated aircon (just the general office aircon) and is kept "cool" (~30C) by leaving the door open. Recently we've had complaints about noise & heat from those sitting near the open door. His response was to buy a floor-standing portable aircon unit and ask us to set it up to vent to the office corridor and close the door...
This for a room with around 35 systems in it - call it 5000 watts or so power draw.
Fortunately we've had a sanity check.
Well done for winning the most random post award! Wtf are you on about?
I used to work in a Boots store just after leaving Uni. For some reason the server for the store had been put in an office (Or the office had been put in the server room). People would walk in there to work and find it was too cold, so switch the AC off, then complain about how the system was so slow...
Also, I would like to point out that Bean Counters, despite what people think, never have a say in what is spent. They will tell bosses how much there is to spend (i.e. how much is in the bank) and then leave them to it.
So a single x-ray film is a little over 1 MByte in size (you said 10Mb = 10 megabits)
Some techie you are when you don't know your bits (small b) from your Bytes (large B).
I wouldn't trust you to change a battery in my daughter's Dora the Explorer doll.
is that your aircon status panel cannot fail in a condition you cannot spot.
Worked in a place with 5 VAXes in a room. Spring came, and I thought it was getting a bit warm, especially behind the 8650. Told the sys admin; he looked at the aircon panel, and no failure lights were showing (3 separate aircon units).
Anyway, 1st really warm day of the year, can't log in to anything. Mosey over to the server room: the loading doors were wide open and the sys admin was going around spinning down the disks. The console printers were going mental as each machine lost then remade contact with the others (and it would report every other machine in the cluster doing the same thing)... eventually full power down.
I was right, it was getting warm. Basically 2 of the units had bust, but the panel was bust too, so the failure lights didn't show. On the 1st really warm day, the final working unit found it all too much and gave up the ghost....
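The moral of that story is that "no failure light" is not the same as "working". One common way round it is to make each unit prove it is alive, and treat silence as a fault. A minimal sketch follows; the two-minute silence limit and the idea of units sending a periodic heartbeat are assumptions for illustration, not how that particular panel worked.

```python
def unit_status(last_heartbeat_s, now_s, reported_ok, max_silence_s=120):
    """Status of one air-con unit.

    A unit that has never reported, or has gone quiet for longer than
    max_silence_s, is 'unknown' rather than assumed healthy.
    """
    if last_heartbeat_s is None or now_s - last_heartbeat_s > max_silence_s:
        return "unknown"
    return "ok" if reported_ok else "failed"


def needs_attention(statuses):
    """True if any unit is anything other than positively ok."""
    return any(status != "ok" for status in statuses)


if __name__ == "__main__":
    now = 10_000
    statuses = [
        unit_status(9_950, now, True),    # reported recently, healthy
        unit_status(9_000, now, True),    # last word was 'ok', but long ago
        unit_status(None, now, True),     # never heard from at all
    ]
    print(statuses, "-> attention needed:", needs_attention(statuses))
```

With this rule a dead panel shows up as a row of "unknown" instead of a reassuring row of unlit lamps.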
I'm not sure why you find it funny that we drink coke at room temperature -- since room temperature here is probably fridge temperature in some states of the US ;~).
A few years ago I implemented a software system in a new state-of-the-art web-hosting centre for a global leader. The centre was built in a former print-works in London Docklands, and the design & decoration of the foyer alone cost £1 million. The only problem was that (being a former print-works) there were not as many solid floors as the Data Centre designers expected, and when lots of holes were made for cabling the whole place became much more porous. It froze the offices and baked the servers.
A couple of years ago, I worked at an Investment Bank that needed to open the data-centre doors and use fans to blow hot air into the car-park shared with the Investment Bank next door.
Let's put aside the (very) amateur carping and remember that there is nothing more mission critical than life & death. £1 million is less than the compensation claims if some people's children died because “the server shut down because it thought it was hot” or “the courier with the tape of your patient records has been delayed”.
Spare a thought, when you criticize the CAS on-line disk archive (that powers down when not used), that there are people wandering around with fried nuts and fried ovaries because the only “on-line” store was to do another x-ray.
Cut them some slack (I’m not involved with NPfIT). With the latest hot-boxes cooling is a big problem; the only difference is that they are under the spotlight.
I've been in server rooms that were so hot, the servers kept shutting down. Air conditioning had been installed, but appeared not to be functioning correctly. The reason? The aircon was pushing cold air into the room, but there was no ventilation for the pressure to escape. So the aircon was trying to push cold air into a closed room, the pressure inside the room built up, and no more air could be pushed in, resulting in the aircon not doing anything except wasting electricity, and the servers overheating. How many server rooms are incorrectly set up like this? Could this have been the case at Jimmy's?
I think we already accept far too many public IT snafus. How much treatment or how many lives could that wasted money have saved?
Put core IT systems, i.e. things that really do mean life and death, with third parties that specialise in data centres, rather than keeping your servers where you can see them, and get on with looking after the patients.
It might have been fun planning and building a spare room into a "data centre", but it's an empire that delivers zero value to the hospital's customers.