
Are you being robbed of sleep by badly designed servers?

How should we design the servers and end-user computers of the future? The construction of my testlab has given me the opportunity to play with technologies I normally wouldn't be able to get my hands on. The "advanced" features in them – standard fare by now for large enterprises – have caused a measure of introspection …

COMMENTS

This topic is closed for new posts.


another advert for supermicro?

Why is it that these articles from Trevor all seem to read like adverts for Supermicro? This one starts off masquerading as an article about remote server management. Whilst this kind of stuff might be new to Trevor, I can't be alone in thinking this is something I had on the Dell servers I bought in 2003, and every other server I've bought since has had either a DRAC card or an iLO, and I don't remember paying extra for any of them. In fact I'm pretty sure that the two Dell 4400s I had, which were dual Pentium II Xeons and looked like the Jawa Sandcrawler from Star Wars, even had DRAC cards.

13
2

Re: another advert for supermicro?

Indeed, Supermicro was late to the LOM party. And, at the time I was in this business, much less reliable than the competition.

3
0

Re: another advert for supermicro?

It's quite sweet in a way, watching young Trevor get excited about discoveries of things that experienced Reg sysadmin readers have already been using for many years.

9
0
Gold badge

Re: another advert for supermicro?

Enterprise vendors have had this for ages, but lots of folks who make "whitebox" kit (ASUS, Gigabyte, Tyan) don't. Or if they do, it is often quite a pricey extra. We're finally at the point where SMBs and bulk-buy folks using whitebox servers can buy IPMI-equipped stuff without pushing virgins into volcanoes. It's time we stopped buying the crap that doesn't have lights-out management. Send a message to companies like ASUS: if you market a "server board", it isn't okay for it to lack IPMI.

9
1

Re: another advert for supermicro?

Just checking: are you referring to the full remote console iDRAC/iLO, where you can see even a GUI console (Windows), mount ISOs/floppy images, have mouse control, etc. etc.? I was under the impression that was an iDRAC Enterprise set of features, which has a decent cost.

1
0
Gold badge

@dz-015

The difference is price. The cost of this enterprise-standard tech has come down enough that there is no longer any excuse for its omission from even the most basic of SMB gear. The tech is mature. The pricing is the transformative element, enabling far wider adoption than was possible even two years ago.

2
2

But security?!

I admit my experience with lights-out management is dated; I have been in development for the last seven years. Still, IPMI is rather problematic from a security standpoint: no audit (and usually no source available), no updates if/when problems are discovered, and a limited choice of connection/authentication methods. This boils down to the need for a well-maintained gatekeeper machine on premises, isolating your management network from the Internet. That means extra cost in equipment and support, and it partly defeats the purpose, because you won't be able to remotely handle that machine if it fails. We could afford such an arrangement for a datacenter of a few hundred servers, but if you only have a few machines in a remote location, it's uneconomical.
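To make that concrete: the sort of traffic you'd be funnelling through the gatekeeper is just ipmitool talking to each BMC. A rough Python sketch (the host and credentials are placeholders, and I'm assuming the BMC has the lanplus interface enabled):

    #!/usr/bin/env python3
    """Poke a BMC from the gatekeeper box via ipmitool (sketch only)."""
    import subprocess

    BMC_HOST = "10.0.100.21"   # management-network address (placeholder)
    BMC_USER = "admin"         # placeholder credentials
    BMC_PASS = "changeme"

    def ipmi(*args):
        """Run an ipmitool command against the BMC and return its output."""
        cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
               "-U", BMC_USER, "-P", BMC_PASS] + list(args)
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    print(ipmi("chassis", "power", "status"))  # e.g. "Chassis Power is on"
    print(ipmi("sel", "list"))                 # the hardware event log

Which is exactly the sort of protocol you don't want routable from the Internet.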

1
0
Gold badge

Re: But security?!

Remote gatekeeper = router w/ VPN. Repair for the router = PDU with a network port on the main network. Worst case: someone can reboot the management network router at will. Problem solved.

1
0
Bronze badge
Boffin

diurnal cycle

"I disagree that the day starts at some pre-determined hour simply because it coincides with the rotation of our local patch of mud to face some fusing ball of hydrogen plasma about eight light-minutes out. "

Mr Pott may disagree, but his brain/nervous system/sensory system is wired up to sync with the ball of mud's rotations. Not hippy stuff, just how meat-based processors work. Local partners?

3
0
Gold badge

Re: diurnal cycle

Actually, I have sleep phase disorder. Left to my own devices, I naturally settle into sleeping at 4am and waking at noon. It's certainly timed to the passage of the evil daystar, but significantly offset from the middle of the bell curve.

4
2
Bronze badge

Re: diurnal cycle

I think the medical name for what you're suffering from is "teenage"...

2
0

Re: diurnal cycle

It's not. If you lock people in the dark, for instance, they drift to a different cycle...

1
2
Gold badge

Re: "It's not."

Surely the fact that we drift away when the sync is absent merely confirms that we are synchronising with it?

1
0

Circadian rhythm

Look up circadian rhythm.

If left without external clues/influences, I believe humans will generally fall out of sync by running at a period of a little over 24 hours. Normally, clues like that big fusion reactor coming into view each morning reset the clock each day so we stay synchronised to the rotations of our lump of rock.

Obviously, not everyone is wired up the same, so it does vary a bit.

0
0

Re: diurnal cycle

Sleep phase disorder! I have the same pattern; I didn't realise it had a name. So I've learned about remote server management and circadian rhythm disorders in a single article. :o)

0
0

This post has been deleted by a moderator

Anonymous Coward

Re: Badly Designed Server = Server running Windows

Eadon, you may need to have your dosage reviewed.

I had an HP box with integrated iLO running Linux and OpenExchange for years, and the iLO lived on a separate network which was routed through a gateway and a certificate-protected VPN. No Windows involved, so no need for you to reiterate your threadbare list of reasons why you don't like Windows (we get it by now).

I'm not spectacularly fond of Windows either, but I don't see the need to mention that under every article, especially when it's barely relevant to the topic at hand.

14
0
Linux

Re: Badly Designed Server = Server running Windows

Eadon, what is your problem? I have just read through the article again, and at no point does Trevor mention which operating system is running on his servers. It is entirely about remote management.

I strongly suggest that you up your dosage of dried frog pills. In the meantime, have a penguin on me, maybe it will calm you down.

16
1

This post has been deleted by a moderator

Stop

Re: Badly Designed Server = Server running Windows

Oh Eadon just shut the f*ck up will you!!

I'm getting sick and tired of your endless droning on about Linux. We get it, you don't like Windows!

No, the topic wasn't specifically servers and clouds; it was hands-on hardware management, which matters more often in smaller set-ups where it's usually one man and his dog trying to keep the lights on. Operating at cloud scale (a la OpenStack) brings its own challenges around HW management, as you are obviously aware, and that has no relevance to anything in the article (with maybe the exception of IPMI).

In addition, trying to link an OS to a perceived increase in HW failure rate (and therefore the associated HW management) is at best tenuous, with no basis in evidence (do you have any hard statistics you could share?). It smacks of a university-level understanding of technology: the fanaticism that comes with lots of principle but little real-world experience!

For the record, I spend most of my day job in and around OpenStack-based platforms, and the most popular request in the past few months has been how to run Windows-based VMs in that environment. You may ask why, and the answer is simple: because people want it!

10
1
Gold badge

Re: Badly Designed Server = Server running Windows

The IPKVM image shows a server running ESXi...

2
0
Trollface

Re: Badly Designed Server = Server running Windows

He's implied that it's ESXi; I don't know of any other OS that will cough up a purple screen of death when it panics.

And Eadon, you need to put down the crack pipe. Or maybe start playing around with a Server 2008 R2 box: done up properly (i.e. on solid hardware, and using signed drivers or even built-in drivers) the OS is pretty damned reliable at this point, on the same level as your beloved Linux. Admittedly, in the four years I've been admin of our ESX stack, I've seen the hypervisor purple screen on me exactly once. The fault? A perfect storm of a flaky NIC driver (HA HA! Linux has them too!) and a bad packet on the 10GbE connection causing NFS to go down like a Clinton intern, which took the hypervisor and all the machines running on the box down with it. (On a side note, if you are running ESX/ESXi 4.x and using the Intel 10GbE network cards, get thee to VMware's site and install the updated drivers; that will fix this issue.)

Troll icon, because hey: if I'm gonna troll, might as well go whole hog.

2
0
Facepalm

Re: Badly Designed Human = Eadon

There we go; I've just corrected your equation.

1
0
Mushroom

Re: Badly Designed Server = Server running Windows

Hey. Get into anything above a hundred machines, or an organisation above a hundred people, and on Linux, we too can have closed-source, locked-in, expensive, insecure, buggy shit. And we can even have crappy virus scanners on our penguins that eat up half the CPU no matter how much CPU you have, to protect our Linux boxen from all known Windows viruses.

Unless of course you have Manglement that just lets us techies get on with it. On second thought, cancel that. It's stopped being funny.

0
0

Re: Badly Designed Server = Server running Windows

Speaking as a developer: show me an alternative that comes even close to the productivity of the MS stack and I'll switch. For most small companies, the cost of developing software is greater than the cost of hosting it, so even allowing for more hosting resources it saves money.

Obviously, this changes for larger companies with high volume websites, I wouldn't try to convince Twitter to run on .NET. :o)

1
1

iLOs and DRACs have been a standard part of my life for longer than I now care to remember. They're as much a part of a "standard build" for servers now as a power supply or NIC is. They've saved my bacon enough times, and now that Dell are up there with HP for general reliability and stability on those things, I'm far happier. I'm not really a desktop person, so can't really comment too much there.

My personal annoyance has always been with HP: if you had a Windows server and wanted to be able to KVM properly to it and see the GUI, then you *had* to buy the optional iLO "Advanced" licence. That just totally smelt of price gouging in my view. It still rankles.

6
0
Anonymous Coward

"My personal annoyance has always been with HP: if you had a Windows server and wanted to be able to KVM properly to it and see the GUI, then you *had* to buy the optional iLO "Advanced" licence. That just totally smelt of price gouging in my view. It still rankles."

Yup - ditto here. I *hated* that; it got really annoying when I forgot to set up a Linux box so it remained in *standard* text mode, as I was by that time several miles away from where it was installed. Thankfully I could bounce the box and grab the boot-up process to clean that up. Not impressed..

1
0

Yeah, that was annoying, but the "nice" thing about it is that if you were really stuck you could get a trial, bung a key in, and it could save your ass in a few minutes. And if you were naughty and the trial had run out, there were keygens (not that I'm recommending/advocating their use!).

However, if you had a Dell equivalent without a DRAC you were screwed. You'd have to pay for a hardware upgrade, get downtime on the server, unrack it, etc. So in that instance the HP licence was better. It'd be even nicer if it was free and standard, though...

1
0
Silver badge

"My personal annoyance has always been with HP: "

There's a pretty effective solution to this problem: Don't buy HP.

IPMI is bloody handy, but there are still a surprising number of crappy implementations around (including what Supermicro used to flog).

It's a little worrying to me that Supermicro is rapidly becoming our "go to" manufacturer for most servers, not least because they refuse to certify hardware ex-factory for Linux distros (their blade shelves have a few points of major suckage too). Intel/IBM/HP/Dell seem to be sleepwalking into irrelevance.

0
0
Silver badge
Trollface

So much complexity, so little time

stop-a

...

go

9600 baud is fine in case, for some reason, you don't have 3G either.

Youngsters... always trying to re-invent the wheel by making it square!

3
0
Silver badge

It's good to see

that what has been standard practice in the Mainframe/Midrange platform market is finally becoming a reality in other architectures.

I'm an IBM pSeries and Power person, and we have been able to do remote IPL, console, and configuration/management for years. OK, you paid a premium for some of the features, but many of the base capabilities have been built in for well over a decade, and that now includes IVM/PowerVM. The RS/6000 F40 had a service processor when it was announced in 1996.

It does help that most of the system admin can be done from a command prompt, though.

One of the customers I worked at had a majority of their critical servers in lights-out, mainly unattended sites scattered around the country, from before the Millennium.

1
1

Re: It's good to see

It's actually been standard in pretty much any kit from an enterprise vendor for a long time, x86-based or not (HP, IBM, Dell etc.). Even on their low-end gear, typically SMB stuff.

What is new is that the whitebox manufacturers have finally caught up, so those who are used to building servers themselves can now get the features at a price point that is right for them.

1
0

Doesn't everybody run VMs these days? We ditched the "100s of little boxes" model years ago and run a mix of big Dells, with DRACs, and Supermicro with whatever they call it. The VMs all have VNC console access.

My biggest concern is the lack of security on the remote management cards - typically they don't even have IP filtering.

1
0

Sounds to me like you're buying the wrong brand. HP have had basic lights-out via web app for nearly a decade, as standard even on the lowly DL140 G3 servers we have, which set you back under £100 on eBay at auction and can be fitted with decent memory and RAID; a DL160/DL360 G5 provides the disk performance by default, usually for a little more, plus six 2.5" bays.

2
1
Anonymous Coward

The DL120 is on the UK website from £499, but it's an additional £399 to add a full lights-out management licence. * "currently may not be available direct from HP"

0
0
Thumb Down

Well then they are ripping you off. I've put together a number of these servers with battery-backed RAID, and it can all be done for under £250 if you know how to bargain hunt (I did say auction). Of course, if you go to a fixed-price reseller, who might, say, send soliciting emails to non-tech-savvy businesses, you might pay over £1000 for a G5 server kitted out with stuff.

The most powerful is a DL385 G5 with two VT-x-capable quad-core CPUs, 16GB of RAM, a VT1000 quad-port network card, a P212 with 512MB BBWC and a P400 with the same, four 10K 300GB SAS disks, and an additional single boot disk for the hypervisor; this cost about £450 a year ago.

The cheapest DL140 chassis without disks I saw sell had two Xeon 5160s and went for £50. I've picked up a working P400 from the US for £12, and you can even cut costs on the batteries by replacing the cell and keeping the management electronics if you really want to.

The only thing you need to be careful with is the disks. I've had one fail in 18 months out of about 15, but I also learnt fast not to buy disks from people with ratings under 50, and to check every disk (three eBay cases issued a refund), as some sellers sell "working" disks which fail on a full scan (this happened twice).

I've also had one dodgy server which went very cheaply and which I didn't check until it was too late; all the signs that it was bad were there before I completed the transaction. That came with a £12 replacement fan board expense.

Of course, the electricity is another matter.

If you care about the cost of an enterprise iLO licence and are using multiple machines, then you can save a bucketload more money than that by buying pre-built, usually lightly used, entry-level servers and replacing the disk infrastructure with something more capable, to build a small datacentre. Now, though, I expect something with full virtualisation support (the i7-era Xeons or AMD's not-quite-equivalents) would be a better target, at probably less than a white box.

0
0

Remotely flashing BIOS?

Do people really do that, on boxes that matter? I wouldn't fancy it!

6
0
Thumb Up

Have an upvote.

Is this a REAL sysadmin extolling the virtues of being able to remotely flash a BIOS?

Sure, 90-odd% of the time you'll be fine. You don't get a good reputation, though, when your response to one of the f-ups is "oh, I'm five hours away from you, see you lunchtime".

0
0
Gold badge

It isn't the end of the world if you bork a node in a cluster. But in the past three years of remote updates, I've had 100% success on over 250 flashes. Good enough for me to consider it solid for most use cases.

1
0
Alert

These days what scares me isn't flashing a BIOS (it's a lot more reliable than in the 90s, and every system I've come across also runs at least one verification check to ensure it worked); it's flashing other firmware, especially storage. Egads, you are taking your life into your hands when you try that, even sitting next to the machine.

0
0
Silver badge

System watchdog?

Quite a lot of ordinary motherboards have hardware watchdogs built in, for example the w83627 and similar chips that provide hardware monitoring (voltages, temperature, fan speeds, etc). This can provide a last-resort method of rebooting a sick server if you don't have lights-out support, but only SSH access.

With Linux you can add the corresponding watchdog driver module (they are blacklisted by default in Ubuntu), then install the watchdog daemon and configure it to check a few vital signs. Typically you would check that the load averages are not stupidly high (say, over 5 per CPU core), maybe that rsyslogd is running, that you can run a simple bash script, etc.

If any of those tests fail then you get a moderately orderly reboot, and the hardware watchdog makes sure you get a reboot even if there is a kernel-panic style of fault. Brutal perhaps, but it gets the system back up, and hopefully either it's all OK again or at least you can SSH in to fix it.
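To show the shape of it, here's a toy Python version of that daemon. The real watchdog package does this far more carefully; the device path, interval and thresholds here are only illustrative, and you need the hardware watchdog module loaded plus root to open the device. Be warned: once /dev/watchdog is open, failing to pet it WILL reboot the box.

    #!/usr/bin/env python3
    """Toy watchdog daemon: pet /dev/watchdog while the box looks healthy."""
    import os
    import time

    DEVICE = "/dev/watchdog"
    INTERVAL = 10           # pet every 10 s; hardware timeout should be longer
    MAX_LOAD_PER_CORE = 5   # the "stupidly high" threshold mentioned above

    def healthy():
        load1, _, _ = os.getloadavg()
        if load1 > MAX_LOAD_PER_CORE * (os.cpu_count() or 1):
            return False
        # crude check that rsyslogd is still alive
        if os.system("pidof rsyslogd > /dev/null") != 0:
            return False
        return True

    with open(DEVICE, "wb", buffering=0) as dog:
        while healthy():
            dog.write(b"\0")    # any write resets the hardware timer
            time.sleep(INTERVAL)
        # fall out without the magic 'V' close: the timer keeps running
        # and the hardware reboots the machine shortly afterwards

The nice property is the fail-safe: even if this process wedges along with the rest of the box, the hardware timer still fires.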

1
0
Thumb Up

Re: System watchdog?

This is a rather interesting idea...

0
0
Bronze badge
Thumb Up

Network-Connected PDUs

A few years ago, we had a Linux box that would occasionally lock tighter than a nun's c....hurch donation safe.

It was in a distant, far-flung place called London, and rather than someone driving up there to give it a quick kick in the PSU, we bought a PDU with a web server built in. If the box fell over, one of the support techs could dial in remotely and flick the power off and on. Five minutes later, you could be back in bed, safe in the knowledge that your "hour" of billable time would get signed off as a job well done.
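For the curious, the remote kick usually boils down to a single SNMP set against the outlet. A rough sketch using net-snmp's snmpset; the host and community are placeholders, and the OID is the APC-style outlet control (other vendors differ, so check your own PDU's MIB before firing it):

    #!/usr/bin/env python3
    """Power-cycle one outlet on a managed PDU via SNMP (sketch only)."""
    import subprocess

    PDU_HOST = "pdu.example.net"   # placeholder
    COMMUNITY = "private"          # write community (placeholder)
    OUTLET = 4                     # whichever outlet the sick box is on
    # APC sPDUOutletCtl; verify against your own PDU's MIB
    OID = ".1.3.6.1.4.1.318.1.1.4.4.2.1.3"

    # on APC units, integer 3 = "reboot outlet"
    subprocess.run(["snmpset", "-v1", "-c", COMMUNITY, PDU_HOST,
                    "%s.%d" % (OID, OUTLET), "i", "3"], check=True)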

Then they bought proper appliances and servers and the problem went away.

Oh the good old days.

2
0

Managed PDUs and a serial console

Managed PDUs and a serial console server: if you're managing more than a couple of servers, the actual cost per server isn't too bad. Admittedly not that helpful if you're using Windows, but if you're using Linux it can be a lifesaver if you break your network connectivity.

Also, even quite old servers generally support a serial BIOS...
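Most console servers just map each serial port onto a raw TCP port, so even a quick health probe can be a few lines of Python. A sketch; the hostname and port numbering are made up (Cyclades-style units often start at 7001), so check your own unit's convention:

    #!/usr/bin/env python3
    """Knock on a serial console via a console server's TCP port (sketch)."""
    import socket

    CONSOLE_SERVER = "console.example.net"  # placeholder
    PORT = 7001                             # serial port 1 on this made-up scheme

    with socket.create_connection((CONSOLE_SERVER, PORT), timeout=10) as s:
        s.sendall(b"\r\n")          # nudge the console to print a prompt
        s.settimeout(5)
        try:
            print(s.recv(4096).decode(errors="replace"))
        except socket.timeout:
            print("no output - maybe it really is wedged")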

3
0
Bronze badge

Re: Managed PDUs and a serial console

Windows has had serial console support for a while now.

http://www.msexchange.org/articles-tutorials/exchange-server-2003/monitoring-operations/Windows-2003-Server-Emergency-Management-Services.html

I have not personally tried it, but I do remember EMS support going all the way back to some Cyclades terminal servers I had in 2004. I'm sure it's gotten better over the last two generations of Windows Server.

1
0

Re: Managed PDUs and a serial console

It's there, but other than setting an IP or rebooting the box, you're limited to the Windows command prompt, which isn't particularly helpful...

0
0
Boffin

Remote PDUs + serial console won't help if the system is aborting in the middle of the boot sequence, e.g. due to fsck.

The reason Google, Facebook and other hyperscale companies don't provision LOM on their servers is not so much cost as the fact that their ops model treats individual servers as cattle rather than pets. If a server dies it is automatically failed over, and the FRU is the server itself.

0
0
Bronze badge

If it's a Linux box (for example) and it's set up correctly, then yes, a serial console will be fine for recovering a system stuck at a failed fsck. In most cases the full BIOS is accessible (the 3ware BIOS, for as long as I can remember, did not work over serial console), plus the Linux boot loader, full kernel messages, single-user mode, multi-user... whatever. You can even access the magic "sysreq" sequence over the serial port in most cases. Serial makes for good logging too; the terminal servers can often send console data to a syslog server.

For DRAC and HP iLO at least, you can normally stick to serial consoles (virtual serial ports) to access Linux systems without having to pay for the Enterprise/Advanced licence, if you don't need things like virtual media.

I agree with other posters that this article is quite weak; perhaps it would have been good to cover solutions for the types of systems that do not have integrated management, like the Raritan gear someone mentioned. It's not cheap, but it works fine. In one deployment (many, many years ago) I put Raritan on top of remote serial consoles, with one Raritan drop in each rack that on-site people could connect to in the event the serial console was not adequate.

I have a friend who runs a big lab at MS, all HP stuff, but they too use Raritan KVMs (no integrated PDUs as far as I know) instead of the integrated iLO. It's just what they are used to.

1
0
Boffin

iLO, DRAC, and Managed PDUs, oh my.

While all of our Dell servers have a DRAC built into them*, we don't use them at all. What we do use is a managed PDU/KVM combination that Raritan makes. While the units we have are quite pricey (the controller itself is something like 16 grand retail, for starters!), it's definitely worth it when you have a server that's shagged itself and needs a kicking.

* IIRC, they are standard on all PowerEdge servers at this point. I could be wrong, though.

0
0

IPMI how I wish it was old school

I thought IPMI was a thing of the past, but some vendors still use it. Dell's cloud systems still use IPMI, as they only have a BMC installed. On the plus side, their BMC can be logged into, and they have a nice GUI like a RAC card. I would still rather have the RAC (or should I call it iDRAC?) with a separate NIC port to keep management away from the rest of the network. One bit of cost-cutting I would prefer had never happened!

0
0


This topic is closed for new posts.