An effective data centre is more than just some racks of servers with a bit of networking and storage attached. It needs to be versatile, easy and quick to flex and reconfigure, both manually and automatically, and it needs to keep up with the demands of the applications that run there. Historically, though, many of the …
Bookmark worthy article
Top notch advice.
One of these days I may be fortunate enough to work in such a place. Until then I'll keep the CYA skills updated along with the CV for when they inevitably start to blame the peons for the management mistakes.
Re: Bookmark worthy article
Concur fully. I have worked on a site where all systems, databases, applications and developers were under one manager on what was initially regarded as a deathmarch project. As the article described, problems were analysed and fixed very quickly because subject matter experts with appropriate access were within metres of each other. The other bugbear, process control, was managed intelligently. In an emergency, changes were created, assessed and approved or rejected within an hour. Emergency fixes were not persecuted. I think it was the highlight of my working life. The customer got a reliable, modern, working system with an upgrade path that would improve performance and reliability significantly, as well as proper testing environments.
Unfortunately, the current management fixation on process, caused by belief in the reductionist fallacy, means teams are all too often remote, siloed and hampered by process droids of the dullest kind ensuring everything happens slowly. Changes take weeks, and performance is measured by paper processed, not user satisfaction.
VMware Snapshots? For real?
"And if you are being really sensible you will do backups at hypervisor level anyway, instead of agent-based ones on each virtual machine’s guest operating system."
Re: VMware Snapshots? For real?
I do not know VMware Snapshots, but I'm assuming that they work like other snapshot systems.
Blockwise filesystem snapshots can have a place in regular backups, but only really if you limit the time you want to go back to the number of snapshots you keep. And this is determined by the amount of change in your systems and the amount of storage (usually disk) that you are prepared to keep back for the snapshots. In addition, they are probably useless for disaster recovery, unless you are maintaining cross-site snapshots (I don't actually know if you can do this, but I would guess that if you had cross-site mirroring, it would also be possible to keep snapshots on your other site).
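To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes (my simplification, not a property of any particular product) that each day's snapshot consumes roughly one day's worth of changed blocks, and the function name and figures are purely illustrative.

```python
def snapshot_retention_days(reserve_gb: float, daily_change_gb: float) -> float:
    """Estimate how many days of snapshots fit in the space set aside for them.

    Assumes each daily snapshot holds about one day's worth of changed blocks.
    """
    if daily_change_gb <= 0:
        raise ValueError("daily_change_gb must be positive")
    return reserve_gb / daily_change_gb


# Hypothetical figures: 2 TB reserved for snapshots, about 150 GB changing per day.
print(snapshot_retention_days(reserve_gb=2000, daily_change_gb=150))
```

Double the change rate and the window you can go back over halves, which is why the retention you can offer is really set by churn and spare disk rather than by policy.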
If your backup requirements are longer term, or require recovery of individual files, then an agent-based backup scheme is about the only way you can satisfy the requirements, IMO. This is especially true if you have a heterogeneous environment.
Of course if you are backing up the C: drive of all of your identical virtualised Windows boxes, then there are probably huge benefits in just backing up one copy of a de-duplicated, shared image at the de-dup'd level, rather than agent based backups of each system. But that is a particular system deployment method that does not match all requirements.
Re: VMware Snapshots? For real?
"I do not know VMware Snapshots"
Then you'd probably be best not commenting on them. VMware snapshots are the work of the devil and should never be kept beyond 24 hours unless something really catastrophic makes removal impossible.
VMware snapshots are backward looking, and designed to be discarded when testing is complete, so the new data is put in a separate place which must be checked prior to accessing the original data. Two snapshots and you check two places. SAN generally uses forward looking snaps where new data "replaces" the old data in the original image so performance is generally unaffected until you perform a restore.
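A toy model of the "check the separate place first" behaviour described above, for anyone who wants to see why reads get slower as snapshots pile up. This is only a sketch of the general delta-chain idea, not VMware's actual on-disk format; the class and method names are made up for illustration.

```python
from __future__ import annotations


class DeltaChainDisk:
    """Toy disk where every snapshot adds a delta layer on top of a base image."""

    def __init__(self, base: dict[int, bytes]) -> None:
        self.layers = [dict(base)]  # layers[0] is the original image

    def snapshot(self) -> None:
        # After a snapshot, new writes land in a fresh, empty delta layer.
        self.layers.append({})

    def write(self, block: int, data: bytes) -> None:
        self.layers[-1][block] = data

    def read(self, block: int) -> tuple[bytes | None, int]:
        """Return (data, number of layers checked), searching newest layer first."""
        for checked, layer in enumerate(reversed(self.layers), start=1):
            if block in layer:
                return layer[block], checked
        return None, len(self.layers)


disk = DeltaChainDisk({0: b"old"})
disk.snapshot()
disk.snapshot()
disk.write(1, b"new")
print(disk.read(1))  # found in the newest delta
print(disk.read(0))  # has to fall through every delta to reach the base image
```

In this model, unchanged data is the most expensive to read, because every delta has to be ruled out before the original image is consulted.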
Yes, snapshots do have a place in backup, just not VMware ones. It's also arguable that you want your data out on the SAN rather than in a VMDK. Encapsulation isn't always a good thing, and large data sets are more manageable using SAN tools directly on many SANs. NetApp is the poster child for this, where any VSS application such as Exchange or SQL lives on the SAN and is backed up directly and separately from the VM, which just houses the software.
Re: VMware Snapshots? For real?
NetApp has a nice vCenter plugin that handles snapshots on the filer. It does require some extra licenses that you may not already have purchased to use all the features (notably, single-file restore).
My choice of the icon should be obvious to anyone who's dealt with NetApp or Brocade licensing. Especially Brocade. Bugger port-based licensing with a bloody spear.
"So where you can do the same thing in two places, get the vendors involved and make sure you go for the best combination of options" . . . "Your particular combination of systems will almost certainly have unique interactions that the vendors’ specialists can help you with."
Good article, but you forgot a step (as many do): be large enough and buy enough kit that the "Vendors' specialists" deign to even recognise you as a valued customer.
"At the very least . . . you need to have the infrastructure team under one roof, preferably headed by a single manager."
Again, great idea but missing a step: have a CIO/CTO who understands the importance and benefits of such an approach and is able to manage the egos of said 'key service owners'. Also, along the same lines: make sure the CIO/CTO has hired people who are more interested in systems that work (and work well) than in protecting their own interests.
You are correct of course - the only way to get complex, inter-related systems to function together (let alone efficiently or, saints preserve, optimally) is to have the complex, inter-related employees controlling those systems also function together.
Unfortunately, working in such environments (at any level) is a rarity.
"everything is going to go at the speed of the lowest common denominator"
Something to keep in mind.
Re: "everything is going to go at the speed of the lowest common denominator"
I actually disagree with this. In my experience of networking and storage, it's entirely possible for things to go slower than the lowest common denominator. For instance, if you misconfigure link aggregation of two GbE NICs, it's possible and probable that you'll get nowhere near 1 Gbit/s from it due to flip-floppery in the network, yet the slowest component might still be GbE.
The article was great though, if only more people would take note of the lessons.
I rate this article - bang on
This is pretty much what I do - that manager pulling it all together. And everything written here rings true with me.
But I start from the application and work down. Understanding the app (synch, asynch - scale up or out - is it disk, CPU or memory hungry) is vital to understanding where to put it and how to deploy it.
Also understanding customer expectations against what they have paid for. Running a few hundred customers in the same datacentre(s) means you can play one off against another if you are good at monitoring what they are really doing.
Writing about monitoring reminds me of the need to monitor the application, which IMHO is most often missed. Knowing whether the app is working to standard is more important than knowing whether a disk queue length has gone long.
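A minimal sketch of what I mean by monitoring the application rather than the plumbing: time a real user-level transaction and compare it with the agreed standard. The probe and the threshold here are placeholders I have made up; in practice the probe would log in, run a query, place a test order or whatever the app actually does.

```python
import time
from typing import Callable


def app_meets_standard(probe: Callable[[], bool], max_seconds: float) -> bool:
    """Run one application-level probe and check it succeeds within the time limit."""
    start = time.perf_counter()
    try:
        ok = probe()
    except Exception:
        # A probe that blows up counts as the application not working.
        return False
    elapsed = time.perf_counter() - start
    return bool(ok) and elapsed <= max_seconds


def fake_order_lookup() -> bool:
    # Stand-in for a real transaction against the application.
    time.sleep(0.05)
    return True


print(app_meets_standard(fake_order_lookup, max_seconds=0.5))
```

A check like this can fail while every disk, CPU and link graph looks healthy, which is exactly the case the infrastructure-only monitoring misses.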
Criminally Insane Indeed
"You can even turn on data compression right at the top of the stack on a Windows server, but of course that is only for the criminally insane."
This absolutely made my day, but you sir owe me a new keyboard. Tip of the hat to anyone else who's survived this nightmare and lived to tell about it.