On Doomsday Weekend we completely replaced our Windows domain. It was a miserable experience. It’s hard to describe how much work is involved in replacing a mature domain; certainly more than I had anticipated. It's even harder to explain the hell to non-sysadmins. On the surface, the transition from the old network to the new …
The term you are looking for...
...is a clusterfuck....
Doing a big bang move is never a good idea, unless you are brave or stupid.
Why on earth did you decide to roll out new printers as well... bloody Xeroxes at that!
Old printers were giving up the ghost. (Some were pushing 10+ years.) We needed a new set of higher volume printers. Had the things arrived on time, the old printers would have gone away with the old network, and the new printers would have started working on Monday morning with the new network. Alas, this was not to be as the – get this – paper trays were backordered.
Yeah. Paper trays.
Never try to install new printers at the end of the month
doubly so at the end of the quarter: quadruply so at the end of the year: it is at these times that the sales wonks are busily trying to hit their targets, which involves having boxes shifted from the warehouse into a site with scant regard for whether or not they are actually capable of being used at all, let alone providing the sort of functionality the end users actually require, just as long as said sales wonks can get their bonus.
Oh, and always get one of them in early on trial so you can shake out the driver/Word macro/user idiocy bugs in advance...
The printers were largely upgrades to an extant lease of Xerox printers. Some were replacements for really, REALLY old workcenters (Think N32s, N40s, etc.) No futzing with forms was necessary, and I did spec delivery for the middle of the month. (Doomsday Weekend took place on the 20th of August.) Negotiations for the whole thing started at the beginning of August.
The nice lady seemed to be trying to do everything she could to get the widgets on time, but sometimes expletive happens. Sadly, it tends to happen at the worst possible time…
I've never understood...
...why these things are rolled out in such short time scales.
I work on smaller networks than even this, and sure they are a lot less complicated, but even I know that things never go to plan. The last 3 rollouts I have done, usually involving replacing the primary on site server, have all been phased in slowly. 1 service at a time giving each a chance to settle and for problems to surface, all the time with a rapid roll back in mind, should things go wrong, and all the time maintaining service to users. If no one notices that anything has changed then I am happy. Pulling 80hrs straight isn't my idea of a job well done as I wouldn't see any extra benefit from doing so.
I guess you are a contractor with a silly hourly rate - in that case - good job!
Full time regular staff. Salaried, and I don't get overtime. There wasn’t enough gear to run both networks in parallel. We had to pull off the network change without new equipment; almost everything is virtualised, so the “new network” was largely a set of new virtual machines. Given that, I don’t actually see how you can move from one forest to another, migrating all services “one at a time.”
I would love to say I am totally in charge of making all such decisions, but unfortunately I do have to work with what I’m given. I do largely get to buy what I want, but I have to justify it all; part of that justification is that all equipment be utilised 80%+ for its lifespan. Buying equipment just to handle the changeover then letting it sit idle would /not/ have gone over well. Most especially since we would have required somewhere around 30% of our yearly budget’s worth of gear to do it.
Sometimes, you just gotta do what you gotta do…
Rental server equipment?
I wonder if using rental server equipment would have helped for phasing things in?
It could mean for double installing (ugh), but that might be worth it for being then able to phase things in over (say) 2-3 months at leisure, then give the equipment back afterwards. :)
Two problems with that idea.
1) I actually have no idea where to rent server equipment from. Honestly not a clue where to even begin.
2) I didn't think of it until we were about halfway through the weekend. Hindsight and a great deal of “d’oh” suggests you might well be on the proper track, however…
I have sympathy for you. But only some. 2 x sysadmins- that lot, and only a weekend. And it all had to be ready on Monday.
Whoever sanctioned that plan/idea - or whoever demanded it - is nuts, or very very stupid. I'm not at all surprised at 82 hours without sleep, and I bet the issues did not end there with plenty left overhanging.
I believe the appropriate phrase coming from military science is
"No battle plan survives contact with the enemy" (von Moltke)
Though I quite like the idea of adapting this one :
"Nothing makes a man more aware of his capabilities and of his limitations than those moments when he must push aside all the familiar defenses of ego and vanity, and accept reality by staring, with the fear that is normal to a man in combat, into the face of Death."
Major Robert S. Johnson, USAAF
Perhaps the best thought to take is to plan like Monty - the one time he failed to plan correctly he mucked it up.
Damn, I've been living the easy life
The mantra for us is incremental change - coexistence and migration. We avoid big bangs of any sort like the plague because of the inherent risk. That's not to say that we never do anything like this, but I'll put it this way: the activities you mentioned would normally be rolled out through separate, smaller activities over a longer period of time.
Understand, I'm not trying to slam anyone here - when it comes to leadership (be it the CIO, CTO, or the real business side of the house) there is a seduction to the "do it all in one weekend" approach, and sometimes there is just no way to convince them otherwise (lucky for me, I'm a 3rd party so I usually get my 2 cents into the plan - I understand that is not typical). It is a political discussion more than a technical one - you might even describe it as sales. I'd say that when I was an SA probably 20% of my time was spent trying to shepherd my clients away from, to me very obvious, strategic errors.
What it really boils down to in this case is management of risk - no matter how much testing and how great the planning is, there is a reasonable limit for what can be accomplished and should be attempted in a given span of time.
Unfortunately, when moving from one forest to another, there isn't much in the way of incremental change that is possible. When you look at all the new systems; totally new forest, new e-mail server, new OCS, new WSUS…the only way to have made that change incrementally would have been to have had the hardware to completely run BOTH networks in parallel.
Sadly, there was no way we had enough gear to accomplish that. I would love to have made an incremental switch. Sadly, I could see no way to do it…
... just wow.
That's 3 months work, not a weekend. Whoever thought that those timescales were realistic should be shot. (I know it's not you Trevor.)
Surely you do a hardware refresh every few years? Could you not pay a little to extend the warranty from 3 years to 5 on your virtual kit, and in the last two years build your new network, test, then just throw over the databases (Exchange, SQL etc.) and move the user accounts (not profiles) using ADMT and run a two-forest setup for a month or two then finally decommission the old hardware?
Gives lots of head room, and also doubles up as a "disaster recovery" hardware story too for the boss with pretty low spend.
Anyway - hats off. I'd simply refuse under the guise of "it's just not possible". :-)
@The original steve
Actually, I *AM* the one who said it had to be done in a weekend. Here's the scoop:
We absolutely /had/ to have the updates done by September 1st. (This per CTO requiring updates and the fact that Sept – Dec is silly season around here.) As per above, I could see no way except to do it “all in one go.” I was not present through July, as I was in two of the other locations installing Wyse clients, recabling and prepping the locations' local hardware for the changeover.
It took us the first two weeks of August to get the Domain Controllers, E-mail server, and BES/OCS/WSUS/Teamviewer manager server (yes those 4 share a VM) installed for the new network. (I was going to have THOSE at least installed, if not configured by Doomsday.) In addition, we roughed out a template Windows 7 VM and a few template Windows XP VMs for render boxes.
We didn’t have to go ENTIRELY from scratch, but I promise you that even to get what we had prepped in advance we were scraping the bottom of the excess Virtual Server capacity barrel.
I managed to create all the user accounts prior to hitting the wall, but that’s about it. Exchange wasn’t configured, OCS was not only not properly configured…it had to be uninstalled THRICE and reinstalled specifically to get the blessed thing to cooperate. As to the rest…well…there’s more articles on that.
Suffice it to say that the actual call “hey guys, we need to do this all in one go over a weekend” was mine…however there were ZERO other choices that would have had us meet the deadlines imposed on us…
An additional comment...
I should also point out that when server upgrade time rolls around, there won’t be any side-by-side coexistence of servers. Go read everything in the article that we support, and then understand that I have to shove all of that, 4x 100Mbit fiber connections and the salaries of the two sysadmins and the bench tech into less than a quarter-mil a year. When the virtual servers in use are done with being servers, they will be removed from their chassis, given a hearty dose of TLC and placed into desktop chassis.
They will then serve an additional three years as “medium-demand physical workstations” for our Photoshop geeks. The next server replacement cycle doesn’t occur until 2012. We are using “tick-tock.” This year was desktops. Two years from now is Servers. Two years after that will be desktops again. The sole exceptions are the Photoshop geeks who get trickle-down servers as part of the server refresh cycle.
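The "tick-tock" cycle described above boils down to an alternating two-year schedule. A minimal sketch, assuming "this year" is 2010 (inferred from the 2012 server refresh); the function name and defaults are hypothetical, not anything from the actual environment:

```python
from typing import Optional


def refresh_kind(year: int, desktop_year: int = 2010, period: int = 2) -> Optional[str]:
    """Return which hardware class is refreshed in a given year, if any.

    Assumes desktops were refreshed in `desktop_year` and that refreshes
    alternate between desktops and servers every `period` years.
    """
    offset = year - desktop_year
    if offset % period != 0:
        return None  # off year: no refresh budgeted
    # Even-numbered cycles are desktops, odd-numbered cycles are servers.
    return "desktops" if (offset // period) % 2 == 0 else "servers"


for y in range(2010, 2017):
    print(y, refresh_kind(y))
```

The trickle-down of retired servers to the Photoshop workstations would ride along with each "servers" year.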
When you see me talking in my articles fairly constantly about the need to do things cheaply, (or constantly trying to find the most cost-efficient method of doing something,) this is why. It should also then carry some weight when I recommend spending money on something. The cost of a Windows Server Enterprise license can to me mean 4 years of service life from a virtual server capable of running 30 personal virtual machines.
I constantly work on the razor’s edge of what is actually possible with the hardware and software I can get my hands on. I can’t even big myself up and say that this is because I am somehow superhuman or a great sysadmin. I am just willing to work long hours to get things done. I will tell you now, honest and true, that without my partner-in-crime fellow sysadmin (and one of my very best friends) this wouldn’t be possible.
I specialize in the impossible; making a system do what it was never designed to. Pushing the limits and doing the research. I am an IT MacGyver, but as such I am only one part of the equation. My buddy is the polar opposite. He is the living embodiment of “by the book.” He keeps me in check, goes over all the nightmare hacks and kludges I have created to put out fires and get things working. He takes my quick fix or cobbled together solution and produces reams of documentation, tests it in alternate conditions and works out something that is reproducible and far more production ready.
Together, I honestly think we make a great team for an SME environment. We do what I am constantly told by my peers simply isn’t possible. I have an enormous amount of pride in that, but I do have to say the stress gets to you. It isn’t the stress of the long hours…but the thanklessness of the job. Heroics aren’t rewarded in IT. Nobody cares that you are pulling all nighters or that you are doing the impossible with no budget. What they care about…the ONLY thing they care about…is what doesn’t work, or isn’t set up the way they want it.
IT Operations isn’t a field where people pat you on the back and say “attaboy.” It’s a field where you can damn near kill yourself for seven years and then get reamed out because you collapsed in exhaustion before you remembered to tell someone some minor detail about something. It is a field where hard work and technical achievement pale in importance to ego stroking and pandering to the whims of users and managers.
Operations guys like me are viewed as little more than digital plumbers. When the cutbacks come at the large corporations, Operations are the first to feel it. Those who remain are told to do more with less. In the SME space, we are constantly up against the wall on budget, manpower and time. Through it all there is always the threat of having your job outsourced to a consultancy or another country.
So why did I put in 82 hours straight? Because the network needed to be ready for September 1st and I could see no other way. One day, my editor might even be able to teach me enough that I might be able to make writing into a career that keeps food on the table. That would be a great day. Until then, I do what I must because there is no other choice.
I am not afraid of hard work. I am afraid of letting people down; especially if the people in question employ me. If keeping my job requires stupid hours, then that’s what I’ll do. There aren’t a lot of good jobs for IT Operations folks in Alberta. There are however plenty of significantly worse ones than the gig I’ve got….
That's impressive. I also would have simply said "it's impossible in this timeframe". (but I'm a Linux sysadmin so I don't know Windows deeply enough to evaluate such a plan properly)
I'm a longtime reader and enjoyed the bit. Looking forward to the remainder and hoping it will expand on the successes and non- that occurred, making it suitable case study material. Cheers.
If you are looking for case study material you might find the reasoning behind the migration interesting.
The biggest reason for the move was the damage to the schema. In truth, some of it was caused by inexperience several years ago. Installing one product or uninstalling another caused all sorts of crap to accumulate in the schema that I couldn’t clean out by hand no matter how hard I tried. (I simply didn’t know exactly where all the bits hung out, and Google wasn’t helping me much.)
Similarly, before we moved to virtual machines, it wasn’t uncommon for domain controllers to just up and die. As small as the company is (read the article again to get an exact count of how many systems we have) seven years ago we had one domain controller that was also a file server, firewall, FTP server and everything else all in one. When I started with the company that domain controller was being run off of a Pentium 4 desktop board with a single desktop-class hard drive.
As you can imagine it was a few years before we started to get DCs that didn’t experience random and sudden hardware failures; so there were remnants of these old DCs, despite my best efforts to nuke all references to them out of the AD. (The “Proper procedures” don’t get them all. Especially if your previously an heroed DC was a certificate authority, etc.) Not to mention that going from OCS 2003 -> 2005 -> 2005 R2-> 2007 R2 creates a whole bunch of zomfgwtf hanging about in the AD. Similarly Exchange 2000 -> 2003 -> 2007 -> 2010
In theory if I had thrown enough time at the old AD I might have been able to clean it. There comes a point however where you have to look at the whole mess and say “I only have 75 users. Let’s just restart from scratch; it’s significantly less effort.” I always have this sneaking suspicion that if I was a true active directory expert, I might have been able to avoid all of this. I’m not though. I’m not an “expert” at any one field of IT. I haven’t spent my career specialising. As an SME sysadmin I have to maintain all of those systems you read about (and more that didn’t make it into the article.)
There simply isn’t any possible way for one human being to develop true expertise in all of those various systems. (There is no such thing as a modern-day polymath.) The best I can reasonably hope for is to understand the fundamentals and as many of the commands/quirks/specialised ballyhoo of every application, operating system, hypervisor, file system, database, piece of hardware, networking, crypto etc as I can fit into my brain.
Where that falls apart is the truly in depth knowledge of things like Active Directory. I know more than your average bear; but by the same token I only really have to deal with it once every two or three years. There might be some real arguments for/against IT generalists like me. I believe I am a rare breed in IT; most folks seem to have specialised in some particular part of it (databases, AD, LAMP, whathaveyou) by this point in their career.
I am capable of dealing with a wider variety of systems than your average specialised IT body…but in order to have that capability I have to sacrifice a great deal of the super-specialised knowledge that comes from spending over a decade dedicated to a single type of product. At first glance, someone like myself might seem ideal for an SME. It certainly gives me a breadth of knowledge and experience that allows me to write about various topics here on El Reg.
What I begin to wonder however is if the truly efficient way to deal with SME IT administration isn’t to have SMEs handled by largish consultancies. A large consultancy can afford to have one (or more than one) of each relevant kind of specialist. When they run up against a challenge like I described above they don’t have to restart the whole AD from scratch. They summon their in-house AD super-specialist and he deals with it.
Where’s the line between the utility of an IT generalist and the advantages of a small cluster of IT specialists? I’ll be honest when I say that I don’t know. I am however exceptionally curious about the answer…
The new, thin line...
You've just hit the new, thin line of IT and where IT generalists have to redefine (as do CIO's) what they do today. Basically your new role is to identify the work that has to be done and the outsource firms capable of doing the work, including TCO and ROI. Similarly, your role is to identify those cloud providers that can provide the services that the enterprise firm requires for current, and future, requirements. True this is more a business function but if you do not have business knowledge, your future is bleak. As you found out, the number of individuals that are equally adept in all aspects of IT today are vanishingly small and you wouldn't be able to afford it. And I do know a few.
I'm rather surprised you didn't approach IBM Services to pull this off. More than anyone else in my experience, and I've been 'playing' and working in the SME and LE spaces for decades now, this is their bread and butter. Making disparate systems just work including all conversions necessary. It was also mine at one time working for one of the largest IT consuming 'firms' on earth. I almost wish I could have been there but I've always had a very warped idea of fun. [My current record for up-time is 128 hours and, yes, I was hallucinating at the time so I had to have a knowledgeable second checker on oversight.]
I did it because it had to be done. I doubt I'll ever be trying something like it again. We didn't have the money for IBM Global Services. We did have the money for a sysadmin who doesn't get paid overtime. What I did was make a decision that was good for the company, but bad for me personally.
If I have a flaw as a sysadmin it is honestly that I work too hard. When I put my “company hat” on, I push out my own needs and focus entirely on what is best for the company. The problem with that is that I then rapidly burn out, which doesn’t do the company any good and sure as heck doesn’t do me any good.
So the thing is to know when to spend the money and when to burn myself out. Sometimes the money isn’t there and so the call is taken out of my hands. Sometimes I make a bad call, sometimes I make a good one. I’m still learning where the balance lies.
I agree with your assessment though. The IT generalist is becoming an individual who manages a series of contractors and outsourcers. It’s something that saddens me, because it means the end of my career in IT. I don’t have a degree, nor any management credentials beyond running an SME IT department for seven years. I might be able to get a PMP designation or somesuch, but then I am fighting eleventeen squillion unemployed IT Operations guys with PMP designations for the small handful of jobs left on this continent.
Once the SME administration job market dries up, I honestly don’t have a clue how to make the jump to being an admin in a larger enterprise, nor do I think that I have the resume to go up against the many other vastly more experienced contenders for the “outsource manager” positions.
This is why I’ve taken up writing. I think that ten years hence there will be more of a career in writing than in SME IT administration. Have to change with the times. If I can stick with IT writing, then all those years of experience as an IT generalist won’t have gone to waste.
There's still a role for the next decade...
I think there's still a role for the next decade for someone that has virtualization in their tool belt. You'll need to refocus on the penetration of virtualization, both server and desktop, as it moves into the SMB market and you'd probably need to do it on a consulting basis. Similarly, I'm sure there will still be some wacked out managers out there trying to do the same thing as your firm, piling up way too much on too few 'masochistic' sys-admins.
Actually we aren't far apart experience-wise and like you I would pile up way too much on my plate, namely collateral duties (sys-admin _and_ hardware support as well as bench tech) out the wazoo (to be polite). Frequently someone would come up with a request (where did all my consumable money go on the first day of the quarter?!!!) and I'd have to gin up a program to figure it out with no documentation on the database structure let alone code to work with. I'm sure you understand that type of request. So, yes, I'm masochistic as well, but it was so much damn fun when I pulled off the 'impossible'. Anyone can administer the systems I was minding (I kept waiting for the banana to fall out of the front of the box). Handling the weird was what got me recognized as 'the expert' for half the US Navy.
Give serious thought to looking at the SMB market. Microsoft is already recognizing that it is seriously under-served and has no on-site IT experts (Windows SBS code-name: 'Aurora'). IBM has already been targeting appliances into that market. VDI is pretty much sure to follow and that is not something that is dead simple to implement. I've been playing/using virtualization on the x86 platform from almost the very beginning (somewhere around here I have the real early betas of VMWare Workstation) and much longer than that on other machines. Personally, I believe VMWare is pricing themselves out of this market (focusing almost entirely on datacenters) so there is an opportunity here. I haven't seen anyone else try to drive costs, especially TCO, down to manageable levels for the SMB market.
Anyway, good luck trying to find a nice landing zone. I'm out of that, thankfully. Intentionally.
The Original Steve mentioned ADMT and I would like to follow up on it. Did you consider using ADMT and discard it? If so, what was your reasoning?
I am interested as we, too, have an inter forest migration planned but are taking the incremental approach over the big bang. I would be interested in the thinking behind discarding ADMT.
A large part of what we needed to do was shed the "cruft" a decade of disparate naming schemes, hirings, firings, e-mail address changes, renaming of users etc had caused to the AD.
You had USER_D who had been created as USER_A originally. The name was changed in order to ensure that USER_D could access USER_A's e-mail for business continuity purposes, but many of the various hooks for that user in the AD still reference USER_A. Similarly, many of the users were simply Firstname instead of Firstname.Lastinitial (which we started to make all new users several years ago.) There are other examples, but you get the idea. For this reason ADMT was pointless. We didn’t want to migrate the users. We wanted entirely new users with clean information that would follow the naming convention from the CEO to the digital janitors.
Recreating 75 users was the absolute least of our worries. That was about an hour’s work. Create the users, assign them a SIP address, make them an e-mail address. Link the home folders. Set the dial-in permissions on the small handful who needed them. Everything else was handled through GPO.
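For anyone planning something similar, the naming convention and per-user steps above can be sketched roughly like this. It is only an illustration: the domain, the `NewUser` record and the helper names are hypothetical, and it only derives the clean identifiers; it does not touch AD, Exchange or OCS.

```python
from dataclasses import dataclass

DOMAIN = "example.com"  # hypothetical placeholder domain


@dataclass(frozen=True)
class NewUser:
    username: str  # Firstname.Lastinitial
    email: str
    sip: str


def make_username(first: str, last: str) -> str:
    """Build a Firstname.Lastinitial username, e.g. Jane Doe -> Jane.D."""
    first, last = first.strip(), last.strip()
    if not first or not last:
        raise ValueError("both first and last name are required")
    return f"{first[0].upper()}{first[1:]}.{last[0].upper()}"


def build_user(first: str, last: str, domain: str = DOMAIN) -> NewUser:
    """Derive the e-mail and SIP addresses from the clean username."""
    username = make_username(first, last)
    email = f"{username.lower()}@{domain}"
    return NewUser(username=username, email=email, sip=f"sip:{email}")


def build_users(names):
    """Build all users, refusing to continue on a username collision."""
    seen = {}
    for first, last in names:
        user = build_user(first, last)
        if user.username in seen:
            raise ValueError(f"username collision: {user.username}")
        seen[user.username] = user
    return list(seen.values())


for u in build_users([("jane", "Doe"), ("John", "Smith")]):
    print(u)
```

How collisions (two Jane D.s) get resolved is a policy decision the sketch deliberately leaves to a human by raising an error.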