Sysadmins: Poor capacity planning is not our fault

Our latest reader survey was a little different to usual. Normally we research new stuff like the latest hot technologies and ideas. On this occasion, though, we looked at a discipline that's been around for decades – capacity planning. The aim was actually to investigate how well existing processes, techniques and tools in …

  1. Pascal Monett Silver badge
    Flame

    "get senior management to take the issues seriously"

    That only happens when senior manglement is capable of looking beyond its own nose.

    If you're stuck with the incompetent, IT-is-wizardry and just-make-it-happen types, it's not ammunition you need, it's a frakkin cluebat because with those types, there is no such thing as proof that they haven't done their job - it's always your fault.

    And the cherry on the fail cake is that they have no compunction about looking at you and saying that you should have told them. After a number of emails and meetings which did exactly that.

    At that point, it's ammo I need all right. The 9mm kind.

    1. Doctor Syntax Silver badge

      Re: "get senior management to take the issues seriously"

      "That only happens when senior manglement is capable of looking beyond its own nose."

      Been there too. Repeated requests for more memory turned down until an OS upgrade put the system into severe thrashing at which point the vendor was summoned to add more memory PDQ and promises made to be more responsive to requests in future. Yeah, right.

      1. Emperor Zarg
        Trollface

        Re: "get senior management to take the issues seriously"

        "Repeated requests for more memory turned down until an OS upgrade put the system into severe thrashing at which point the vendor was summoned to add more memory PDQ and promises made to be more responsive to requests in future."

        Sounds like your Change Management process was borked as well.

        1. ecofeco Silver badge

          Re: "get senior management to take the issues seriously"

          Change Management process? I've heard of this. Isn't it a four legged animal with a single horn on its head?

          I'd love to see one in the flesh.

          (I take that back. I've seen one. One in 20 years)

        2. Doctor Syntax Silver badge

          Re: "get senior management to take the issues seriously"

          "your Change Management process"

          Changing managers could have helped but didn't.

          Oh, I see what you mean. But you've built an assumption into that.

    2. Anonymous Coward

      Re: "get senior management to take the issues seriously"

      I prefer the .357 Magnum size. Some of their skulls are rather dense.

      The .44 Mag. just makes too big of a mess, and a .45 just barely gets their attention.

      1. kain preacher

        Re: "get senior management to take the issues seriously"

        Might I suggest a .41 Mag or 10mm?

    3. MotionCompensation

      Re: "get senior management to take the issues seriously"

      As an IT guy who's also in senior management (founder), I'd like to add that plenty of IT people expect the business to understand their side, while doing very little to explain the problem in business terms, or trying to understand what the business as a whole is facing and where they fit in. That's not everybody here of course, but it does happen.

      I think we should not expect business people to understand the IT stuff. We need to get rid of that somewhat arrogant, smug tone of voice and begin to play as part of the big team, not just the IT team. Of course, it helps if the business people let you play as part of the team (if not, find a different team?).

      Bring on the down votes.

      1. Brian Miller

        Re: "get senior management to take the issues seriously"

        "As an IT guy who's also in senior management (founder), I'd like to add that plenty of IT people expect the business to understand their side, while doing very little to explain the problem in business terms, or trying to understand what the business as a whole is facing and where they fit in."

        Here's a problem: the business people don't understand the system, and neither do the IT people! Imagine a system that's running on an old release of Linux, and there's no real plan to replace it. Imagine that I had to explain to a developer, who'd been with the company 18 years, the difference between a C preprocessor macro and a function. Imagine that none of the developers know how to use a debugger. Imagine that two years ago, one of the developers hit the system with a load test equivalent of a large fuzzy Q-tip. The system fell over immediately, but nothing was done about it.

        And of course, imagine that the business people are blithely selling the product like it can be a fabulous solution to handle bazillions of users. Which it can't. And that's been explained to them. Repeatedly.

        At some point, "engineering" really means, "the work of designing and creating large structures (such as roads and bridges) or new products or systems by using scientific methods."

        Scientific methods, what a concept. Designing. Planning. Estimating. Testing. There is nothing that the minion on the bottom can do to get managers off their ass and do something competent, when the managers are fully incompetent, and of course never had the training in the first place.

        When does senior management take the issues seriously? Bankruptcy. "You have no money. Now go home, pack your stuff, and live in a cardboard box on the sidewalk." Then they'll take notice.

      2. Disk0

        Re: "get senior management to take the issues seriously"

        When running a business, I think it is a good idea to understand the moving parts - or at least have someone around who does. It is rather typical of the suit contingent to blame people with no real decision making power for their failure to run the business or understand the processes. If they did, they would recognise the necessity of capacity planning, just as they recognize the need for a big enough warehouse for storing product.

      3. Naselus

        Re: "get senior management to take the issues seriously"

        "I think we should not expect business people to understand the IT stuff."

        I'm all for the old mantra of 'we must meet business half way' and suchlike, but tbh I've seen a hell of a lot more effort from IT to understand business than I've ever seen from business to understand IT. We have things like Agile and ITIL, which, love them or loathe them, are very much efforts on IT's part to get to grips with what business wants from us. However, in most departments I've worked in, the problem hasn't been IT expecting business to understand the IT stuff, but rather business not understanding that business doesn't understand the IT stuff.

        In many businesses, there is a general belief that IT is staffed by wizards who can make literally anything happen with no new equipment or budget just by 'typing some code' (never mind that I'm a systems engineer, and 'writing code' barely features in my professional life). Complete refusal to learn anything more technically involved than email or how to download Fruit Ninja onto an iPhone means that most of management is completely unequipped to understand whether what they want is possible, feasible, or sensible, and most IT procedure is ignored or belittled as meaningless bureaucracy.

        Simultaneously, they disregard literally any expert advice offered by IT staff on whether their new idea is actually feasible. The attitude is very much one of 'but I want it!', and because I'm apparently Gandalf the Grey I can just make that happen in two minutes or so through the magical coding skills that are in no way connected to my job. And this is where IT's much-discussed 'arrogant, smug tone of voice' comes from; we are so used to interactions with users who act like spoiled children that condescension is the natural response.

        I work for architects, and I generally understand what it is they do. I don't understand the minor details, but in general they draw building plans using CAD software, generate manifests for the materials that will be used, and create and offer presentations in order to show the client what they are trying to do. I can, on any given day, give a reasonable estimate of what an architect will be spending their time doing. However, if you ask most of the other staff what it is *I* do, I'd doubt more than 2-3 would be able to give you even a vague idea. They may describe me as a hacker, or a coder, or a 'computer genius', or as 'a guy who fixes computers'. They have simply no idea of the reality of my job or what it is I'm employed to do, and have little interest in figuring it out.

        This is the attitude of a child to a parent's job - they understand that I spend a lot of my time doing something called 'work', but they don't have even a rough idea of what it involves. This alone shows that business has not made any serious effort to "meet us halfway" and achieve any sort of synthesis with IT. The entire cultural setup is that if anything has a plug on it, then there's no point even trying to get a vague understanding of it and it should be surrendered entirely to the wizard, who will chant the incantations and appease the Server Gods. There IS some sign of improvement in the younger generation - even the non-technical users under the age of 30 take an active interest in what I do and how things work - but the older generation are pretty much a lost cause and there's no real chance of any improvement until they retire.

    4. Anonymous Coward

      Re: "get senior management to take the issues seriously"

      After getting screwed over by manglement once too many times I started keeping physical printed copies of all my communications to them about this very subject.

      Me: We have #X capacity. We are adding #X+N additional capacity per day. We need #Y additional capacity in the next thirty (30) days before we will not be able to continue storing our files. No amount of defragging, shifting, sorting, or rearranging the deck chairs will change this fact. I suggest we plan ahead & get #Z additional capacity instead to push back the NEXT time this situation will happen to as far in the future as possible.
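The arithmetic behind a warning like that fits in a few lines. What follows is a minimal sketch in Python, assuming growth stays constant from day to day; the function names and the example figures are hypothetical and are not taken from the post.

```python
import math


def days_until_full(total_gb: float, used_gb: float, growth_gb_per_day: float) -> float:
    """Days of headroom left, assuming a constant daily growth rate."""
    if growth_gb_per_day <= 0:
        # No growth means the volume never fills under this simple model.
        return math.inf
    free_gb = max(total_gb - used_gb, 0.0)
    return free_gb / growth_gb_per_day


def capacity_needed(used_gb: float, growth_gb_per_day: float, horizon_days: int) -> float:
    """Total capacity required to last horizon_days at the same growth rate."""
    return used_gb + growth_gb_per_day * horizon_days


if __name__ == "__main__":
    # Hypothetical figures: 1000 GB volume, 700 GB used, growing 10 GB per day.
    print(days_until_full(1000, 700, 10))
    print(capacity_needed(700, 10, 365))
```

Real growth is rarely linear, so a figure like this is best treated as an early warning rather than a forecast.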

      What manglement decides: Just keep using what we have. We'll think of something.

      Me, face palming as I know full well that they will do no such thing and then blame ME for the clusterfuck that's going to hit the fan. Sure enough twenty-eight days later & they tell me they have a plan to discuss the matter at an upper level manglement meeting next week & I'll just have to wait for the outcome.

      Me: That's fine. You do that. In the meantime I'd just like to remind you (for the umpteenth trillion time this week alone) that in two (2) days, forty-eight hours from now, we will cease having sufficient room to store our files. No More Files Will Be Saved. None. That includes payroll, production, and any ability you may have had to electronically file a purchase order to our vendor via email since the email client *and everything else* will be OFFLINE. We. Don't. Have. Capacity. I've been telling you this for a month now. You've put it off until the last minute. I've kept records to prove that *I* have been practicing due diligence & that YOU have not. So when someone's ass is about to be thrown under a bus? It won't be mine getting the tire treads. And I'll bet the company employees just LOVE the paid vacations we get for sitting at our desks doing nothing until the new capacity arrives & can be properly installed, because we can't store any more files, no phone logs, no trouble tickets, no customer orders, no vendor supplies that need digital confirmation, *NOTHING* since... say it with me now... We Can't Store Any More Files.

      I hated that boss.

      I left the company & never looked back.

      I laughed my ass off when the guy got his ass fired for mismanagement & incompetence... something about not provisioning enough server stores to handle the daily amounts much less a year's worth of data.

      *Rude Gesture to Mister K.*

      Fuck you. I hope your next job involves a toilet plunger & a tip jar.

  2. Doctor Syntax Silver badge

    "Of course some argue that the informal, ad hoc approach is perfectly adequate, and that getting too ‘procedural' is more trouble than it's worth. Fair enough if you have a modest and relatively slow-moving IT environment, and a small IT team in which everyone always knows what everyone else is up to."

    I once had a gig providing 2 weeks holiday cover in a very small DBA team with a week either side handing over. Most of the 2 weeks was spent on the paperwork to add another 2Gb chunk or two to the database. As far as I know the entire machine was devoted to running the database and the applications running on it. I don't suppose the operations team which had to add the space from the LVMS was any bigger than the DBA team but I never set eyes on them. I don't think that business exists any longer.

  3. Steve Davies 3 Silver badge

    But....

    isn't this all supposed to go away with the Elastic Cloud that we are supposed to be using now?

    Need more space/ram/cpu? Ok here it is. Carry on, business as usual?

    There will be a lot of PHBs hoping that what I have said above is true. If it isn't and they get found out then they'll be looking for another company to wreck.

    Oh and to Java devs everywhere, writing everything including the kitchen sink to Log4J output files is not the answer to reliable systems. I've seen several grind to a halt because of the vast size of Log4J files when (cough-cough) enabled on production 'just in case' things go wrong.

    1. Anonymous Coward

      Re: But....

      "There will be a lot of PHBs hoping that what I have said above is true. If it isn't and they get found out then they'll be looking for another company to wreck."

      Unless, of course, said PHB knows or is related to someone on the board...

    2. paulm

      Re: But....

      Java dev here. That's a config problem. When configured correctly, log4j will roll over log files based on time or size, and I'm pretty sure it supports removing/archiving old log files too.

      If the system grinds to a halt because it ran out of disk space, then someone wasn't monitoring the production system's disk usage properly. Something like Nagios to catch this in advance isn't that hard.
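A check that warns before the disk fills can be very small. The sketch below is in Python and is not a Nagios plugin, nor does it use any Nagios API; the 80% and 90% thresholds and the function names are assumptions chosen for illustration.

```python
import shutil


def disk_usage_status(used_fraction: float, warn: float = 0.80, crit: float = 0.90) -> str:
    """Map a used-space fraction (0.0 to 1.0) to a status string."""
    if used_fraction >= crit:
        return "CRITICAL"
    if used_fraction >= warn:
        return "WARNING"
    return "OK"


def check_path(path: str = ".") -> str:
    """Return the status of the filesystem that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return disk_usage_status(usage.used / usage.total)


if __name__ == "__main__":
    print(check_path("."))
```

Run from cron or a monitoring agent, a check like this raises a warning while there is still time to act, which is the point being made about catching the problem in advance.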

      Yes, log files can be large, and some devs do put too much into them, but they're preferable to having that one case when something goes horribly wrong and you not being able to explain what happened because you don't have them. Enabling them after a problem and hoping it happens again to catch it this time isn't practical.

      1. Steve Davies 3 Silver badge

        Re: But....

        You make some good points but we got bitten a couple of years ago. The Dev spec'd the roll over to be 21 days. Blew the filesystem space away after 7.

        So we will not deploy any code with Log4J[1] in it into production.

        Better safe than sorry.

        [1] or any logging to files whatsoever. Logging to a DB table is allowed because it is managed.

        1. Stretch

          Re: But....

          Totally with you on the log files.

          Once was asked to look at a slow program on an AS/400. Dug into it a little to see it was logging "The message is " + xml every other line. Set log level to ERROR and box runs 3x quicker. Hailed as a miracle worker.

      2. ro55mo

        Re: But....

        I have seen log4j fill a 100GB OS drive in less than a day. Now configured to log to a separate drive.

      3. fajensen
        Trollface

        Re: But....

        "Java dev here. That's a config problem. When configured correctly, l"

        To become a *true* Java developer you must always misconfigure log4j and log to MySQL and also to a place where no one will look and especially never to anything designed for the purpose like syslog - To a Real Java developer all logs that can be produced are precious, unique and private little gems.

        It's *important* to see that 5 years back someone clicked something on a web page and it caused some transactions (if we had kept the source, we could even say what those were, but, lolz, we didn't).

        "If the system grinds to a halt because it ran out of disk space, then someone wasn't monitoring the production system's disk usage properly."

        THAT's the True Java Spirit, keep it up and we'll be paying the mortgage off in a few years with the on-call money.

    3. Adam 1

      Re: But....

      > Oh and to Java devs everywhere, writing everything including the kitchen sink to Log4J output files is not the answer to reliable systems

      Log4xyz is a good thing™. Certainly beats the hell out of something went wrong somewhere and we have no logs or some half arsed attempt to write to a text file using code lifted from stack overflow which isn't threadsafe, isn't buffered and works by loading the whole file into memory, appending a line then rewriting the file. Oh and by a file, I mean hundreds of files in various folders with no cleanup mechanism.

      Other than sensible defaults, it's usually not a developer's role to configure log4xyz (internal or custom software where you have full understanding of the deployment environment may be the obvious exception). That is why you can change the verbosity of the messages in a config file. It is why you can choose your own appender. If you use a rolling file appender then you can specify things like maximum size, number of files to keep and so on. Then it is just a discussion with business about how much storage they want to pay for vs the point where files get deleted. That's their decision, not yours, not devs. Your job is to make sure you explain the consequences of whatever set of numbers get thrown at you.
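The "maximum size, number of files to keep" idea can be shown with Python's standard-library analogue of a rolling file appender. This is a sketch and not log4j configuration; the logger name, file path, size limits and the helper `worst_case_bytes` are all hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(path: str, max_bytes: int = 10 * 1024 * 1024,
                 backup_count: int = 5, level: int = logging.WARNING,
                 name: str = "app.capped") -> logging.Logger:
    """Create a logger whose files roll over by size, keeping a fixed number of backups."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # verbosity is a setting, not something baked into the code
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def worst_case_bytes(max_bytes: int, backup_count: int) -> int:
    """Approximate disk ceiling: the active file plus every retained backup."""
    return max_bytes * (backup_count + 1)


if __name__ == "__main__":
    log = build_logger("app.log")
    log.warning("disk budget is roughly %d bytes", worst_case_bytes(10 * 1024 * 1024, 5))
```

The two numbers passed to the handler are exactly the ones to take to the business: their product is approximately what the logs can cost in storage, and anything older than the last backup is deleted.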

      The other side of the coin is ensuring that the I/O can handle the volume you throw at it. If you have your loglevel set to debug on a multi threaded stack, it may not be adequate to dump log files to some slow HDD.

      Wait, you made me defend Java you sneaky bastard. Is that the new Rick roll?

  4. FredDaggg

    Just a test server.

    Yeah, got bitten by that one. More than once. Less than three.

    Now nothing gets built unless it's agreed in writing "IT'S A TEST". With (1) an account code for charging, and (2) a good old fashioned pen-on-paper signature that the machine can and will be removed from the SAN in 90 days. Or, there is an actual IT Steering committee decision that it's now in prod with the resources it needs.

    The business owners were smarter than me. They thought they could game the system ... once. Never happened again.

  5. ecofeco Silver badge

    So one word then?

    Manglement. (I almost said IBM)

    I see this everywhere. Manglement without any real experience in IT trying to stuff 10 pounds of shit into a 5 pound bag and buy champagne on a beer budget.

    There's an old saying: it's good to save money in business but you can save yourself right out of business.

    That should be the first thing they teach in business schools.

    1. Anonymous Coward

      Re: So one word then?

      "There's an old saying: it's good to save money in business but you can save yourself right out of business.

      That should be the first thing they teach in business schools."

      Problem is that the first thing business school graduates learn upon entering the real world is that if you don't strip to the bone, competition will undercut you right out of business. Worst thing is, due to lag, you can't tell which is which until it's already too late to do anything about it.

  6. Anonymous Coward

    Go Fund Yourself

    Literally been told to find my "cancelled" budget in other parts of the corporation.

    So, running around, informing everyone, we need to replace a system due to capacity limits for a whole year, and allocating a budget for this didn't survive a senior management meeting.

    Who needs server production capacity anyway. We will just need to switch back to paper now.

  7. bdeluca

    Really, the answer to log files filling up the space is to disable logging?

    Have you ever thought about working with the devs to scope it better, or do you just ban things that might cause an inconvenience? We log to the database, which consumes 10 times the resources. I guess you want to pass the buck to the DBAs.

    If you are acting like this, you are the problem; you would be fast-tracked to performance management if you worked for me.

    Take ownership of the problem, work with others to find a solution.

    1. Anonymous Coward

      "Really, the answer to log files filling up the space is to disable logging?"

      Welcome to the real world, the Dilbert world, where the boss isn't your friend.

      "Have you ever thought about working with the devs to scope it better, or do you just ban things that might cause an inconvenience? We log to the database, which consumes 10 times the resources. I guess you want to pass the buck to the DBAs."

      The devs aren't cooperating. The ONLY weapon at hand, therefore, is the Banhammer.

      "If you are acting like this, you are the problem; you would be fast-tracked to performance management if you worked for me."

      And then I would tell Performance Management that if I'm not given the allocated resources, they're going to have a VERY hard time keeping performance reports on file. IOW, if I can't do MY job, THEY can't do THEIR job. And if PM doesn't see it that way, I would have no choice but to take it straight to the board because my problem is a short distance from becoming a failure of due diligence, which means the LAW gets involved...

      "Take ownership of the problem, work with others to find a solution."

      How do you take ownership of the problem when no one lets you solve it? How can you tighten things down when the only tool they give you is a saw? How can you work with others when their view of you is more consistent with a magical horned equine? IOW, sometimes you just can't get there from here.

  8. hoola Silver badge

    Visibility and Accountability

    There are several issues here. The first is that within IT at the technical level we all want the best, regardless of budget. I never see any cost-benefit analysis done after the implementation to see that the project delivered what it said it would. Usually the projected cost savings never appear because they were total spin to begin with. Then there are the issues where a techie's pet project gets pushed as the solution to all problems, again with no realistic chance of it actually delivering. IT managers should not have to understand every technical nuance but equally the techies have to be realistic and honest with what they are trying to do.

    Developers also have unrealistic expectations with huge requirements for infrastructures that have multiple copies of different systems. A development cycle clearly needs a platform where it can be built and tested but as each new methodology comes along, so do the overheads. At a SysAdmin level these are just expected to be absorbed into the general running of the IT systems with little thought for capacity or the cost to the business. The development methodology may save money at one level but cost more elsewhere.

    Again these all cost significant amounts of money.

    IT only ever asks for money, it never creates any and as a result is a cost to the business, directly hitting the bottom line. Sure, IT is an enabler for the business but all too often that is completely overlooked, even at IT director level.

    1. Anonymous Coward

      Re: Visibility and Accountability

      "Again these all cost significant amounts of money.

      IT only ever asks for money, it never creates any and as a result is a cost to the business, directly hitting the bottom line. Sure, IT is an enabler for the business but all too often that is completely overlooked, even at IT director level."

      The thing is, IT is stuck in that thankless position of "pay a little now or pay a lot later...if you survive." Adding to the bottom line is nice, but keeping said bottom line from shrinking is rather important, too. Controlling costs is just as important as raising revenues which is why BOTH are listed in a NET (vs. GROSS) report.

    2. Doctor Syntax Silver badge

      Re: Visibility and Accountability

      "IT only ever asks for money, it never creates any and as a result is a cost to the business, directly hitting the bottom line."

      There's a good way to test this. Turn it all off. For an hour. A day. A week. See what happens to the business.

      Would you have the balls to prove "it's only a cost" by doing that?

  9. Pete 2 Silver badge

    Is this even a "thing" now?

    > many don't even have what we might arguably describe as ‘the basics' properly covered.

    In the "olden days" (speaking as someone who has read and understood Raj Jain's book) this was almost always about disk I/O. Since everyone now has everything important on an M2 array or better, there is little point in paying people to predict problems that are now only ever due to network misconfiguration.
