How to design a storage array: NOT LIKE THAT, buddy

We’ve had a few announcements from vendors recently and I've seen all manner of storage roadmaps. If I had one comment to make on all of these, it would be to say that if I were to design an array or a storage product, I probably wouldn’t start from where most of these guys are starting. There appears to be a real fixation on …

COMMENTS

  1. Alex Rose

    Someone got paid for this "article"?

    "Don't keep doing it the old way just because of inertia, do it a new and better way but make it look the same as the old way to the user."

    There you go, same article in only 31 words. Where do I send the invoice?

    1. Anonymous Coward

      Re: Someone got paid for this "article"?

      I hate to pigpile on the negative review side of this, but that wasn't an article of El Reg quality.

      Is StorageBod "Chris Mellor while drunk and fed up"?

    2. Seanie Ryan

      Re: Someone got paid for this "article"?

      Yeah, I clicked on the title hoping for some insight into storage arrays. This article was complete trash.

      Alex's 31-word sum-up was actually more gripping than the article.

      Definitely a last-minute filler.

  2. jake Silver badge

    Whatever.

    Pick an operating system. I prefer BSD for this kind of thing.

    Choose the necessary software (homegrown is usually best).

    Find the hardware that the above runs on.

    Install, proof, and roll with it. It ain't exactly rocket science.

    1. JaimieV

      Re: Whatever.

      Definitely. If it's good enough for a home network NAS, it's good enough for multinational industries.

      1. Roo

        Re: Whatever.

        " If it's good enough for a home network NAS, it's good enough for multinational industries."

        In my experience any software tagged with the buzzword 'enterprise' requires specialist support minions, has inadequate/out-of-date/error-riddled documentation, has an uptime measured in hours and can't be replaced because:

        1) The guy who authorized the purchase for $shedloads would be made to look bad.

        2) The hordes of support minions don't want to be punted out onto the street (so they will aggressively resist anything that threatens their job/income).

        3) The incumbent vendor will threaten all kinds of nonsense when it comes to renewing the support contract.

        4) Last but almost always least: the migration will require diligence and attention to detail.

        Luckily for people who just want to get work done without playing nursemaid to rubbish overpriced under-documented software, BSDs don't qualify as Enterprise software on any of those grounds. ;)

        1. Juillen 1

          Re: Whatever.

          Weird... In running an enterprise setup, I tend to find that things tagged as Enterprise level require support minions, have pretty detailed documentation that you can get the vendors to dredge up on request (no futzing around on Google to see what someone managed to post somewhere three years ago), and have uptimes measured in years (24x7 systems). They're also replaceable (in the main) because they're architected to be replaceable.

          Rarely had issues with support contracts, and the vendors don't play silly buggers. They play to the document. If they say they'll deliver, they do, otherwise they get hauled up.

          Any migration to another system needs diligence and attention to detail. If you don't apply that, you're going to fail. Badly.

          1. This post has been deleted by its author

          2. Roo

            Re: Whatever. (@ Juillen 1)

            "Rarely had issues with support contracts, and the vendors don't play silly buggers. They play to the document. If they say they'll deliver, they do, otherwise they get hauled up."

            Lucky you. Curious how you would "haul up" a vendor. What kind of sanction do you think you could apply? In many cases I suspect your business would be bankrupt from data loss before you could actually see the vendor in court.

            "Any migration to another system needs diligence and attention to detail. If you don't apply that, you're going to fail. Badly."

            Absolutely right - I meant to remove that point, the irony is delicious. :)

            1. Anonymous Coward

              Re: Whatever. (@ Juillen 1)

              To "haul up" a vendor, you ring up your account manager at the vendor bollock him for whatever has happened and they go and deliver appropriate bollockings all over the company. The thing is that the account manager has a very real incentive to keep you on side as their pay often depends on customer retention.

              If you lose data in an array outage, you've not planned and tested your DR/data protection properly.

      2. James 100

        Re: Whatever.

        "Definitely. If it's good enough for a home network NAS, it's good enough for multinational industries."

        Seems to work pretty well for Google, Backblaze, Netflix... All the really big public screwups seem to revolve around 'enterprisey' storage going bad, not commodity clusters.

        Having worked in one university and studied at another, the one using clustered commodity hardware gave far, far better results than the big-budget SAN managed, even though the SAN cost multiples of the price. (OK, not colossal installations: a few petabytes of storage, a few gigabits of traffic: enough to run most big businesses, though.) If an architecture is good enough to run Google's business but not yours, you're either doing something very special, or you're doing it wrongly - probably the latter.

    2. Anonymous Coward

      Re: Whatever.

      Sounds great until it breaks and the original creator has left.

    3. Anonymous Coward

      Re: Whatever.

      Homegrown software is an utterly stupid idea for a critical system like a storage array. No-one else has contributed to the testing, and no-one else can bring prior knowledge of the system to help you fix it when it goes wrong.

      1. jake Silver badge

        @AC 0846 (was: Re: Whatever.)

        Uh, AC, do you REALLY think that store-bought commercial systems are tested at the ones&zeros level by anyone other than the company selling them to you?

        Me, I do development & quality control in-house. It's cleaner. And cheaper.

        And I hire adults, not kids. Nothing beats butt-in-the-saddle time.

  3. Lusty

    Open your eyes a little

    There are lots of different approaches to storage on the market today. NetApp have two which are completely different under the hood from "traditional" SAN. HP Lefthand, Compellent and Equallogic are all shining examples of completely different architectures, and Violin of course do almost nothing in a traditional way.

    What actually needs to change is the way storage is purchased. Many people in the position to buy a large system will go monolithic, making change control harder for the whole department and adding complexity and cost, with the main benefit being that their CV and ego get a boost from having used a very large array.

    Enterprise, for the record, generally means the retail price is doubled (at least) so that you are offered a minimum 50% discount off list to make you feel better about spaffing a million on disk. Enterprise often doesn't offer many, or any, features or benefits over the smaller competition. But then enterprise storage isn't about features, since the storage team only do storage... the backup team does the backup, and they wouldn't be allowed near the SAN even if it could do backup :)

  4. Fenton

    My biggest beef with SAN/NAS storage

    It is still far too granular and requires far too much manual configuration.

    I would love to be able to create a filesystem that automatically queries the SAN for available space and just allocates it.

    But no, I have to go to the storage team, who will manually assign LUNs. Any change that needs to be made then requires either more or new LUNs to be assigned and, if SRDF is involved, a whole heap of extra work.
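
    Something along the lines of the sketch below is all it would take from my side. It's purely hypothetical: the array address, volume and host names are made up, and the 3PAR-style CLI verbs (showcpg, createvv, createvlun) are assumed only for illustration, since real arrays and their CLIs differ:

    #!/usr/bin/env python3
    """Hypothetical 'just give me space' flow: ask the array what is free,
    carve out a thin volume of the requested size and export it to the host,
    with no manual ticket in the middle. Names and CLI verbs are assumptions."""

    import subprocess

    ARRAY = "storadmin@array01.example.com"   # hypothetical management address

    def array_cmd(*args):
        """Run one CLI command on the array over SSH and return its output."""
        return subprocess.run(["ssh", ARRAY, *args],
                              capture_output=True, text=True, check=True).stdout

    def provision(volume, size_gb, host, cpg="FC_r5"):
        # 1. Ask the array what's free in the pool (output parsing omitted).
        print(array_cmd("showcpg", "-space", cpg))
        # 2. Carve a thin volume of the requested size out of that pool.
        array_cmd("createvv", "-tpvv", cpg, volume, f"{size_gb}g")
        # 3. Export it to the requesting host so the OS can grow its filesystem.
        array_cmd("createvlun", volume, "auto", host)

    if __name__ == "__main__":
        provision("app01_data", 500, "dbhost01")   # names are made up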

    1. Anonymous Coward

      Re: My biggest beef with SAN/NAS storage

      Even if you had just a large pool of JBOD which was presented to a load of servers, you'd still need to have the SAN which the disks were attached to designed, and you'd need someone to handle provisioning and forecasting. Someone would still need to set up replication and monitor that the whole thing was working correctly. Now, that can be done at the OS level - particularly interesting is Windows 2012's new disk subsystem, which takes a lot of traditional array functionality into the OS - or it can be handled in a dedicated array. My personal preference is to handle it in the array, because OS guys, in general, aren't storage experts. If you put the storage into the OS, you need to manage and monitor at the OS level, which means far more complex management than a single point at the array.

      1. Roo

        Re: My biggest beef with SAN/NAS storage

        "My personal preference is handled in the array because OS guys, in general, aren't storage experts"

        Presumably "storage guys" aren't OS experts either. Storage is bugger all use of apps aren't using it and *most* apps need an OS, so like it or not I think that you really should understand both sides of the equation in order to provide a good service.

        Also, I really don't see why monitoring at "the OS level" has to be any more complicated if it is doing the exact same job. There are fewer vendors and fewer moving parts involved, so it *should* really be simpler (although Microsoft has an awesome track record of making simple things very hard). Genuinely curious - not trolling. :)

        1. Anonymous Coward

          Re: My biggest beef with SAN/NAS storage

          Storage guys usually know the storage portion of OSes and can work with OS guys and App guys to design a system appropriately.

          A pure OS guy rarely knows enough about both Storage and App to design a system appropriately; likewise an App guy rarely knows enough about OS and Storage to design the system appropriately. However, I've also found in my time in Storage/Data protection that OS and App guys regularly vastly over-estimate their personal knowledge of Storage/Data protection. That's not to say that storage guys don't overestimate their knowledge of OS, but usually the area of OS knowledge they get involved with is smaller, so easier to know.

          As for monitoring, it's inherently more complicated to monitor, let's say, 100 servers all hooked up via multiple paths to a giant rack of JBOD than it is to monitor a single array. This is because you need to monitor what each server is assigned, how the server's local disks are performing and how these are interacting with the rest, and this has to be done at the server level, i.e. 100 times, rather than at the array level, i.e. once. You also don't have anything like as clear an understanding of the performance of the JBOD as you would have of a single array. For example, if you have a path to an array that's overloaded, the array will tell you that one of its ports is hot and who's making it hot, whereas if you have an overloaded path into a bunch of disks, there is no monitoring at the disks; you have to piece the sources together yourself.

          Now, I suspect group storage reporting by servers may help here, but it's still very much in its infancy and certainly doesn't work cross-platform. My Windows servers have no idea about the storage on my Linux servers, whereas an array would have a good understanding of everything hooked up to it.
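
          To make the "100 times rather than once" point concrete, here's a minimal sketch of just the server-side half; the host names are hypothetical and the iostat output is printed rather than parsed, so it's an illustration rather than a monitoring tool. The array-side equivalent would be a single query against one management point instead of this loop:

          #!/usr/bin/env python3
          """Illustration only: poll every server for its own view of its disks
          over SSH. With 100 servers this runs 100 times and the results still
          have to be correlated by hand, versus one query to a single array."""

          import subprocess

          SERVERS = [f"server{n:03d}.example.com" for n in range(1, 101)]  # hypothetical

          def disk_stats(host):
              """Fetch extended device statistics from one Linux host via iostat."""
              return subprocess.run(["ssh", host, "iostat", "-dxk", "1", "1"],
                                    capture_output=True, text=True, check=True).stdout

          if __name__ == "__main__":
              for host in SERVERS:                     # one round trip per server
                  print(host, disk_stats(host).splitlines()[-3:])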

          1. Roo

            Re: My biggest beef with SAN/NAS storage

            "As for monitoring, it's inherently more complicated to monitor, lets say 100 servers, all hooked up via multiple paths to a giant rack of JBOD than it is to monitor a single array"

            From the way you've phrased that, it seems to me that the reason the array option is easier is that you are not actually monitoring the connectivity between the server and the array; only half the job is being done. Arrays *AND* JBODs are functionally useless if apps running on servers can't use them, so in practice storage really needs to be monitored and managed from end to end in order to provide some business value.

            I get pretty sick & tired of banging three sets of heads together (storage/network/sysadmin) just to write a log file to a file system.

            1. Anonymous Coward

              Re: My biggest beef with SAN/NAS storage

              @Roo - When running an array you'd tend to be running some sort of client software which can talk in-band and/or out-of-band to the array, so you would be monitoring the OS connectivity and the array at the same time.

      2. Fenton

        Re: My biggest beef with SAN/NAS storage

        But why could the SAN not be set up up front, before us App/OS guys come along?

        Then it should just be a matter of saying:

        "I want X amount of storage with this IOPS requirement for this filesystem, and it should be replicated to the other DC."

        The SAN guys should be able to do all of their forecasting in the background.

        If they hit a certain threshold, they should order new disk/Cabinets.

        The array should sort out any hotspots and tuning. In this day and age, it still takes far too long to get anything done when storage is involved.

        I can deploy over 300 VMs in a weekend, roll out front-end patches to 1000s of users overnight, automatically patch 100s of SAP instances in a weekend, but it still takes days just to make a change to a filesystem due to a requirement change.

        1. Anonymous Coward

          Re: My biggest beef with SAN/NAS storage

          @Fenton: Initial setup usually requires that you know what the HBAs' WWNs are, and array management and SAN management software often makes it quite hard to configure zoning and LUN mapping for devices which don't exist yet. Now, you should be able to tell your storage team what you need, how many IOPS you need, etc., and that should be enough information for them to set up/reserve disk for you. That said, forecasting growth is one thing; forecasting new projects' needs is another. You don't tend to get much sympathy when applying for spending approval if you're asking for a quarter of a million quid for a new array and you don't know of any projects that need it yet.

          As for tuning, sometimes you want auto-tuning and other times you don't. Most enterprise arrays will automatically move hot tracks around, but often you're dealing with siloed spending: if one project purchased a certain amount of disk and a certain level of performance, they tend to be quite upset if another project sharing the pools takes all their IOPS by having its data moved up to the top tiers because it badly predicted the growth of its system. Mainly these things are political. When you're rolling out VMs or patching, it's on isolated systems; you know how much hardware you've got to use, or the OS you're patching has known and well-defined owners. Move to the storage tier and you're into a world of shared services, which complicates things somewhat.

          1. Roo

            Re: My biggest beef with SAN/NAS storage

            "As for tuning, sometimes you want auto tuning, other times you don't, most enterprise arrays will automatically move hot tracks around, but often you're dealing with siloed spending. If one project purchased a certain amount of disk and a certain performance, they tend to be quite upset if a project is sharing pools with them and taking all their IOPS"

            That pretty much hits the nail on the head. In those cases a big shared array is often NOT a good fit for the business (unfortunately politics counts).

  5. Nate Amsden

    How complicated is your automation?

    What sorts of things do you do? For me it's mostly snapshots; I have quite a bit of work invested in orchestrating them. My most complex one is recent - here is a Visio diagram:

    http://elreg.nateamsden.com/MySQL%20Snapshot%20diagram%20for%20Staging.png

    The process co-ordinates a data refresh amongst 26 different systems as a single "process" (where process here means it's all done to achieve a single goal): basically restoring production MySQL database data to a few different locations for different uses. In order for an environment to properly use the data, a bunch of stuff has to be done to the data and to the environment to get it ready for use; you can't just slap a new database in and expect it to work. There are 8 MySQL databases involved, and the data set is about 250G.

    It's by far the most complex thing I've personally done (though really the complexity is not high; it's more that the number of moving parts is). At least 2,000 lines of scripting are involved, and the process as a whole is based on something I started back in 2007, so the core of it is quite mature. It runs very reliably.

    Of what is probably 500 commands executed as part of this process (hard to say for sure on the spot, since I have scripts abstracting other scripts, sometimes abstracting another layer of scripting), I worked out how many go to the storage array itself:

    33 (one of those is optional, I could exclude it and lose no functionality but I like it in there)

    That is 4 commands per MySQL database volume, which comes down to the following (sketched as a script below):

    - remove VLUN

    - remove volume

    - create volume(snapshot)

    - create VLUN

    (one set of commands for each of the 8 MySQL hosts)
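
    A stripped-down sketch of that per-host loop (not my actual scripts) would look roughly like the following. The array address, volume and host names are made up, and it assumes the standard 3PAR CLI verbs (removevlun, removevv, createsv, createvlun) are available to the SSH user:

    #!/usr/bin/env python3
    """Sketch of the per-host snapshot refresh: remove VLUN, remove volume,
    create snapshot volume, create VLUN, once per MySQL host. Illustration
    only; names and CLI details are assumptions."""

    import subprocess

    ARRAY = "3paradm@array.example.com"                  # hypothetical array login
    MYSQL_HOSTS = [f"mysql{n:02d}" for n in range(1, 9)] # the 8 MySQL hosts

    def cli(*args):
        """Run one storage CLI command on the array over SSH."""
        subprocess.run(["ssh", ARRAY, *args], check=True)

    def refresh(host):
        snap = f"{host}_data_ro"     # snapshot volume presented to this host
        base = "prod_mysql_data"     # production source volume
        cli("removevlun", "-f", snap, "0", host)   # 1. remove VLUN (unexport old snapshot)
        cli("removevv", "-f", snap)                # 2. remove volume
        cli("createsv", "-ro", snap, base)         # 3. create volume (snapshot of production)
        cli("createvlun", snap, "0", host)         # 4. create VLUN (export the fresh copy)

    if __name__ == "__main__":
        for h in MYSQL_HOSTS:
            refresh(h)               # 4 storage commands per host, as described above

    Swap those four verbs and much the same loop should port to any block platform that accepts an SSH login, which is really the point I make further down about adapting the scripts.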

    I use software iSCSI a lot as part of this process due to what I consider a flaw in how vSphere handles raw device maps in 4.x (and I assume 5.x too). I wrote about it four years ago here:

    http://www.techopsguys.com/2009/08/18/its-not-a-bug-its-a-feature/

    (still using Fibre Channel for the source volumes, basically the ones that don't need to be detached and destroyed on a regular basis)

    There isn't a whole lot else I *need* to automate, as the platform is already pretty heavily automated on its own (it's 3PAR, in the event my posting history didn't obviously imply that).

    The point is, of course, that it SHOULD be trivial for me to adapt my scripts to pretty much any block storage platform as long as it accepts SSH login, since only 33 of the roughly 500 commands (under 7%) deal with the storage array directly.

    I just checked: my process has 72 MySQL execution commands, so more than double the number of commands executing SQL code versus running CLI operations on the storage.

    So again, I don't know what sort of automation you're doing, or what you might NEED to do on a new platform, but you might find it's a lot simpler than you think, because perhaps you only need to do a tiny amount of work.

    At least the legacy vendors should be able to change the architecture or CLI interface and maintain some level of backwards compatibility. For example, for CLI changes they can just have different CLI modes that one can use (and set as default) for a transition period. For architecture, you can change the architecture, and if people still want to create a bunch of small LUNs to expose to a host (even though there's no point in said new architecture), let them do it; it doesn't hurt the system. Or if they want to pin hosts to particular controllers (even if there's no point), let them do it.

    I've never scripted directly on any of the other big storage platforms. I did work with some scripts another person wrote on the EMC Clariion platform in 2003-2004; it wasn't pretty.

    Storage is not my primary responsibility - certainly not a task I've ever been assigned full-time work for. I think I've learned a good amount of interesting stuff over the years on 3PAR, though. If storage today still worked the way EMC Clariion did back in 2004, I would probably not be involved in storage at all. I saw the spreadsheets and Visios the folks made back then and said to myself, "I'm not getting into storage; that is way too complicated to be spending my time on (on top of servers, networking, security, etc.)".

    Then I came across 3PAR (and had them compete with NetApp at the time, since I'd heard similar things about that platform's simplicity; the 3PAR rep was actually the EMC rep for the company that had the Clariions, which is how he knew of me, and it was a cold-call process that took 6 months until we decided to get an eval going). NetApp outright refused to give my company an evaluation array (we were looking for a small one at the time, just a couple of shelves). It was really that refusal that drove me to 3PAR in 2006, which turned out to be a good choice in my book. It took me a while to get up to speed on the architecture (thin provisioning, wide striping, chunklets etc.), but once I did a whole new world opened up and it was pretty cool. Before that time I basically thought any dual-controller array was the same as another (specifically I recall looking at possibly using Infortrend FC storage as a much cheaper alternative to 3PAR). Sooooo glad I did not go that route!!

    1. Nate Amsden

      Re: How complicated is your automation?

      Oops, there are 7 hosts directly involved in the snapshot process, so 28 storage commands for those; plus that one optional one makes 29, and a couple more manage an intermediate snapshot (see diagram for details). I have to run and I don't recall what that 32nd command is, but it's not a big deal - obviously the point is that the number of commands is small!

    2. Nate Amsden

      Re: How complicated is your automation?

      Ack, no, sorry: the number is 8 hosts. Another environment was added to this process recently, which adds an 8th MySQL server (and 7 supporting application systems). They are not reflected in the diagram.

  6. bigdata

    That architecture is already here - designed from the ground up by Violin Memory. Take a look at vmem.com. Hot-swappable, NAS, software optimized down to each strip of NAND flash. Lowest latency of any array.

  7. This post has been deleted by its author

  8. dan1980

    Not that complicated

    The 'new kids on the block' use tried and tested paradigms so they can focus on creating some specific IP that one of the established players will purchase.

    The established players only make incremental changes because - for the large clients they rely on for their revenue - predictability and scalability are more important than getting the absolute best offering.

    Creating new architectures and making radical changes for each new generation will very quickly lose you the confidence of your larger customers, who are buying the big-label gear precisely because it is reliable and predictable and has defined road-maps that can be used to plan purchases and equipment life-cycles.

    You want to be able to do phased upgrades and that is made an order of magnitude more complex and expensive if you have to plan and deploy new management tools and architectures. Far simpler to spend that money on a more brute force approach like throwing more flash/DRAM/spindles at the problem.

This topic is closed for new posts.