Many years ago, as an entry-level systems programmer, I decided there were two teams that I was never going to join: the test team and the storage team - because they were boring.* A fellow blogger has a habit of referring to storage as snorage and I suspect that is the attitude of many. So why do I keep doing storage? Well, …
"If you have ever received more chequebooks in the post from a certain retail bank, I can only apologise."
For so many years, I've been after the guy responsible for sending that parcel of 2^32-1 chequebooks from my bank.
That's because he doesn't have to think about how much it costs.
Why are there so many mid-range Ford Focuses and Fiestas on the roads? How dull the owners must be. Surely they could just buy Aston Martins or Ferraris. They're much more exciting cars to drive.
While I salute you for not blaming it all on Bill Gates for once, your Torvalds dick-sucking is out of place here and adds nothing. The guy's good at Linux kernels. That doesn't mean he can run a large company's IT infrastructure.
So what's *your* opinion of storage, Eadon?
Eadon must be feeling unwell - not only did he miss the chance to blame Microsoft but he didn't end with something along the lines of
ATTENTION-GRABBING SPINNING RUST PLATTER FAIL
"your Torvalds dick-sucking is out of place here and adds nothing."
Oh, when he wrote "Even Torvalds is bored by storage"... I read it as "Even [someone as boring as Torvalds] is bored by storage"
Torvalds is bored by storage
Yes, and Torvalds also thinks that virtualization is a dead end and nothing to bet on. Not the best judgement, eh?
Re: Torvalds is bored by storage
Nobody can be 100% right. If more people realised this then there would be far less idiocy in the world.
On the other hand that would be very boring...
But cloud computing is making storage invisible. As storage (and computing power) become commodities, the interesting action moves higher up the stack.
Then again I'm a developer, so I would say that....
Unfortunately clouds aren't magic. Underneath the cloud we have the same old systems we always had. If you ask for a new VM to be provisioned for a production database that demands low latency, and your cloud guy puts it on 7.2K spinning disk, along with 200 test/dev systems and a backup staging pool, you're likely to get a bit annoyed.
There are two things you need from storage: capacity and performance. The former generally comes from the slow, 3.5" X TB disks. The latter nowadays comes from flash, whether it's solid state drives, direct-attached or SAN-attached flash. Flash isn't cheap, but used in the correct places, the cost is offset by the reduction in disk wait times. If the application isn't waiting for I/O, it can get on with what it wants to do, requiring fewer CPU cores, and often fewer servers, fewer licences, etc.
If you just want to dump some data somewhere though, any old cheap storage will do.
<quote>If you just want to dump some data somewhere though, any old cheap storage will do.</quote>
Uh, no. Reality sucks, Mr pPPPP. But if you just want to lose your data, any old cheap storage will do.
Yes, any old cheap storage will do. Everything has its value. If you value your data then you protect it. If you don't then don't. Buying cheap storage and combining it with decent protection is better than paying more and not doing so. I've never used RAID in my laptop or in any of my desktop PCs. Anything important is backed up. If it dies, it dies and I rebuild.
It's not about if you want to lose your data. It's about what the consequences are of you losing the data. If they're not serious then why bother spending the money? If they are, then you'd be a fool not to do what needs to be done.
I agree. Storage is where you put logs and archives and stuff that the application doesn't need to run. The storage world has proven not to be capable of scaling up its performance fast enough. If I need my data to be safe, I crank up my replication factor and distribute it further around the world - but all in memory. Getting it back from a disk is too slow, even if the disk is physically attached to the machine running the (server vm that's running the application vm that's running the) application, and most of the time it's not - I'm lucky if it's in the same rack.
I'm not sure storage is boring - I just don't see what it's for.
No, it really is boring. Honestly.
Storage has been the bane of my existence for some years.
I've been working in data warehousing (bringing intelligence to the business ...) for some years now and I've seen at least half a dozen sites where the infrastructure folks just didn't get storage. When combined with the 'every problem is a nail' mentality that consolidation environments foster it leads to a great deal of pain.
First example: A couple of sites trying to shoehorn 1TB+ data warehouse applications onto consolidation environments - and wondering why the projected runtimes wouldn't fit into the batch window. In one case they were moving the system off the crusty old 32 bit hardware that it had originally been built on using SQL Server 2000 - machinery that was the better part of a decade old. Their new *cough*NetApp*cough* SAN cost them a cool half a million and was outperformed comfortably by an antediluvian direct attach SCSI array.
Second example: On more than one occasion I've been able to demonstrate the same ETL process running significantly faster on a desktop PC than the production server we were supposed to be deploying to (in one case half the runtime).
Third example: I got to know a couple of sales reps from a large storage vendor I used to work in relative proximity to. Off the record they would quite happily say that a lot of their DW customers used direct attach storage because it just wasn't feasible to get the performance out of a SAN.
One of the MS fast track data warehouse papers manages to obliquely refer to this, saying that an improperly tuned SAN (i.e. one tuned for a general purpose workload) is likely to need 2-3x the number of disks to achieve the same performance as one tuned for the application.
In an application domain where your canonical query is a single table scan, all the caching and tiered storage in the world get you no benefit when your workload has no temporal locality of reference. Direct attach storage is cheap and fast. 99% of Data warehouses don't need high availability. Put a HBA on your server and back the DBs up on the SAN.
It's a relatively straightforward concept - really it shouldn't take Einstein to work this out.
I could go on ...
Re: Storage has been the bane of my existence for some years.
Agreement here- Another bottleneck that a lot of people don't think about is the connection from the storage appliance with xx TB of storage to the servers that access it. You can have a petabyte of storage configured for nothing but IOPS and more IOPS, but it's all todger waving if the network connectivity isn't there.
Work's test lab has a tidy little branch location style SAN filer with ~20 TB of storage, but only 2 gigE network interfaces. Running two separate AD forests, three SQL servers, and a couple apps from a pair of ESX boxen is painfully slow, because of that.
Comparing that to the production environment (dual mid-level filer heads with quad 10 GbE going to a 7 node ESX cluster, with each node fitted with a dual port 10 GbE card) is night and day, performance wise.
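A back-of-envelope sketch of the point about network connectivity, in Python. It only computes the line-rate ceiling of the filer's links. The function name is made up for illustration, protocol overhead and contention are ignored, and the quad 10 GbE is treated as four ports in total, which is an assumption since the comment does not say whether that is per head.

```python
# Rough ceiling on filer throughput imposed by its network links.
# Assumption: pure line rate; protocol overhead and contention are ignored.

def link_ceiling_mb_per_s(ports: int, gbit_per_port: float) -> float:
    """Aggregate line rate in megabytes per second (1 Gbit/s = 125 MB/s)."""
    return ports * gbit_per_port * 1000 / 8

# Test lab filer: 2 x 1 GbE.
lab = link_ceiling_mb_per_s(ports=2, gbit_per_port=1)
# Production filer: 4 x 10 GbE (assumed to be four ports in total).
prod = link_ceiling_mb_per_s(ports=4, gbit_per_port=10)

print(f"lab ceiling:  {lab:.0f} MB/s")
print(f"prod ceiling: {prod:.0f} MB/s")
print(f"ratio:        {prod / lab:.0f}x")
```

However the ~20 TB behind the lab filer is laid out, everything running on the two ESX hosts shares that one ceiling.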
Anon to protect that thing they call a paycheque.
Re: Storage has been the bane of my existence for some years.
You've got it spot on. I work for a large storage vendor and I'd agree 100%. You only need storage on a shared system if you need to make use of that system's features, primarily copy services. If you don't need them then put the storage as close to the CPU as possible.
The only ways to make SAN-attached storage comparable is to use flash storage, where seek times become less relevant, and latency in general is much-reduced, or to make sure your application runs many tasks simultaneously, so that there is little or no CPU wait time. This doesn't work well for data warehousing, where operations tend to rely on the previous operation to complete.
The biggest problem with shared storage is contention. This is masked a little by caching, but is made worse by the fact that disk drives are gaining in capacity much more quickly than they are gaining in performance. And those who buy storage still think in terms of capacity rather than performance.
You mention temporal locality and this is also something that storage admins tend to ignore. Thin provisioning really does help here, as does flash, but in the former case in particular, once you have several volumes with high I/O utilisation on the same physical disks, performance suffers. A few years back I dealt with a customer who was recording hundreds of CCTV images onto a low-end SAN storage system. Of course, the workload is all sequential, so SATA drives seemed to fit the bill. Whichever idiot sized it didn't think that several hundred streams of simultaneous sequential I/O would produce random I/O with the worst seek times you could possibly have. Thin provisioning solved this problem.
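To illustrate why many sequential streams look random to the disk, here is a toy Python model. It is a sketch under simplifying assumptions, not a description of the customer's system: each stream owns its own contiguous region, the disk services the streams strictly round-robin, and "seek distance" is just the difference in block address. The function name and parameters are hypothetical.

```python
def mean_seek_distance(streams: int, blocks_per_stream: int, region_size: int) -> float:
    """Average head movement (in blocks) between consecutive requests when
    `streams` sequential writers are serviced round-robin.

    Assumes blocks_per_stream <= region_size so regions never overlap.
    """
    # Stream s writes blocks 0, 1, 2, ... inside its own region starting
    # at s * region_size; requests arrive interleaved, one per stream per round.
    requests = [
        s * region_size + b
        for b in range(blocks_per_stream)
        for s in range(streams)
    ]
    moves = [abs(nxt - cur) for cur, nxt in zip(requests, requests[1:])]
    return sum(moves) / len(moves)


# One stream: the head advances a single block each time.
print(mean_seek_distance(streams=1, blocks_per_stream=100, region_size=1_000_000))

# Several hundred streams: nearly every request is a long jump.
print(mean_seek_distance(streams=300, blocks_per_stream=10, region_size=1_000_000))
```

Each stream on its own is perfectly sequential, but in this model the head jumps at least one full region between consecutive requests as soon as there is more than one stream.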
One of the best ways you can improve performance is to mirror between flash and HDDs. Use LVM on the host. All writes go to both disks, and the reads come from the flash. Will work a treat with data warehousing, and you have data protection to boot. Mirror on the SAN storage system and you have copy services too, but you need to think about latency a little bit more, so get a system that fits the bill.
It's the laughing at people who think storage is just storage, and that the more the "solution" costs, the better it will be, that keeps the job interesting for me.
Re: Storage has been the bane of my existence for some years.
Storage is one area I started diving into back in 2006, so I am still sort of new. I saw a similar experience to yours a few years back: I went to a company and their SQL Server databases were running off a BlueArc NAS - CIFS. SQL Server running on top of CIFS (BlueArc could not do iSCSI at that time I guess). They were probably the only one in the world doing it, Microsoft even wanted to do a case study (before I got there). They ended up not doing the case study.
But the point is they had literally 140x10k RPM disks for databases that would run no more than 500-1000 IOPS between them (at peak, on average IOPS was near 0). They needed the disks for the low latency, if CIFS latency spiked SQL Server would crash. It was a dedicated rack of storage for this stuff for what could have fit in a few rack units. They wanted low latency yet they chose RAID 5 12+1 on all of their RAID groups.
I've really enjoyed my own experience with storage over the past several years, the bulk of my work has been on the 3PAR platform, I've done stuff on a few other platforms as well. I don't use much of the fancy software 3PAR provides (don't need it).
Most recently there was a request to refresh databases from production weekly to a few test environments(7 DB servers total), so using snapshots and a lot of scripting that I have built up over the years I adapted some scripts to do this, probably about 400 steps for the entire process (99% of which were executed against Linux or MySQL - 1% executed against 3PAR). Nothing overly complicated, just a lot of dependencies to manage.
End result is they can get their fresh copy of production data in about 2 hours (tons of SQL updates need to run post snapshot otherwise it'd be a few minutes), instead of the ~day it used to take(most of that time waiting on SQL). And I can refresh 11 databases in roughly those same 2 hours (I could add more DBs if needed, additional requests have not been made), with minimal I/O hit to the storage and trivial amount of disk space required(since only changes are stored). Since they have committed to a weekly refresh, I don't have to be concerned about the snapshot blowing up in size because someone didn't refresh it in 2 months and it's tracking all of the production changes + the test environment changes and storing them for a long period of time.
The MySQL DBA we used to have wanted to use xtrabackup to export the databases and move them around, moving hundreds of gigs of data around is costly from I/O and disk space & time requirements. His solution was get faster storage. Mine was to work smarter. He left a long time ago.
A co-worker I was just chatting with earlier was blown away when I told him I can "copy" the entire production db for this environment in about 1 second. To most people a snapshot isn't a snapshot. To most people a snapshot is a snapshot in time, a full copy of the data (amazon RDS works like this, am I glad I haven't interfaced with that in a while what a POS). They don't understand most enterprise arrays for many many years now have been able to take real snapshots which are instant, and only store the changes, for great data flexibility and low cost of ownership.
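For anyone who hasn't seen how a snapshot can be instant and still only store the changes, here is a minimal copy-on-write sketch in Python. It is a toy model with made-up class names, not how 3PAR or any particular array implements it (real arrays may redirect writes instead of copying, and support writable snapshots, which this sketch does not).

```python
class Snapshot:
    """Read-only point-in-time view that stores only blocks changed since it was taken."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}  # block number -> content as it was at snapshot time

    def preserve(self, block_no):
        # Copy the old content the first time a block is overwritten.
        if block_no not in self.saved:
            self.saved[block_no] = self.volume.blocks.get(block_no)

    def read(self, block_no):
        # Changed blocks come from the saved copy, the rest from the live volume.
        if block_no in self.saved:
            return self.saved[block_no]
        return self.volume.blocks.get(block_no)


class Volume:
    """Toy block volume: a dict of block number -> content."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        # Taking a snapshot copies no data at all, which is why it is "instant".
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block_no, data):
        for snap in self.snapshots:
            snap.preserve(block_no)
        self.blocks[block_no] = data


prod = Volume({i: f"row-{i}" for i in range(1000)})
snap = prod.snapshot()
prod.write(7, "changed")

print(snap.read(7))      # the snapshot still sees the old content
print(prod.blocks[7])    # the live volume sees the new content
print(len(snap.saved))   # only the changed block is stored
```

The space a snapshot consumes grows with the number of blocks changed after it was taken, which is why a snapshot left unrefreshed for months can blow up in size.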
There's not a single application test environment in my infrastructure (and there are a lot of environments, they keep requesting more) that is not running the MySQL instance as a snapshot. It just makes life so much easier -- though there is the initial hurdle of the scripting to integrate with the systems. Most vendors make integration software for Oracle/SQL server etc (I wrote my own for Oracle many years ago on 3PAR because their tools weren't flexible enough for my needs), but none offer anything for MySQL as far as I can see.
For our environment our broken application uses some in memory tables in MySQL so I have to work around the fact that replication fails on any slave if MySQL is restarted, because MySQL doesn't preserve that data between restarts(how dumb is that). Fortunately none of the data stored in memory tables is important, it is transient, but it still requires replication to be repaired post restart(which the scripts handle automatically)
Nothing I haven't been doing for years, this involved a much greater level of integration though co-ordinating between roughly 35 different servers(11 are database servers)+storage to make it happen. It was sort of a fun project. I'm still working out some of the kinks, at some point soon it will just be a set of cron jobs.
Most of these developers who don't know about storage I don't expect them to change, their answer to everything, much like the aforementioned DBA is "make it faster! I don't care what it costs I'm not paying for it!" Thus resulting in massive amounts of over spending and low capacity utilization. (e.g. waste for you developer types). Fortunately the developers I work with are more thoughtful.
Wha' ... huh? Platters you say? .... Zzzzz...
I want storage to be boring
Like I want internet connectivity to be boring.
The last thing I want is for anything my work relies on to be at the white heat of the cutting edge of technological metaphor.
And you are proving that it's not boring by telling me I have to look into it, and when I do it's a bit less boring?
That is Boring,
Bring on the Security guys please.
Badly written application code can have a significant impact on storage response. I once fixed some code that jumped around reading short bursts of bytes from a file before modifying them and rewriting them. I changed it to read a whole block of data in one go, modified the required data in memory, then wrote the data block back out. Two block transfers instead of more than a dozen jumps back and forwards killing the filing cache.
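A rough Python sketch of the two access patterns described above. The function names, file size and edit offsets are invented for illustration, and the block version assumes every edit falls entirely inside the block being read. It only shows that both approaches produce the same file; it makes no claim about timings.

```python
import os
import tempfile


def patch_scattered(path, edits):
    """Jump around the file: one seek/read/seek/write round trip per edit."""
    with open(path, "r+b") as f:
        for offset, new_bytes in edits:
            f.seek(offset)
            f.read(len(new_bytes))  # read the short burst of old bytes
            f.seek(offset)
            f.write(new_bytes)      # rewrite them in place


def patch_block(path, edits, block_start, block_len):
    """Read the whole block once, modify it in memory, write it back once.

    Assumes every edit lies fully inside [block_start, block_start + block_len).
    """
    with open(path, "r+b") as f:
        f.seek(block_start)
        block = bytearray(f.read(block_len))
        for offset, new_bytes in edits:
            rel = offset - block_start
            block[rel:rel + len(new_bytes)] = new_bytes
        f.seek(block_start)
        f.write(block)


edits = [(10, b"AB"), (200, b"CD"), (3000, b"EF")]
results = []
for patch in (patch_scattered, lambda p, e: patch_block(p, e, 0, 4096)):
    # Start each run from an identical 4096-byte file of zeros.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4096))
    patch(tmp.name, edits)
    with open(tmp.name, "rb") as f:
        results.append(f.read())
    os.remove(tmp.name)

print(results[0] == results[1])  # both approaches yield the same file contents
```

The second version issues two transfers no matter how many edits there are, at the price of holding the block in memory.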