One basic rule of storage is don’t keep items you need to access quickly in hard-to-get-at places: that’s like keeping instant coffee in a safe. You suit the storage type to the type of item you want to store. One size does not fit all, and valuable data should not be stored on slow-speed disk drives along with ordinary data. …
Disagree with you that flash doesn't make any difference for tiering
It leaves absolutely no reason to use anything but SATA drives in the array alongside the flash. The difference in speed and capacity between flash and hard drives is so vast compared to the difference between the fastest FC drive and the slowest SATA drive that there is simply no point in using FC drives anymore once you have a flash tier.
Wrong wrong wrong
Using SATA and SSD in a tiered solution, the so-called "flash and trash" argument, is a plain stupid thing to do. The reason for this is simple: the speed of your overall data access is limited by three things:
- speed of your SSD tier
- speed of your HDD tier
- % of active data on each
Let's take a simple example: if your HDD tier consists of 100 SATA drives and so can manage 20,000 IOPS (being generous), and 80% of your active data is on flash, then the 20% of I/O that still has to go to SATA caps your overall performance at 20,000 / 0.2 = 100,000 IOPS. It doesn't matter that your SSD tier could push out a few million IOPS given the chance; it isn't given the chance, because your overall system is bottlenecked on the SATA. That is an incredible waste of both money (that SSD doesn't come cheap) and potential performance (100K IOPS from a solution with lots of SSD? How embarrassing).
To increase the overall throughput you need to raise one or more of the above factors. Given that the SSD is not the bottleneck, and the % of active data on the top tier is what it is (for a given algorithm and SSD/HDD ratio), your best bang for the buck will be a tier of enterprise-class HDDs that are big enough to hold most of the last 10-20% of active data that doesn't fit into the SSDs, and fast enough to let the SSDs really throw out the IOPS.
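The arithmetic in this example can be sketched as a simple bottleneck model. This is a minimal sketch assuming a fixed I/O mix (a set fraction of all requests always goes to flash, the rest to disk) and tiers that work in parallel without slowing each other down; the 2,000,000 IOPS flash figure is a hypothetical stand-in for "a few million", and `system_iops` is an illustrative name.

```python
def system_iops(ssd_iops: float, hdd_iops: float, ssd_fraction: float) -> float:
    """Peak IOPS for a two-tier system with a fixed I/O mix (hypothetical helper).

    ssd_fraction of all requests are served by flash and the remainder by disk.
    Each tier can only absorb its own share, so the whole system saturates as
    soon as the first tier runs out of headroom.
    """
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    hdd_fraction = 1.0 - ssd_fraction
    limits = []
    if ssd_fraction > 0:
        limits.append(ssd_iops / ssd_fraction)  # total load at which flash saturates
    if hdd_fraction > 0:
        limits.append(hdd_iops / hdd_fraction)  # total load at which disk saturates
    return min(limits)


if __name__ == "__main__":
    # 100 SATA drives at a generous ~200 IOPS each, 80% of I/O on flash
    print(f"{system_iops(2_000_000, 20_000, 0.8):,.0f}")    # 100,000
    # Same flash, but a (hypothetical) faster disk tier of 50,000 IOPS
    print(f"{system_iops(2_000_000, 50_000, 0.8):,.0f}")    # 250,000
    # 99.9% of I/O on flash: now the flash tier is the limit
    print(f"{system_iops(2_000_000, 20_000, 0.999):,.0f}")  # 2,002,002
```

Under this model, raising the disk tier's IOPS raises the ceiling in proportion while the flash share stays fixed, which is the case for a fast HDD tier; the last line shows the opposite situation, where a very skewed workload leaves the SATA tier nearly idle and flash becomes the limit.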
Fast HDDs will continue to play an important part in storage, tiered or otherwise, until anything not on SSD is considered nearline and treated as such; not just a separate tier but a separate access mechanism.
But if your SSD holds data that is accessed 99.9% of the time, and the SATA holds older documents accessed, say, 0.1% of the time (i.e. it's tiered), then most of the time the system will be working off the SSDs and the SATA is in a quasi-archive state.
What matters is how often the data is needed, not how much of it there is. That is why HDD caching worked so well once it reached larger capacities with decent algorithms, despite the cache being a fraction of the size of the disk itself.
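The point that hit rate depends on how often data is touched rather than on the total volume can be illustrated with a small LRU cache simulation. This is a minimal sketch under stated assumptions: the trace is synthetic (a hypothetical 50-block hot set taking 90% of accesses, plus a stream of cold blocks that are each read exactly once), and plain LRU stands in for whatever algorithm a real caching product uses.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """Fixed-size least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict[int, None] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        if block in self.entries:
            self.entries.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[block] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return False


def skewed_trace(n: int, hot_blocks: int, cold_every: int) -> list[int]:
    """Synthetic trace: every cold_every-th access reads a brand-new cold block;
    all other accesses cycle round a small hot set (block IDs 0..hot_blocks-1)."""
    trace: list[int] = []
    next_cold = hot_blocks  # cold block IDs start after the hot range
    hot_i = 0
    for i in range(n):
        if i % cold_every == 0:
            trace.append(next_cold)
            next_cold += 1
        else:
            trace.append(hot_i % hot_blocks)
            hot_i += 1
    return trace


if __name__ == "__main__":
    trace = skewed_trace(10_000, hot_blocks=50, cold_every=10)
    print("distinct blocks:", len(set(trace)))
    for capacity in (40, 100):
        cache = LRUCache(capacity)
        for block in trace:
            cache.access(block)
        print(f"cache={capacity:>3} blocks  hit rate={cache.hits / len(trace):.1%}")
```

With these assumptions, a 100-block cache (under 10% of the 1,050 distinct blocks) serves 89.5% of accesses, while a 40-block cache that cannot hold the whole hot set gets no hits at all: sizing against the hot working set matters far more than sizing against total capacity.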
Scooby and I are confused.
"Arruugh" - rough translation "I don't understand."
You're right, Scooby, something's fishy about this post. To even approximate the numbers Storage_person is proposing in his example, you would need an incredibly uniformly distributed workload (unlikely), and one where accessing off-SSD data actually slowed down the SSD throughput, which is consistent neither with the specs given nor with real life. An obvious solution would be to put such terribly critical data on the SSD instead of something less vital. Even with a workload that uniform, I still can't see how the realizable IOPS should be less than 110,000-120,000 system-wide. Now, using faster disks for that 20% of un-SSDed data would certainly juice things up, but in this day and age, enlarging the SSD tier to cover that 20% is not much more expensive, if at all, and would give a VASTLY faster achievable data rate with much better response time. Another possibility is better algorithms from your vendor that allocate data more effectively between the faster and slower tiers.
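The response-time side of this argument can be roughed out as a weighted average of tier latencies. This is a sketch assuming independent tiers and no queueing (queueing is what really inflates latency once a tier saturates); the 0.1 ms flash, 5 ms faster-disk and 10 ms SATA service times are hypothetical placeholders, not measurements.

```python
def mean_latency_ms(ssd_hit_fraction: float, ssd_ms: float, hdd_ms: float) -> float:
    """Average service time when ssd_hit_fraction of requests are served by flash.

    Ignores queueing and controller overhead: each request simply costs the
    service time of whichever tier holds its data.
    """
    return ssd_hit_fraction * ssd_ms + (1.0 - ssd_hit_fraction) * hdd_ms


if __name__ == "__main__":
    scenarios = {
        "80% flash, SATA for the rest": (0.8, 0.1, 10.0),
        "80% flash, faster disks for the rest": (0.8, 0.1, 5.0),
        "flash enlarged to hold all active data": (1.0, 0.1, 10.0),
    }
    for label, args in scenarios.items():
        print(f"{label:<40} {mean_latency_ms(*args):.2f} ms")
```

With these placeholder numbers, faster disks roughly halve the average (2.08 ms to 1.08 ms), while moving the remaining active data onto flash brings it down to the flash service time itself (0.10 ms).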
Scooby, this is not to say that this sort of thing doesn't happen in real life; however, I suggest looking at things like insufficient or unbalanced I/O channels, or not enough processing power in the storage unit(s). Now, enjoy your bone!
And the point of the article is?
It gives nothing to an enterprise admin who already understands tiering, and there's nothing there to help a struggling admin trying to sort their storage nightmare.
Or is it just an advert for Dell, who have a link?
No advert for Dell. Tell us how you think the article could be more useful. And then we will have a rethink.
So Hierarchical Storage Management is getting sexy (again)?
Maybe someone should tell the sysprogs.
Du-dupe my blocks and call me Susan.
So your objection to the article is that people who already know this get no value from reading it? Go figure.
Mine is the one with the dictionary in it. I am about to throw it in the bin since I already know most of the words in it.
From what I understand (I don't do much storage myself; I usually have someone else handle that), the IBM Storwize does this kind of thing too. You throw in a mix of SSD and HDD, and it will put the most frequently accessed data on the SSD. When that data gets old, it moves it off the SSD and puts the newer, more frequently accessed stuff there instead.
You can't mix 2.5" and 3.5" disks in an enclosure, but you can mix enclosures in a system. So you can (theoretically) use the large, slower 3.5" disks as your cold disks in the "hot, warm, cold" strategy. Whether it is intelligent enough to recognise that, or only knows "SSD fast, HDD slow", I don't know, but it's worth a look.
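A heat-based "hot, warm, cold" placement like the one described can be sketched as a greedy fill: rank extents by recent access count, then fill the fastest tier first and work down. This is a minimal illustration, not how Storwize (or any particular product) actually decides; the `Tier` type, tier names, capacities and extent IDs are hypothetical, and re-running it periodically with fresh counts is what would move aging data off the SSD.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity: int  # how many extents the tier can hold (hypothetical unit)


def place_extents(access_counts: dict[str, int], tiers: list[Tier]) -> dict[str, str]:
    """Greedy heat-based placement; `tiers` must be ordered fastest first.

    The most-accessed extents fill the fastest tier, the next-hottest fill the
    next tier, and so on down to the cold tier.
    """
    ranked = sorted(access_counts, key=access_counts.__getitem__, reverse=True)
    placement: dict[str, str] = {}
    tier_index, used = 0, 0
    for extent in ranked:
        while used >= tiers[tier_index].capacity:
            tier_index, used = tier_index + 1, 0  # current tier is full: move down
            if tier_index == len(tiers):
                raise ValueError("not enough total capacity for all extents")
        placement[extent] = tiers[tier_index].name
        used += 1
    return placement


if __name__ == "__main__":
    counts = {"a": 500, "b": 3, "c": 120, "d": 0, "e": 45}  # accesses in the last period
    tiers = [Tier("ssd", 1), Tier("15k", 2), Tier("sata", 10)]
    print(place_extents(counts, tiers))
```

Here extent `a` lands on SSD, `c` and `e` on the 15K tier, and `b` and `d` on SATA. A real implementation would also weigh the cost of migrating data against the expected benefit before moving anything.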
Spend 1000s to save 100s
Sure, time is money, but implementing such tiering may be time-consuming, and the horsepower and brains to do it cost money as well. Plus your DB may be locked because some old fart is updating their eerily old data on a very slow cassette tape.
Add to that the risk that moving data back and forth may well end in losing it to an "I did not consider that potential failure"...
I'm sure adding a faster disk array to an existing storage pool when it fills up is more efficient... so just dimension your storage for a six-month period and you can enjoy the benefits of newer, faster tech for your newer data...
Speed vs Resilience
Firstly, seeing hot HR data mentioned made me laugh. Storage comes down to cost, no matter what. How much would it cost for this data to be a little slower? How much would it cost if this data wasn't there at all, and how much if it wasn't available for 5 minutes, half an hour, an hour, 3 hours, half a day...? Then you can begin planning around those answers to get the real requirements for your business.
Does the HR system need to be in the same storage array? It can be if the array has the capacity, but an isolated, smaller system in the same datacenter may be better, isolating that data completely from other data that has no need to be in the same place. Call it data health and safety. Yes, I know it shouldn't be an issue, but sharing does add to your exposure in terms of system usage etc., and so adds potential issues beyond what you really need. Isolation is also good security practice, as well as helping to ensure integrity.
It also lets you match resilience to the data and its usage, as opposed to one size fits all, because every department will say their data is business critical and of the utmost importance. But try getting everybody to look out the windows of your entire company and describe the same thing: they all have their own viewpoint. You have to look at the numbers for actual business cost, based on the impact on external income, and not at some HR director thinking 1 minute of his time is worth a million to the company because without him they would have no staff. Arrogance takes many forms, remember. This is where accountants can have a positive use, actually having to do some useful work and get you the real-world figures that department heads are, in some respects, wilfully blind to. They manage departments, not the whole company, so you need to scale their claims against the accountants' numbers to see what their data is really worth.
You can then plan whether anything needs to be data-warehoused, and decide backup frequency/policy, failover access to the data, etc.
What people say they need and what they want are always wrong, and if they're right, then you have found somebody you can respect, so good luck.
Like rationalising a database, rationalising your data needs goes a long way towards seeing what you really need from a business perspective. Just because you can doesn't always mean you should. Also, having the fastest discs in the land means nothing if the network serving that data is poor. That really expensive storage array with automated backup may be fine, but if the network upgrade is due next year, then you really did get things in the wrong order. This is why isolation helps in more ways than you think. Ever had marketing get a new HD camera and, in the space of a day, upload 1000x more data than they normally do in a year, impacting main data storage access as it filled up the network and caused a router to overheat and fail? Well, best avoided. Give people as little opportunity as possible to shoot themselves in the foot, as they are terrible shots and will end up shooting somebody else's foot instead, without even knowing they have.
Treat your data like gold and your users like idiots, and plan around that as much as possible, bearing in mind there is a lot of fool's gold out there ;).
And the Cloud is not the answer
to life, the universe and everything, let alone Data Aging/Hierarchical Storage Management.
Just before someone says it is.
…that’s like keeping instant coffee in a safe
You're quite right there … I have a much better place for instant coffee… the bin.
The bin IS the quickest-access storage in the house.
Where a cupboard requires you first to open the door and then reach in, the pedal bin is foot-operated, so it opens while you're reaching in... shaves off fractions of a second!
Completely forgot another approach: that of NetApp. Put everything possible, as far as appropriate, onto SATA disks and then use cache in the controllers to bring in the metadata and the data itself. Data that only periodically needs a high write rate as well as a high read rate can be dynamically moved up to SAS or even SSD for the period of high use. Once the "batch" (insert your definition or workload profile word here) has been done, the now dormant(ish) data is shipped back to cheap-and-deep.
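The promote-for-the-batch, demote-afterwards behaviour can be sketched as a per-dataset rule over a sliding window of I/O counts. This is an assumption-laden illustration rather than NetApp's actual mechanism: `PromotionPolicy`, the window length, the thresholds and the tier names are all hypothetical. The gap between the promote and demote thresholds (hysteresis) is there so data does not ping-pong between tiers on every small fluctuation.

```python
from __future__ import annotations

from collections import deque


class PromotionPolicy:
    """Hypothetical promote/demote rule for a single dataset.

    Tracks I/O counts over the last `window` intervals and moves the dataset
    up to flash when the average is high, back to SATA when it has gone quiet.
    """

    def __init__(self, window: int, promote_at: float, demote_at: float) -> None:
        if demote_at >= promote_at:
            raise ValueError("demote_at must be below promote_at (hysteresis)")
        self.recent: deque[int] = deque(maxlen=window)
        self.promote_at = promote_at
        self.demote_at = demote_at
        self.tier = "sata"  # everything starts on cheap-and-deep

    def observe(self, io_count: int) -> str:
        """Record one interval's I/O count and return the tier to use next."""
        self.recent.append(io_count)
        rate = sum(self.recent) / len(self.recent)
        if self.tier == "sata" and rate >= self.promote_at:
            self.tier = "ssd"  # batch has started: promote
        elif self.tier == "ssd" and rate <= self.demote_at:
            self.tier = "sata"  # data has gone dormant: demote
        return self.tier


if __name__ == "__main__":
    policy = PromotionPolicy(window=3, promote_at=1_000, demote_at=100)
    batch_profile = [10, 20, 5_000, 5_000, 5_000, 50, 0, 0, 0]  # I/Os per interval
    print([policy.observe(n) for n in batch_profile])
```

For this profile the dataset is promoted in the interval where the batch starts, stays on SSD for two intervals after the load drops (the window is still averaging in the busy intervals), and is then shipped back to SATA.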
Not sure about the 10K cool tier
The cost/density difference of 10K disks versus 15K doesn't really make sense. If your Hot + Cool is too large to fit in flash, and 15K disks work out cheaper for you, then Flash + 15K + 7.2K seems much more sensible. I think the nature of a lot of people's data would indeed suit Flash & Trash (assuming plenty of the former, and that the cold data really is quite cold).
SATA vs FC drives
15K vs 5400 rpm doesn't really buy you much on random access. The seek time is only a few percent different, and that's the dominating factor (which is why fast spinning arrays all use some form of short-stroking to maximise IO). For sequential access, all you need to do is add more SATA disks to your arrays.
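For anyone wanting to weigh the seek-versus-rotation argument with their own drives' figures, the usual back-of-envelope model treats each random I/O as one seek plus, on average, half a platter rotation. This is a sketch that ignores transfer time, command queueing and on-drive caching; the function names are illustrative, the seek times are placeholders to be replaced with datasheet or measured values, and a smaller seek term is roughly what short-stroking buys.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_iops(seek_ms: float, rpm: float) -> float:
    """Approximate random IOPS for one drive: one seek plus half a rotation per I/O."""
    return 1_000 / (seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    for rpm in (15_000, 7_200, 5_400):
        print(f"{rpm:>6} rpm: average rotational latency {rotational_latency_ms(rpm):.2f} ms")
    # Placeholder seek times for a 15K drive, full stroke vs short-stroked.
    for label, seek_ms in [("full stroke", 4.0), ("short-stroked", 1.0)]:
        print(f"15K {label:<14} ~{random_iops(seek_ms, 15_000):.0f} IOPS")
```

How much RPM matters in this model depends on how large the seek term is relative to rotational latency, so the comparison is worth running with real figures for the drives actually under consideration.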
As others have pointed out, the trick is to cache all your metadata in SSD or memory and hold the 99%-accessed stuff in SSD too; that's a matter of appropriate sizing.
Bumping our main fileservers from 4GB to 16GB of RAM made them behave a bit better. Bumping them to 48GB gave a 100x speedup, simply because they could cache all the metadata. I'm looking at 512GB for the next iteration and will add "suitable" amounts of SSD between them and the 200TB of disk they each currently handle.
Anyone wanting to roll their own should take a serious look at ZFS.
(And anyone wanting to use clustered filesystems such as GFS should be aware that the performance is generally disgusting (bitter experience), and it won't get better until someone comes up with better ways of passing messages between servers: Ethernet is just "too slow", even 10GbE.)