37 posts • joined 6 Jan 2011
From The Other Side
(Not that I'm on the other side, but...)
I build a storage product that's software-based, easy to use and has some features which will save you significant money over a time period of a few years, with ongoing savings.
You want it to have every feature that your existing product has, regardless if you use it or not. You expect me to have a
You love the idea of it being software-based, but want a certified hardware setup on which to run it. More specifically, you want me to certify it on your own particular hardware, at my own cost, to prove that it works. You expect me to keep up to date with the changes that you make to your own hardware to retain certification. You won't listen when I tell you that purchasing desktop-grade drives and expecting them to run 24x7 with good performance and low levels of failure is unrealistic because I "should stick to the software".
Given that I have certified the hardware, any problem which occurs on it is now my responsibility to find and fix even if it comes down to issues with the hardware configuration, build or implementation. And that is before you put your own customised version of an operating system on it and expect my software to run on it without issues. Or change your hardware spec without telling me, or put my software on an underpowered server "because it was all we had available", or...
So now I am a software company with lots of hardware to support your business, multiple versions of hardware, operating system versions and software to test every time I make a change, staff that are knowledgeable in hardware as well as software just so that they can find your hardware issues when they occur, and a lot of overheads.
You want 24x7 "enterprise-grade support", even though in my company the people who build the product are second line support and not only understand the software but know that fixing the issues is required for the company to survive. With your existing solution unless you have spent over $100MM you don't get anything other than people who follow "support process flowcharts" and are bombarded with customer satisfaction surveys after every call, and you passionately hate the support service they provide.
And because it's "just software", you don't expect to pay any significant amount of money for my product and forget the time and money that was spent building the software in the first place, the fact that in addition to the engineers I need to provide support staff, offices, labs, salespeople, and all the rest. Figures that never made it in to your BOM calculation, that's for sure.
Thinking of which, what the hell were you doing building your own BOM model in the first place? Why do you care so much about the nuts and bolts of the solution rather than the final cost to you and what it can give you? Why are you so upset that we might both be able to benefit from the solution that I am presenting?
And at the end of it all it turns out that you don't want to be disruptive. You just want to be cheaper. But you're so concerned about the unknowns that you go back to your existing big vendor, get a couple of points off their current price, and carry on as you did before.
Misuse of Drives
Using desktop drives in a 24/7 environment will kill them. Some might fare better than others, but frankly if you put your (or your customers') data on desktop drives then you're just asking for trouble.
And yes RAID can mitigate the data loss but it also exacerbates the failures as drives in a RAID array have a lot more work to do than those which are standalone.
So given that the drives are being misused I'm not sure what use these numbers are (although all the values calculated to 4sf are very nice to look at).
The issue of storage QoS is even more complex than the article suggests. Moving an entire host on to a specific tier of storage is not the correct solution, because most servers have a relatively small amount of active data, and a large amount of inactive data, obviously with different I/O requirements. So there needs to be the ability to identify and migrate the hot data on to flash whilst retaining the cold data on HDD. Plus of course the definition of hot data changes frequently over the work day, the business month, the calendar year, etc.
And defining requirements in terms of IOpS is not a good solution either. Even a slow disk can provide thousands of IOpS if the application requires streaming whereas the fastest HDD can be defeated by a weird access pattern. Flash has a major benefit here for reads, but you can craft write access patterns that can bring flash drives to their knees as well. Response time is a nicer way to think about things, but that would require storage companies to work with OS/database/app companies and they're notorious for having no desire to do so.
Finally, the idea of having lots of policies around to define relative priority of servers is too time-consuming and error-prone to be worthwhile. It needs to be a totally automated system, dynamically moving data from one place to another to ensure that the right data is on the right tier at the right time and without operator-created constraints. This is not the type of functionality that someone will purchase as an add-on or third-party feature to existing arrays, it's going to need a new product (or company) which uses this as the basis for their sales pitch. Kind-of like Compellent, but much more dynamic and adaptable and without the manual configuration.
That'll be ex-Intel CTO Pat Gelsinger defending Intel then.
Interesting that Google announced that they're looking at a particular technology; didn't think they did that type of thing unless they were already set on using it. No idea what this means, but curious nevertheless.
Re: Never going to work
"Internal market prices/wages should NOT change in response to the external exchange rate"
I used to have a salary denominated in USD and paid in GBP; changed every month and sometimes (through 2008-2010) a lot. Stop thinking of Bitcoin as something directly related to USD and think of it more as a separate currency.
There are no bankers of Bitcoin, that's the whole point of it being a decentralised system.
Trading in and out of Bitcoin with other currencies will attract a fee, as would buying gold or dollars with pounds, but in my opinion that's fair enough. Once you own bitcoins then transactions are easy and free.
Re: Hangouts for SMS
Handcent SMS works nicely for me.
Many, many years ago (late 90s) Solaris could only have a maximum of 5 partitions per disk (actually 7 but two were reserved). And Sybase could only handle a maximum of 2GB per partition. This resulted in a 9GB LUN being perfect: 4x2GB partitions for data and 1x1GB partition for binaries, logs and the like. The world has moved on, but many DBAs haven't.
Re: Long overdue
Moving from any MySQL-based database to PostgreSQL as a default would be good for everyone. PostgreSQL is both awesome in terms of features and an actual real database built on actual real database principles. The more people that use it for their bigger/more important projects the better.
For the smaller just-need-a-structured-datastore needs just use SQLite.
And the size?
A quick look at the SPC-1 report shows that the total available storage (prior to losing any through formatting it with a filesystem) is a smidge over 1TB. So: great for speed, not so hot for actually storing anything of any decent size.
Re: Actually, yes...
I think you might be looking at the wrong people to move this forward. My guess would be that you're more likely to see it being pushed by the software vendors. Microsoft and Oracle spring to mind, but for any software vendor who can build a VM which contains an image based on the storage vendor's specification and gains high availability/clustering/failover for pretty much free it's a very interesting proposition.
Quite surprised that no storage vendor has taken that first step, to be honest. And I doubt that it's the type of thing that a startup would want to take on, as they would need to work pretty hard to get the larger software vendors on board. So perhaps it will just not happen through plain lack of drive from the bigger storage vendors; wouldn't be the first time...
I think that servers-in-arrays does have a place, and that place is where there are relatively low requirements in terms of numbers of servers but relatively high requirements in terms of availability.
Take a simple case of a small business that wants to run a couple of servers. Say Exchange and SQLServer, for example. Any highly available implementation of this is going to involve a fair amount of complexity in both physical and logical configuration. But storage arrays have a lot of high availability functions built in to them and so handling the 'server' side of things (which would basically be a pre-configured VM running on a lightweight hypervisor inside the storage array) would be relatively painless.
Given the complexity (and cost) of building out some sort of SAN configuration with multiple servers and storage arrays I can see this being a viable option for smaller companies, certainly. As to if larger shops or service providers would be interested in them, I think it would come down very simply to if they proved to be cheaper to manage and more reliable than the build-your-own variety.
Valuation Vs Revenue
So what do their revenue numbers look like? A $2B market cap is pretty big and needs to be justified somehow. Violin is one of the earlier SSD companies, with not a lot to talk about technology-wise except for "it's flash". At least with companies such as SolidFire or Actifio there's a software layer there that adds value (in theory; not wanting to argue the specifics), but I don't see anything in Violin that you couldn't do with a JBOD enclosure and a linux stack with software RAID.
Of course, the last two flash-type acquisitions were massively overpriced as well so perhaps it's just me that's missing the point here.
Re: The simple, elegant solution...
If you have enough RAM to store all of your data then you don't even need flash at the backend, just use disk. Streaming all of your memory to hard drives is not significantly slower than flash, and is certainly possible to do with a few supercaps.
In my opinion the biggest change in storage in the near-term is the one that moves high-end storage from being centralised to localised again. When there is a software layer that can handle dispersing data intelligently around the available infrastructure, ensuring availability and protection as well as good enough performance, *that* is when the storage landscape changes. Right now there really isn't a lot of difference between your low-end HDDs and high-end SSDs beyond $/GB and $/IOPS.
Nothing to do with Modern Workloads
“Magnetic disks are rapidly starting to exhibit tape-like properties and with modern workloads being increasingly random, they are becoming less and less suitable as a storage system.”
It's not modern workloads, it's the size of hard disks. As a hard disk gets bigger the amount of data accessed over a given time period, as a % of the total disk, either drops to where the access patterns look like archive storage or stays the same and hits the IOPS limit of the hard disk.
In fact HDDs are a fantastic storage system, just not so good at random access high IOPS storage.
As to if modern workloads are any different from previous workloads in terms of data access patterns, it's more of a case of how well the database is designed by the developer than the type of database itself.
All in all a pretty poor way to introduce a very useful addition to the EC2 family.
Re: @Dazed and Confused
"...when an SSD dies, there is no recovery."
That's a stupid comment; of course when it dies there is no recovery. Exactly the same as HDDs. Except that SSDs already have built-in redundancy (basically a combination of internal RAID and over-allocation of capacity) and no moving parts so have fewer chances of both individual and systematic failure.
Are they perfect? No. But are they at the stage where they are suitable, both reliability- and cost-wise, for primary storage for the vast majority of uses? Absolutely.
Re: Idiot managers should cut prices and release huge disks now!
They were talking about storage arrays, not single drives. The issues in Thailand might have put them 6 or so months behind their original schedule but they're not holding back on a ~10x increase in capacity for a lark.
In the meantime my exchange was due to be upgraded to FTTC in June right up until yesterday, when it magically slipped to September. Some realistic scheduling would be great.
Location, Location, Location
One thing that hasn't been mentioned is that the location of the wireless access point/router is critical for achieving good coverage. The basic idea is that it should be in the middle of the house, which may not seem like anything other than the obvious but I've seen many instances where people have their wireless router tucked away in the corner of a room on or near the floor and then wonder why reception in their bedroom isn't all that.
If you have a wireless router sitting low down in a downstairs room then try putting it on a high-up shelf. Also, if you have three antennae on the router then place them so they look like \ | / (with the outside ones at something like 45 degrees) rather than them all facing up. Between them these two simple changes can make the difference between no/patchy signal upstairs and a nice strong wireless connection everywhere in the house.
@Rob If you do a google search for either "dado trunking" or "skirting trunking" you'll find lots of options out there.
If you want to run cable outside then you need to look for "external grade" cable. Won't be cheap, though. You may want to think about running CAT5e instead (unless you really need that 10G connection of course...)
Millions of Internet Users?
These millions of internet users, they'll be the ones who don't have an email address under the current TLDs but will immediately use the new TLDs when they become available?
And with the costs involved in picking up a new TLD I can't imagine that these .coke and whatever addresses will be used for anything more than microsites and similar web-based marketing attempts for massive companies that already have very well-established internet presences.
I'm not seeing this as being anywhere near the issue that people are talking about...
Wrong wrong wrong
Using SATA and SSD in a tiered solution, the so-called "flash and trash" argument, is a plain stupid thing to do. The reason for this is simple: the speed of your overall data access is limited by three things:
- speed of your SSD tier
- speed of your HDD tier
- % of active data on each
Let's take a simple example: if your HDD tier consists of 100 SATA drives and so can manage 20,000 IOPS (being generous) and 80% of your active data is on flash then your overall performance will be 100,000 IOPS. Doesn't matter that your SSD tier could push out a few million IOPS given the chance, it isn't given the chance because your overall system is bottlenecked on the SATA. Which is an incredible waste of both money (that SSD doesn't come cheap) and potential performance (100K IOPS from a solution with lots of SSD? How embarrassing).
To increase the overall throughput you need to up one or more of the above factors. Given that the SSD is not the bottleneck and the % of active data on the top tier is what it is (for a given algorithm and SSD/HDD ratio) your best bang for the buck will be to have a tier of enterprise-class HDDs that are big enough to hold most of the last 10-20% of active data that doesn't fit in to the SSDs and fast enough to allow the SSDs to really throw out the IOPS.
Fast HDDs will continue to play an important part in storage, tiered or otherwise, until anything not on SSD is considered nearline and treated as such; not just a separate tier but a separate access mechanism.
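The 100,000 IOPS figure above is easy to check with a rough sketch. It assumes I/O lands on each tier in proportion to where the active data sits and that each tier saturates independently; the function name and the 2,000,000 IOPS SSD figure are made up for illustration, not taken from any product.

```python
def tiered_iops(ssd_iops, hdd_iops, ssd_fraction):
    """Maximum total IOPS when ssd_fraction of the I/O hits SSD and the rest hits HDD.

    Each tier caps the total at (tier IOPS / share of I/O sent to that tier);
    whichever cap is lower is the bottleneck.
    """
    hdd_fraction = 1.0 - ssd_fraction
    caps = []
    if ssd_fraction > 0:
        caps.append(ssd_iops / ssd_fraction)
    if hdd_fraction > 0:
        caps.append(hdd_iops / hdd_fraction)
    return min(caps)


# 100 SATA drives at a generous 20,000 IOPS, 80% of active data on flash.
print(round(tiered_iops(2_000_000, 20_000, 0.8)))  # 100000

# Doubling the speed of the HDD tier (hypothetical figure) doubles the total;
# the SSD tier is still nowhere near its limit.
print(round(tiered_iops(2_000_000, 40_000, 0.8)))  # 200000
```

Making the SSD tier faster changes nothing in this model until the HDD cap is lifted, which is the whole argument for a fast HDD tier underneath.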
You expect the OS to be able to handle losing access to drivers, pipes, swathes of virtual memory (assuming that there is a pagefile out there) etc. and carry on working? To what purpose?
In this type of situation, with perfect error recovery, chances are the OS would be running but totally non-functional as everything it would try would result in an error. In reality everything would just crash.
You want to try it? Put any OS of your choice on a USB drive, boot it, do some work, then pull the USB drive. See how much more work you can get done before the whole system crashes and burns.
Apples to Apples
So if a 512 byte read needs to read 4K from the underlying storage can you explain exactly why comparing the number of 512 byte reads to the number of 4K reads is such a bad comparison? Sounds like they are very much apples to apples to me.
It is Cheaper...
...than IBM. But the IBM system is a classic example of a benchmarking product built for a big headline number but not something that anyone would purchase in the real world. Their costs for capacity and performance are both stupidly high.
I can see a place for the TMS unit where ultra-high performance over a relatively small amount of storage is required, but given it is still coming in at up to 5x the cost of capacity of other SPC-1 results it isn't exactly what you would call general-purpose storage. Saying that it is cheaper than IBM for both capacity and performance is no more of an accolade than "not in a position to stiff customers who have no idea what they are buying but know it needs to be blue".
Pity the Performance
It might seem like a nice idea to have everything as a virtualized stack but how many context switches is an I/O going to need as it passes through multiple virtual machines on the same CPU to get from "server" to "storage"? There's only so much CPU power you can use to compensate for this type of massively inefficient design, and in a world where SSD-level performance is expected it just isn't going to be able to keep up.
Flash-based Array Price/Performance
"The SPC-1 rankings are going to be dominated by flash systems, both in a performance sense and in a price/performance sense with outrageously impressive advances in price/performance by flash systems"
I'm unclear on where these outrageously impressive advances in price/performance will come from with flash systems. Two reasons for this: first, flash as a technology doesn't have too much further to go. There are other technologies coming down the line but don't hold your breath for flash to get significantly bigger or cheaper any time soon, certainly in a way that supersedes that of disk.
Second, many arrays are unable to handle the sheer number of I/Os that SSDs can generate, so there will be diminishing returns with these products. Scale-out products will help the performance aspect but unless there is a move away from the "big controller and lots of disk shelves" model the costs will be prohibitive thanks to the "big controller and a few SSDs" model that will result.
"Switching to 2.5-inch drives just delays the onset of the problem. A 3TB 2.5-inch drive will have the same interminable RAID-rebuild times as a 3.5-inch one."
Well kind-of. Right now the biggest 2.5" HDD is 900GB and the biggest 3.5" HDD is 3TB. If you assume that the 2.5" HDDs can get to 1.5TB in the near future then that's probably the transition point as you can put 2x2.5" HDDs in place of the 3.5" HDD for the same capacity but higher rotational speed (10Krpm Vs 7.2Krpm) and significantly faster rebuild time (1.5TB@10Krpm Vs. 3TB@7.2Krpm). And rebuild times aren't helped by hybrid drives, so no luck there.
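A back-of-envelope sketch of why the smaller drive rebuilds faster. The best case for a rebuild is writing every byte of the replacement drive once, sequentially, so time is capacity divided by sustained throughput. The throughput figures below are placeholders I've picked for illustration, not measured values for any particular drive.

```python
def min_rebuild_hours(capacity_tb, sustained_mb_per_s):
    """Best-case rebuild time: the whole replacement drive written once, sequentially."""
    capacity_bytes = capacity_tb * 1e12
    seconds = capacity_bytes / (sustained_mb_per_s * 1e6)
    return seconds / 3600


# Assumed throughputs: 120 MB/s for 7.2Krpm, 150 MB/s for 10Krpm.
print(f"{min_rebuild_hours(3.0, 120):.1f}")  # 6.9  (3TB @ 7.2Krpm)
print(f"{min_rebuild_hours(1.5, 120):.1f}")  # 3.5  (half the capacity, same speed)
print(f"{min_rebuild_hours(1.5, 150):.1f}")  # 2.8  (half the capacity, faster drive)
```

Halving the capacity halves the rebuild time before the faster spindle is even considered; real rebuilds under production load will take longer than any of these numbers.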
The I/O density problem is massive, and is helped by hybrid technology, but only if done right. Chances are that this will have to happen in the array rather than the drive; a single drive is (ironically given the article) not large enough to provide a set amount of cache and expect that cache to be used effectively.
So SSDs for the desktop, hybrid storage arrays for businesses.
No idea what you mean by "meanwhile withdrawn" for the Xiotech result, I don't see anything to note that on the SPC website. But anyway...
Your point was the $/IOPS, not total IOPS, but as you're talking absolutes let's go there. You can imagine putting 12x of the Xiotech boxes next to each other to obtain the same total IOPS at the same $/IOPS number (and hence a better overall $ number), but what about the other way?
One of the points I made was that it is good for systems to be able to scale down. A common refrain in the storage industry (not picking on you personally here) is that the bigger arrays are more efficient, are cheaper in terms of $/IOPS and $/GB, etc. This is why people were convinced to lay down half a million per array in the first place. If modular arrays are not only cheaper to purchase due to smaller sizes but have better relative metrics then about the only thing left for larger arrays is that they are easier to manage.
That in itself is arguable with modern technologies to manage multiple arrays as a single entity, but even giving that to larger arrays I'm not sure that is going to be enough to convince people that they need to continue purchasing larger arrays when smaller ones start to make more sense on every metric.
Xiotech for one has posted SPC-1 $/IOPS that are significantly lower than those posted by the ETERNUS DX440 so no, I'm afraid that Fujitsu is not the leader.
And although it seems unfair to criticise a performance-centric benchmark for being one-dimensional there are many other factors that should be taken in to account in addition to $/IOPS. $/GB is an obvious one, and actually SPC-1 is a great place to obtain real-world comparisons as they break down where the storage is 'wasted' (metadata, RAID, sparing, etc.). Others include IOPS/U and GB/U for density, IOPS/W and GB/W for power efficiency, and there are more.
The other really important point, which has been seen with various benchmark results that have been posted here, is that it is important to be able to scale these results *down*. It's easy to recognise that if a system can generate 100,000 IOPS then 2 of them can generate 200,000 IOPS, but if you only want 20,000 IOPS then will your cost be 20% of the benchmarked result or 90%? It would be great to see something like SPC-1 extended to a range of sizes (for example 20TB, 100TB, 500TB) to see how costs scale as well as performance.
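The "20% or 90%" question can be put as a simple model: split the benchmarked price in to a part that doesn't shrink when you buy a smaller system (controllers, licences and so on) and a part that scales with size. This is only a sketch under that assumption; the function name and the fixed-cost shares are hypothetical, not taken from any SPC-1 report.

```python
def scaled_cost_fraction(fixed_share, scale):
    """Cost of a scaled-down system as a fraction of the benchmarked system's cost.

    fixed_share: portion of the benchmarked cost that does not shrink with size
    scale: required performance as a fraction of the benchmarked performance
    """
    return fixed_share + (1.0 - fixed_share) * scale


# Wanting 20,000 IOPS from a design benchmarked at 100,000 IOPS (scale = 0.2):
print(f"{scaled_cost_fraction(0.0, 0.2):.0%}")    # 20%, everything scales down
print(f"{scaled_cost_fraction(0.875, 0.2):.0%}")  # 90%, mostly fixed controller cost
```

A single benchmark result at one size tells you nothing about which of these two systems you are looking at, hence the request for results at a range of sizes.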
As a consultant it might be good to understand the company's product lineup. Barracuda are desktop drives, not server drives. Big difference in duty cycle and perhaps why you're seeing so many failures?
The right tool for the right job...
Tell Us the Money
These benchmarks need to be overhauled so that they show $/IOPS (or equivalent) and $/GB (end-user available, not raw) at a number of capacity points such as 1TB, 10TB and 100TB. And all costs should be over 5 years so that they include whatever insane year 4 and 5 maintenance prices these vendors think they can get away with. Only then will they be useful for end-users to compare one product against another in something approaching their own environments.
The term "data protection" has been used to apply to backup and archive (amongst other related technology) for a long time, nothing new here.
It shouldn't be surprising that the term has been used elsewhere, though; "data" is a rather generic term.
More Than That
What if you want to use your data in a clustered environment? What if you want it mirrored for resiliency? What if you want it replicated for disaster recovery? What sort of downtime is involved in increasing the capacity? The downsides of DAS were never about speed, they were about manageability. This is even more true in a virtualized world.
People who say Fusion-IO is storage are technically accurate but you won't find any sane business holding their primary copy of business-critical data on it. This stuff is basically another type of cache.
So will it supplement SANs? Yes. Will it supplant them? No.
Snarkiness from the Uninformed
This has nothing to do with garbage collection, with the reference count going from 1 to 0 for a given piece of data. It has to do with reduced referencing, with the reference going from n to n-1 for a given piece of data where n > 1. As would be expected, if all you're doing is decrementing a reference then you're not going to save a lot of space by doing so, and so 'traditional' methods of keeping your storage utilisation down aren't going to work with deduped storage.
Yes it's a relatively simple point but given that from the comments half the people here didn't even get the basic idea of the article there is some worth in publishing this stuff.
One other thing that has been missed is that because dedupe is (as a rule) carried out as a post-process you need to keep some 'spare' capacity anyway to place yet-to-be-deduped information prior to the dedupe process kicking off, so when you measure your high water mark make sure it's before your daily dedupe kicks in rather than when you wander in with your coffee and the process is long completed.
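The n to n-1 point can be shown with a toy reference-counted block store. This is a sketch for illustration only and not how any particular dedupe product is built; the class and method names are made up.

```python
import hashlib


class DedupeStore:
    """Toy block store: identical blocks are kept once and reference-counted."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of references to the block
        self.files = {}     # file name -> list of block digests

    def write(self, name, blocks):
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                self.refcount[digest] = 0
            self.refcount[digest] += 1
            digests.append(digest)
        self.files[name] = digests

    def delete(self, name):
        """Remove a file and return the number of bytes actually freed."""
        freed = 0
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:  # only 1 -> 0 releases space
                freed += len(self.blocks.pop(digest))
                del self.refcount[digest]
        return freed

    def used_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupeStore()
a, b, c = b"A" * 4096, b"B" * 4096, b"C" * 4096
store.write("monday.bak", [a, b])
store.write("tuesday.bak", [a, b, c])
print(store.used_bytes())            # 12288: three unique blocks stored once
print(store.delete("monday.bak"))    # 0: references went 2 -> 1, nothing freed
print(store.delete("tuesday.bak"))   # 12288: last references gone
```

Deleting the older backup, the 'traditional' way of clawing back space, frees nothing here because every block it used is still referenced elsewhere.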
IOPS or IOPS/TB?
Regarding IOPS and different drive types, looking at disks individually doesn't give the best picture. For example if you take a 3.5" drive and a 2.5" drive, both with 600GB capacity but the 3.5" spinning at 15Krpm and the 2.5" at 10Krpm, you can get approximately 25% more IOPS from the 3.5" drive.
When you move to a proper storage array things change because you can pack so many more 2.5" drives in to the same space. This increases the absolute number of IOPS but not the IOPS/TB.
So when considering any (non-streaming) storage system you need to have three technical metrics in your head: TB, IOPS and IOPS/TB. Once you have an idea of what these numbers need to be you can start to make a decision between different hard drive form factors, capacities and rotational speeds. Other metrics such as cost, maintenance over useful lifetime (==cost), space taken up (== cost) and power required (==cost) probably also matter to you, and again depending on your requirement one drive type will be cheaper than the others.
You can also use this to decide if SSD is worth it for you as well.
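To make the three metrics concrete, here is a rough sketch comparing two shelves that take up the same space. The per-drive IOPS values (chosen only to give the ~25% gap mentioned above) and the 12 versus 24 drives per shelf are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shelf:
    drives: int
    tb_per_drive: float
    iops_per_drive: float

    @property
    def tb(self):
        return self.drives * self.tb_per_drive

    @property
    def iops(self):
        return self.drives * self.iops_per_drive

    @property
    def iops_per_tb(self):
        return self.iops / self.tb


# Assumed figures: 600GB drives, 175 IOPS at 15Krpm, 140 IOPS at 10Krpm.
big = Shelf(drives=12, tb_per_drive=0.6, iops_per_drive=175)    # 3.5" 15Krpm
small = Shelf(drives=24, tb_per_drive=0.6, iops_per_drive=140)  # 2.5" 10Krpm

for name, shelf in (('3.5"', big), ('2.5"', small)):
    print(f"{name}: {shelf.tb:.1f} TB, {shelf.iops:.0f} IOPS, "
          f"{shelf.iops_per_tb:.1f} IOPS/TB")
# 3.5": 7.2 TB, 2100 IOPS, 291.7 IOPS/TB
# 2.5": 14.4 TB, 3360 IOPS, 233.3 IOPS/TB
```

The 2.5" shelf wins on TB and absolute IOPS but not on IOPS/TB, so which one is right depends on which of the three numbers your workload actually needs.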