33 posts • joined Thursday 6th January 2011 11:05 GMT
Re: Never going to work
"Internal market prices/wages should NOT change in response to the external exchange rate"
I used to have a salary denominated in USD and paid in GBP; changed every month and sometimes (through 2008-2010) a lot. Stop thinking of Bitcoin as something directly related to USD and think of it more as a separate currency.
There are no bankers of Bitcoin, that's the whole point of it being a decentralised system.
Trading in and out of Bitcoin with other currencies will attract a fee, as would buying gold or dollars with pounds, but in my opinion that's fair enough. Once you own bitcoins then transactions are easy and free.
Re: Hangouts for SMS
Handcent SMS works nicely for me.
Many, many years ago (late 90s) Solaris could only have a maximum of 5 partitions per disk (actually 7 but two were reserved). And Sybase could only handle a maximum of 2GB per partition. This resulted in a 9GB LUN being perfect: 4x2GB partitions for data and 1x1GB partition for binaries, logs and the like. The world has moved on, but many DBAs haven't.
Re: Long overdue
Moving from any MySQL-based database to PostgreSQL as a default would be good for everyone. PostgreSQL is both awesome in terms of features and an actual real database built on actual real database principles. The more people that use it for their bigger/more important projects the better.
For the smaller just-need-a-structured-datastore needs just use SQLite.
And the size?
A quick look at the SPC-1 report shows that the total available storage (prior to losing any through formatting it with a filesystem) is a smidge over 1TB. So: great for speed, not so hot for actually storing anything of any decent size.
Re: Actually, yes...
I think you might be looking at the wrong people to move this forward. My guess would be that you're more likely to see it being pushed by the software vendors. Microsoft and Oracle spring to mind, but for any software vendor who can build a VM which contains an image based on the storage vendor's specification and gains high availability/clustering/failover for pretty much free it's a very interesting proposition.
Quite surprised that no storage vendor has taken that first step, to be honest. And I doubt that it's the type of thing that a startup would want to take on, as they would need to work pretty hard to get the larger software vendors on board. So perhaps it will just not happen through plain lack of drive from the bigger storage vendors; wouldn't be the first time...
I think that servers-in-arrays does have a place, and that place is where there are relatively low requirements in terms of numbers of servers but relatively high requirements in terms of availability.
Take a simple case of a small business that wants to run a couple of servers. Say Exchange and SQLServer, for example. Any highly available implementation of this is going to involve a fair amount of complexity in both physical and logical configuration. But storage arrays have a lot of high availability functions built in to them and so handling the 'server' side of things (which would basically be a pre-configured VM running on a lightweight hypervisor inside the storage array) would be relatively painless.
Given the complexity (and cost) of building out some sort of SAN configuration with multiple servers and storage arrays I can see this being a viable option for smaller companies, certainly. As to whether larger shops or service providers would be interested in them, I think it would come down very simply to whether they proved to be cheaper to manage and more reliable than the build-your-own variety.
Valuation Vs Revenue
So what do their revenue numbers look like? A $2B market cap is pretty big and needs to be justified somehow. Violin is one of the earlier SSD companies, with not a lot to talk about technology-wise except for "it's flash". At least with companies such as SolidFire or Actifio there's a software layer there that adds value (in theory; not wanting to argue the specifics), but I don't see anything in Violin that you couldn't do with a JBOD enclosure and a Linux stack with software RAID.
Of course, the last two flash-type acquisitions were massively overpriced as well so perhaps it's just me that's missing the point here.
Re: The simple, elegant solution...
If you have enough RAM to store all of your data then you don't even need flash at the backend, just use disk. Streaming all of your memory to hard drives is not significantly slower than flash, and is certainly possible to do with a few supercaps.
In my opinion the biggest change in storage in the near-term is the one that moves high-end storage from being centralised to localised again. When there is a software layer that can handle dispersing data intelligently around the available infrastructure, ensuring availability and protection as well as good enough performance, *that* is when the storage landscape changes. Right now there really isn't a lot of difference between your low-end HDDs and high-end SSDs beyond $/GB and $/IOPS.
Nothing to do with Modern Workloads
“Magnetic disks are rapidly starting to exhibit tape-like properties and with modern workloads being increasingly random, they are becoming less and less suitable as a storage system.”
It's not modern workloads, it's the size of hard disks. As a hard disk gets bigger the amount of data accessed over a given time period, as a % of the total disk, either drops to where the access patterns look like archive storage or stays the same and hits the IOPS limit of the hard disk.
In fact HDDs are a fantastic storage system, just not so good at random access high IOPS storage.
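To put rough numbers on the capacity point, here's a minimal sketch. The 100 IOPS ceiling and the 4KB I/O size are assumptions I've picked for illustration, not measurements of any particular drive, and `fraction_touched_per_day` is just a name I made up.

```python
def fraction_touched_per_day(capacity_tb: float, iops: float, io_size_kb: float = 4.0) -> float:
    """Upper bound on the share of a disk that random I/O can touch in a day.

    Assumes the drive runs flat out at `iops` for 24 hours and that every
    I/O hits a different location (both generous assumptions).
    """
    bytes_per_day = iops * io_size_kb * 1024 * 86_400
    return min(1.0, bytes_per_day / (capacity_tb * 1e12))


# Same assumed IOPS ceiling, growing capacity: the reachable share shrinks.
for capacity_tb in (0.3, 1.0, 3.0):
    share = fraction_touched_per_day(capacity_tb, iops=100)
    print(f"{capacity_tb:>4} TB drive: at most {share:.1%} of the disk per day")
```

The ceiling on IOPS stays put while the capacity grows, so either most of the disk sits idle like an archive or the active share stays constant and you run out of IOPS.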
As to whether modern workloads are any different from previous workloads in terms of data access patterns, it's more of a case of how well the database is designed by the developer than the type of database itself.
All in all a pretty poor way to introduce a very useful addition to the EC2 family.
Re: @Dazed and Confused
"...when an SSD dies, there is no recovery."
That's a stupid comment; of course when it dies there is no recovery. Exactly the same as HDDs. Except that SSDs already have built-in redundancy (basically a combination of internal RAID and over-allocation of capacity) and no moving parts so have fewer chances of both individual and systematic failure.
Are they perfect? No. But are they at the stage where they are suitable, both reliability- and cost-wise, for primary storage for the vast majority of uses? Absolutely.
Re: Idiot managers should cut prices and release huge disks now!
They were talking about storage arrays, not single drives. The issues in Thailand might have put them 6 or so months behind their original schedule but they're not holding back on a ~10x increase in capacity for a lark.
In the meantime my exchange was due to be upgraded to FTTC in June right up until yesterday, when it magically slipped to September. Some realistic scheduling would be great.
Location, Location, Location
One thing that hasn't been mentioned is that the location of the wireless access point/router is critical for achieving good coverage. The basic idea is that it should be in the middle of the house, which may not seem like anything other than the obvious but I've seen many instances where people have their wireless router tucked away in the corner of a room on or near the floor and then wonder why reception in their bedroom isn't all that.
If you have a wireless router sitting low down in a downstairs room then try putting it on a high-up shelf. Also, if you have three antennae on the router then place them so they look like \ | / (with the outside ones at something like 45 degrees) rather than them all facing up. Between them these two simple changes can make the difference between no/patchy signal upstairs and a nice strong wireless connection everywhere in the house.
@Rob If you do a Google search for either "dado trunking" or "skirting trunking" you'll find lots of options out there.
If you want to run cable outside then you need to look for "external grade" cable. Won't be cheap, though. You may want to think about running CAT5e instead (unless you really need that 10G connection of course...)
Millions of Internet Users?
These millions of internet users, they'll be the ones who don't have an email address under the current TLDs but will immediately use the new TLDs when they become available?
And with the costs involved in picking up a new TLD I can't imagine that these .coke and whatever addresses will be used for anything more than microsites and similar web-based marketing attempts for massive companies that already have very well-established internet presences.
I'm not seeing this as being anywhere near the issue that people are talking about...
Wrong wrong wrong
Using SATA and SSD in a tiered solution, the so-called "flash and trash" argument, is a plain stupid thing to do. The reason for this is simple: the speed of your overall data access is limited by three things:
- speed of your SSD tier
- speed of your HDD tier
- % of active data on each
Let's take a simple example: if your HDD tier consists of 100 SATA drives and so can manage 20,000 IOPS (being generous) and 80% of your active data is on flash then your overall performance will be 100,000 IOPS, because the 20% of I/Os that land on the HDD tier can't exceed 20,000 IOPS and 20,000 / 0.2 = 100,000. Doesn't matter that your SSD tier could push out a few million IOPS given the chance, it isn't given the chance because your overall system is bottlenecked on the SATA. Which is an incredible waste of both money (that SSD doesn't come cheap) and potential performance (100K IOPS from a solution with lots of SSD? How embarrassing).
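For anyone who wants to play with the numbers, the arithmetic can be sketched like this. It assumes I/Os are spread across the tiers in proportion to where the active data lives and that each tier has to keep up with its share; the 2,000,000 IOPS figure for the SSD tier is just my stand-in for "a few million", and `blended_iops` is a made-up name.

```python
def blended_iops(hdd_tier_iops: float, ssd_tier_iops: float, fraction_on_ssd: float) -> float:
    """Highest total IOPS at which neither tier is asked for more than it can do."""
    if not 0.0 <= fraction_on_ssd <= 1.0:
        raise ValueError("fraction_on_ssd must be between 0 and 1")
    limits = []
    if fraction_on_ssd > 0.0:
        # The SSD tier sees fraction_on_ssd of all I/Os.
        limits.append(ssd_tier_iops / fraction_on_ssd)
    if fraction_on_ssd < 1.0:
        # The HDD tier sees the rest.
        limits.append(hdd_tier_iops / (1.0 - fraction_on_ssd))
    return min(limits)


# 100 SATA drives at a generous 20,000 IOPS, assumed 2,000,000 IOPS of SSD.
for on_ssd in (0.80, 0.90, 0.95):
    total = blended_iops(20_000, 2_000_000, on_ssd)
    print(f"{on_ssd:.0%} of active data on SSD: about {total:,.0f} IOPS overall")
```

Swap the 20,000 for whatever a tier of faster HDDs would give you and the overall figure scales with it, which is the whole argument.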
To increase the overall throughput you need to up one or more of the above factors. Given that the SSD is not the bottleneck and the % of active data on the top tier is what it is (for a given algorithm and SSD/HDD ratio) your best bang for the buck will be to have a tier of enterprise-class HDDs that are big enough to hold most of the last 10-20% of active data that doesn't fit in to the SSDs and fast enough to allow the SSDs to really throw out the IOPS.
Fast HDDs will continue to play an important part in storage, tiered or otherwise, until anything not on SSD is considered nearline and treated as such; not just a separate tier but a separate access mechanism.
You expect the OS to be able to handle losing access to drivers, pipes, swathes of virtual memory (assuming that there is a pagefile out there) etc. and carry on working? To what purpose?
In this type of situation, with perfect error recovery, chances are the OS would be running but totally non-functional as everything it would try would result in an error. In reality everything would just crash.
You want to try it? Put any OS of your choice on a USB drive, boot it, do some work, then pull the USB drive. See how much more work you can get done before the whole system crashes and burns.
Apples to Apples
So if a 512 byte read needs to read 4K from the underlying storage can you explain exactly why comparing the number of 512 byte reads to the number of 4K reads is such a bad comparison? Sounds like they are very much apples to apples to me.
It is Cheaper...
...than IBM. But the IBM system is a classic example of a benchmarking product built for a big headline number but not something that anyone would purchase in the real world. Their costs for capacity and performance are both stupidly high.
I can see a place for the TMS unit where ultra-high performance over a relatively small amount of storage is required, but given it is still coming in at up to 5x the cost of capacity of other SPC-1 results it isn't exactly what you would call general-purpose storage. Saying that it is cheaper than IBM for both capacity and performance is no more of an accolade than "not in a position to stiff customers who have no idea what they are buying but know it needs to be blue".
Pity the Performance
It might seem like a nice idea to have everything as a virtualized stack but how many context switches is an I/O going to need as it passes through multiple virtual machines on the same CPU to get from "server" to "storage"? There's only so much CPU power you can use to compensate for this type of massively inefficient design, and in a world where SSD-level performance is expected it just isn't going to be able to keep up.
Flash-based Array Price/Performance
"The SPC-1 rankings are going to be dominated by flash systems, both in a performance sense and in a price/performance sense with outrageously impressive advances in price/performance by flash systems"
I'm unclear on where these outrageously impressive advances in price/performance will come from with flash systems. Two reasons for this: first, flash as a technology doesn't have too much further to go. There are other technologies coming down the line but don't hold your breath for flash to get significantly bigger or cheaper any time soon, certainly in a way that supersedes that of disk.
Second, many arrays are unable to handle the sheer number of I/Os that SSDs can generate, so there will be diminishing returns with these products. Scale-out products will help the performance aspect but unless there is a move away from the "big controller and lots of disk shelves" model the costs will be prohibitive thanks to the "big controller and a few SSDs" model that will result.
"Switching to 2.5-inch drives just delays the onset of the problem. A 3TB 2.5-inch drive will have the same interminable RAID-rebuild times as a 3.5-inch one."
Well kind-of. Right now the biggest 2.5" HDD is 900GB and the biggest 3.5" HDD is 3TB. If you assume that the 2.5" HDDs can get to 1.5TB in the near future then that's probably the transition point as you can put 2x2.5" HDDs in place of the 3.5" HDD for the same capacity but higher rotational speed (10Krpm Vs 7.2Krpm) and significantly faster rebuild time (1.5TB@10Krpm Vs. 3TB@7.2Krpm). And rebuild times aren't helped by hybrid drives, so no luck there.
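A back-of-envelope sketch of the rebuild point, for what it's worth. The 150MB/s sustained rate is an assumption (and I've deliberately used the same rate for both drives, which if anything is unfair to the 10Krpm one); real rebuilds on a busy array take far longer than this best case.

```python
def min_rebuild_hours(capacity_tb: float, sustained_mb_per_s: float) -> float:
    """Best-case time to rewrite a whole drive end to end, in hours."""
    total_bytes = capacity_tb * 1e12
    seconds = total_bytes / (sustained_mb_per_s * 1e6)
    return seconds / 3600


ASSUMED_RATE_MB_S = 150  # hypothetical sustained rate, not a vendor figure

big = min_rebuild_hours(3.0, ASSUMED_RATE_MB_S)    # one 3TB 3.5" drive
small = min_rebuild_hours(1.5, ASSUMED_RATE_MB_S)  # one of two 1.5TB 2.5" drives
print(f"3TB drive:   at least {big:.1f} hours")
print(f"1.5TB drive: at least {small:.1f} hours")
```

Rebuild time scales with the capacity of the drive that failed, not the capacity of the shelf, so halving the drive size halves the exposure window even before the faster spindle is taken into account.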
The I/O density problem is massive, and is helped by hybrid technology, but only if done right. Chances are that this will have to happen in the array rather than the drive; a single drive is (ironically given the article) not large enough to provide a set amount of cache and expect that cache to be used effectively.
So SSDs for the desktop, hybrid storage arrays for businesses.
No idea what you mean by "meanwhile withdrawn" for the Xiotech result, I don't see anything to note that on the SPC website. But anyway...
Your point was the $/IOPS, not total IOPS, but as you're talking absolutes let's go there. You can imagine putting 12x of the Xiotech boxes next to each other to obtain the same total IOPS at the same $/IOPS number (and hence a better overall $ number), but what about the other way?
One of the points I made was that it is good for systems to be able to scale down. A common refrain in the storage industry (not picking on you personally here) is that the bigger arrays are more efficient, are cheaper in terms of $/IOPS and $/GB, etc. This is why people were convinced to lay down half a million per array in the first place. If modular arrays are not only cheaper to purchase due to smaller sizes but have better relative metrics then about the only thing left for larger arrays is that they are easier to manage.
That in itself is arguable with modern technologies to manage multiple arrays as a single entity, but even giving that to larger arrays I'm not sure that is going to be enough to convince people that they need to continue purchasing larger arrays when smaller ones start to make more sense on every metric.
Xiotech for one has posted SPC-1 $/IOPS that are significantly lower than those posted by the ETERNUS DX440 so no, I'm afraid that Fujitsu is not the leader.
And although it seems unfair to criticise a performance-centric benchmark for being one-dimensional there are many other factors that should be taken in to account in addition to $/IOPS. $/GB is an obvious one, and actually SPC-1 is a great place to obtain real-world comparisons as they break down where the storage is 'wasted' (metadata, RAID, sparing, etc.). Others include IOPS/U and GB/U for density, IOPS/W and GB/W for power efficiency, and there are more.
The other really important point, which has been seen with various benchmark results that have been posted here, is that it is important to be able to scale these results *down*. It's easy to recognise that if a system can generate 100,000 IOPS then 2 of them can generate 200,000 IOPS, but if you only want 20,000 IOPS then will your cost be 20% of the benchmarked result or 90%? It would be great to see something like SPC-1 extended to a range of sizes (for example 20TB, 100TB, 500TB) to see how costs scale as well as performance.
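The 20%-or-90% question can be sketched with a very crude cost model: some share of the benchmarked price is fixed (controllers, licences and so on) and the rest scales with size. The $500,000 price and the fixed shares below are hypothetical numbers picked to illustrate the two extremes, not taken from any SPC-1 report.

```python
def cost_at_scale(benchmark_cost: float, fixed_share: float, scale: float) -> float:
    """Cost of a system sized at `scale` times the benchmarked configuration.

    fixed_share is the fraction of the benchmarked cost that does not
    shrink when the system is scaled down (an assumed, simplified model).
    """
    fixed = benchmark_cost * fixed_share
    variable = benchmark_cost * (1.0 - fixed_share) * scale
    return fixed + variable


BENCHMARK_COST = 500_000  # hypothetical price for the 100,000 IOPS result

for fixed_share in (0.0, 0.5, 0.875):
    cost = cost_at_scale(BENCHMARK_COST, fixed_share, scale=0.2)
    print(f"fixed share {fixed_share:.1%}: 20,000 IOPS costs "
          f"{cost / BENCHMARK_COST:.0%} of the benchmarked price")
```

A single published result tells you nothing about which of those rows you're on, which is why a range of benchmarked sizes would be so useful.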
As a consultant it might be good to understand the company's product lineup. Barracuda are desktop drives, not server drives. Big difference in duty cycle and perhaps why you're seeing so many failures?
The right tool for the right job...
Tell Us the Money
These benchmarks need to be overhauled so that they show $/IOPS (or equivalent) and $/GB (end-user available, not raw) at a number of capacity points such as 1TB, 10TB and 100TB. And all costs should be over 5 years so that they include whatever insane year 4 and 5 maintenance prices these vendors think they can get away with. Only then will they be useful for end-users to compare one product against another in something approaching their own environments.
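Something like the following is all it would take, as a sketch. Every figure in the example quote is invented for illustration, and the `Quote` and `five_year_metrics` names are mine rather than anything from a benchmark body.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Quote:
    purchase_price: float
    annual_maintenance: Sequence[float]  # one entry per year, years 1 to 5
    usable_gb: float                     # end-user available, not raw
    iops: float


def five_year_metrics(quote: Quote) -> Dict[str, float]:
    """Total 5-year cost plus $/IOPS and $/GB based on that total."""
    total = quote.purchase_price + sum(quote.annual_maintenance[:5])
    return {
        "total": total,
        "per_iops": total / quote.iops,
        "per_gb": total / quote.usable_gb,
    }


# Hypothetical 10TB quote: maintenance bundled for 3 years, then charged.
example = Quote(
    purchase_price=100_000,
    annual_maintenance=(0, 0, 0, 30_000, 30_000),
    usable_gb=10_000,
    iops=20_000,
)
metrics = five_year_metrics(example)
print(f"5-year cost ${metrics['total']:,.0f}, "
      f"${metrics['per_iops']:.2f}/IOPS, ${metrics['per_gb']:.2f}/GB")
```

Run the same thing at each capacity point and the year 4 and 5 maintenance shows up in the numbers instead of in the small print.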
The term "data protection" has been used to apply to backup and archive (amongst other related technology) for a long time, nothing new here.
It shouldn't be surprising that the term has been used elsewhere, though, "data" is a rather generic term.
More Than That
What if you want to use your data in a clustered environment? What if you want it mirrored for resiliency? What if you want it replicated for disaster recovery? What sort of downtime is involved in increasing the capacity? The downsides of DAS were never about speed, they were about manageability. This is even more true in a virtualized world.
People who say Fusion-IO is storage are technically accurate but you won't find any sane business holding their primary copy of business-critical data on it. This stuff is basically another type of cache.
So will it supplement SANs? Yes. Will it supplant them? No.
Snarkiness from the Uninformed
This has nothing to do with garbage collection, with the reference count going from 1 to 0 for a given piece of data. It has to do with reduced referencing, with the reference going from n to n-1 for a given piece of data where n > 1. As would be expected, if all you're doing is decrementing a reference then you're not going to save a lot of space by doing so, and so 'traditional' methods of keeping your storage utilisation down aren't going to work with deduped storage.
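For those who didn't get it, here's a toy sketch of the n to n-1 case. It's a made-up in-memory model (the `DedupeStore` class is hypothetical and nothing like a real array's implementation) where identical blocks are stored once and reference counted.

```python
class DedupeStore:
    """Toy deduplicating store: each distinct block is kept once."""

    def __init__(self):
        self.refs = {}   # block contents -> number of files referencing it
        self.files = {}  # file name -> list of block contents

    def write(self, name, blocks):
        if name in self.files:
            self.delete(name)  # overwrite: drop the old references first
        self.files[name] = list(blocks)
        for block in blocks:
            self.refs[block] = self.refs.get(block, 0) + 1

    def delete(self, name):
        """Remove a file; return how many physical blocks were freed."""
        freed = 0
        for block in self.files.pop(name):
            self.refs[block] -= 1
            if self.refs[block] == 0:  # only the 1 -> 0 case frees space
                del self.refs[block]
                freed += 1
        return freed

    def physical_blocks(self):
        return len(self.refs)


store = DedupeStore()
store.write("monday_backup", [b"A", b"B", b"C"])
store.write("tuesday_backup", [b"A", b"B", b"C", b"D"])
print("physical blocks:", store.physical_blocks())
print("freed by deleting monday:", store.delete("monday_backup"))
print("physical blocks afterwards:", store.physical_blocks())
```

Deleting Monday's backup removes three blocks' worth of logical data and frees nothing, because Tuesday's backup still references every one of them.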
Yes it's a relatively simple point but given that from the comments half the people here didn't even get the basic idea of the article there is some worth in publishing this stuff.
One other thing that has been missed is that because dedupe is (as a rule) carried out as a post-process you need to keep some 'spare' capacity anyway to place yet-to-be-deduped information prior to the dedupe process kicking off, so when you measure your high water mark make sure it's before your daily dedupe kicks in rather than when you wander in with your coffee and the process is long completed.
IOPS or IOPS/TB?
Regarding IOPS and different drive types, looking at disks individually doesn't give the best picture. For example if you take a 3.5" drive and a 2.5" drive, both with 600GB capacity but the 3.5" spinning at 15Krpm and the 2.5" at 10Krpm, you can get approximately 25% more IOPS from the 3.5" drive.
When you move to a proper storage array things change because you can pack so many more 2.5" drives in to the same space. This increases the absolute number of IOPS but not the IOPS/TB.
So when considering any (non-streaming) storage system you need to have three technical metrics in your head: TB, IOPS and IOPS/TB. Once you have an idea of what these numbers need to be you can start to make a decision between different hard drive form factors, capacities and rotational speeds. Other metrics such as cost, maintenance over useful lifetime (== cost), space taken up (== cost) and power required (== cost) probably also matter to you, and again depending on your requirements one drive type will be cheaper than the others.
You can also use this to decide if SSD is worth it for you as well.
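As a rough sketch of the three metrics for the 600GB example above: the per-drive IOPS figures (175 and 140, chosen so the 15Krpm drive comes out 25% ahead) and the 12 versus 24 drives per shelf are assumptions for illustration only.

```python
def shelf_metrics(drive_count: int, drive_tb: float, drive_iops: float):
    """Return (TB, IOPS, IOPS/TB) for a shelf of identical drives."""
    total_tb = drive_count * drive_tb
    total_iops = drive_count * drive_iops
    return total_tb, total_iops, total_iops / total_tb


# Hypothetical shelves taking up the same rack space.
shelves = {
    '12 x 3.5" 15Krpm 600GB': shelf_metrics(12, 0.6, 175),
    '24 x 2.5" 10Krpm 600GB': shelf_metrics(24, 0.6, 140),
}
for name, (tb, iops, iops_per_tb) in shelves.items():
    print(f"{name}: {tb:.1f} TB, {iops:,.0f} IOPS, {iops_per_tb:.0f} IOPS/TB")
```

With these assumed figures the 2.5" shelf wins on TB and on absolute IOPS but not on IOPS/TB, so which one is right depends on which of the three numbers you're short of. Plug in SSD figures and the same function covers that decision too.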