30 posts • joined Friday 5th August 2011 07:45 GMT
Yep. Consistency is indeed done via time which requires extreme accuracy. Atomic clocks are involved but it does depend on what you are doing.
The other thing is that it's semi-hierarchical rather than truly relational; that helps because they can group data, which eases sync issues on updates.
Re: DIY SAN
Absolutely spot on; if you look at what the main cloud providers - Microsoft and Amazon do in terms of their storage - that is exactly the approach they have taken.
Microsoft Azure is not SAN based, nor is AWS.
Hadoop is successful not just because it does unstructured data via MR but you can chop the data up, distribute it and have the processing work where the data is (locally) which is the exact opposite of the SAN approach where you move the data off the SAN to the server.
It's going to take another few years but the need SANs met can be more easily and cost effectively met with software and commodity kit.
I guess once we are all on the cloud then the SAN v DAS argument will be moot anyway - because the major cloud vendors ain't SAN based! :)
Yes - SAN has had its place and certainly within the database area has not always been successful because of the black box approach that frustrates the hell out of the folk trying to manage performance of said database - the DBAs. You don't see a SAN in the cloud thankfully, just commodity kit and DAS - at last!
Software has evolved to form reliable and performance rich distributed data processing, e.g. Hadoop, Cassandra and VoltDB to mention just 3.
Hardware has evolved - it's great to see even PCI based SSD being threatened with the true end goal of persistent memory - we are now able to put TBs of SSD directly into memory sockets.
It's going to be an interesting few years to come; and thankfully and hopefully that will be without SANs!
You just couldn't make it up!
It's not funny but it is; or is it just bemusement?
Hopefully they'll be in the enterprise space.
I know they've commodity SSD SATA 3 drives (just ordered one this morning); but we need them in the enterprise space to drive those ridiculously high SSD drive prices down. Hopefully they'll pick up where OCZ hasn't really succeeded - the PCIe flash enterprise space.
And people thought Microsoft were bad!
They need to pull their finger out and innovate rather than litigate.
Another rip off
1.9m - what on earth are they designing - another ebay?
Sickens me, there aren't even that many TCO's and what complexity is there!
A small software house could knock that up at a fraction of the cost.
It's not space, it's IOPS (Databases anyway!)
In the database space IOPs is key; to get IOPs I physically need many many more hard drives than I need SSD for decent semi-random IO at a decent latency (< 5ms).
So, for my 1TB database I only need to buy a 1TB PCIe card, to get the same level of IOPS from hard disks I'd need dozens.
So, comparing storage shipped between SSD and Hard drives is just wrong, it should be a mix.
Re: What was the point of this article??
Basically they replaced many many many disks required for IOps with significantly fewer!
I'm a SQL Server guy; it has a significantly different access profile than a file server.
Databases do not work well on file server ideology!
Re: Not missing the point
Not sure you are agreeing with me, but my point is this - why go to the trouble of a SAN when you can easily and more cheaply achieve the goal with PCIe connected flash?
You don't need to worry about switches and multiple HBA, controllers to get your throughput up.
You don't need to worry about latency nor IOPS.
Redundancy is easy.
Remember, as I said, I write within the context of the database space. We need raw throughput because in BI we may be processing hundreds of GiB of data in a single query - that data is spread across the storage, which is where disk geometry starts to have an effect on latency per IO.
I don't believe the answer is SSD (SAS or FC connected) but it's Flash based PCIe; however, I don't believe that will easily become a reality until commoditised distributed database platforms take hold - something that we are already seeing with Hadoop to help deal with "Big Data".
SSD is cheaper per GB when...
As a database professional I need to quantify two things - how much data I need to store and how many IOps the IO subsystem needs to be capable of at a realistic latency of say < 5ms per IO.
As a raw comparison, say I want a 900GiB database and I want 10K IOps at < 5ms per IO; I can get a single PCIe card to do that with flash (easily and for less than £5K), and if I want redundancy I buy another (£10K in total).
What about hard drives - 2.5" 15K, the max is 300GiB per drive, realistically around 300IOps per drive - how many drives do I need to buy in order to get the IOps and redundancy I need? Significant, so many in fact that I'd need an external array for a start - more cost, more controllers etc.
In the real world this rubbish about SSDs being more expensive per GB is just wrong; we require IOps as well as capacity, and the two go together. Remember I need 900GiB of storage space, but I'd probably need 30 drives for 10K IOps with RAID 1+0 at a latency of < 5ms per IO - that is about £9K just for the disks themselves, without the two additional storage arrays to hold them, the dual controllers required and then the ongoing power and cooling....
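To put rough numbers on the drive count, here is a back-of-envelope sketch. It uses the figures from the post (900GiB, 10K IOps, 300GiB and roughly 300 IOps per 15K drive); the function name, the RAID 1+0 model (half the raw capacity usable, every write hitting both mirror members) and the `write_fraction` parameter are my own assumptions for illustration, not a vendor sizing method.

```python
import math

def drives_needed(capacity_gib, target_iops, drive_gib, drive_iops,
                  write_fraction=0.0):
    """Rough RAID 1+0 drive count: the larger of the capacity-driven
    and the IOps-driven requirement (illustrative model only)."""
    # Capacity: RAID 1+0 mirrors everything, so usable space is half of raw.
    capacity_drives = 2 * math.ceil(capacity_gib / drive_gib)

    # IOps: reads cost one back-end IO, writes cost two (one per mirror side).
    reads = target_iops * (1.0 - write_fraction)
    writes = target_iops * write_fraction
    backend_iops = reads + 2 * writes
    iops_drives = math.ceil(backend_iops / drive_iops)
    if iops_drives % 2:          # mirrored pairs need an even drive count
        iops_drives += 1

    return max(capacity_drives, iops_drives)

# Figures from the post: 900GiB, 10K IOps, 300GiB / ~300 IOps per drive.
print(drives_needed(900, 10_000, 300, 300))                      # read-only model
print(drives_needed(900, 10_000, 300, 300, write_fraction=0.5))  # 50/50 read/write
```

Capacity alone only asks for 6 drives; it is the IOps target that pushes the count into the same ballpark as the 30 or so quoted above, and higher still once writes are in the mix.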
Missing the point a lot of the time
A lot of the time people just completely miss the point about PCIe connected flash - a single card can easily operate at 1.9GiBytes/second (ref: ocz revo3 maxiops that I've got in this machine I'm writing on), per channel SAS 600 can only cope with 500MiB/sec but the "interface" is dramatically less so if you want to achieve 1.9GiBytes per second from a SAN - how do you go about it? With complexity, cost and the hope that the latency will be realistic.
[written with the context of database storage]
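As a quick sanity check on that comparison, the sketch below works out how many SAS channels it would take to match the single card, using only the two figures quoted above (1.9GiBytes/second for the card, 500MiB/sec per SAS 600 channel). The function name is hypothetical, and it ignores controller and switch overhead entirely, so treat it as a lower bound.

```python
import math

def channels_needed(target_mib_per_s, per_channel_mib_per_s):
    """Minimum number of channels to reach a target throughput,
    assuming perfect striping and no controller overhead."""
    return math.ceil(target_mib_per_s / per_channel_mib_per_s)

card_mib_per_s = 1.9 * 1024      # 1.9GiBytes/second quoted for the PCIe card
sas_channel_mib_per_s = 500      # per-channel SAS 600 figure quoted in the post

print(channels_needed(card_mib_per_s, sas_channel_mib_per_s))
```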
Anyway, I wonder if this WD guy was talking about growth in the past 6 months, which I'd expect because their factories were flooded, so it only stands to reason that now they are manufacturing again there is a shortfall to fill :)
Storage - always behind?
Just like their 520, way behind the competition.
Reminds me of the processor war in the commodity space when AMD slam dunked Intel on performance until Intel caught up with their i7 stuff, now the opposite is true.
You are partly right; but in terms of infrastructure the principle is that the servers in the cluster are fully redundant, so shared nothing, hence DASD. Connecting the machines in the cluster to a SAN, which is basically what this article is saying, goes completely against scale out.
Commodity storage in the cluster nodes can be achieved with commodity SSD, SATA or PCIe based to get the local IOps and still be cost effectively disposable.
There is definitely an upsell into this space but I'd expect that because, let's face it, we have a lot of folk with vested interests in keeping SAN technology.
I think a law should be put in place so that emergency service "call centres" remain on the mainland and are staffed by English-speaking individuals you can actually understand, thereby not following the model used by a lot of banks, ISPs, utility companies etc.
My two cents.
Interesting they are using what effectively is now an old release, we've since had 2008, 2008 R2 and now 2012.
I'd be interested to know the reasoning.
All technologies hit a wall - look at 15Krpm disks
15Krpm disks have been around for over a decade and we are still only on capacities of 300GB on a 2.5" drive and 900GB on a 3.5" drive; you need at least a dozen 15Krpm drives to compete on a 50/50 random read/write work load with a typical SSD; even then the disks just can't get the data off quick enough.
Phase Change Memory is my bet.
Good - another move away from SAN
Certainly within the database space IOPS at low latency are more and more important, and for typical SAN based disk architecture it costs more and more because of the inherently poor latency performance of a random workload on a 15Krpm drive.
Flash hooked into PCIe is the way to go, resilience through software block replication to remote servers.
Roll on another 5 years!
@ what latency though?
It's great writing at those speeds and I see lots of applications for that in my realm - the transaction log of a database for instance, web logs etc.
However, what about a semi or fully random workload - will there be latency like we already have on spinning media? Has anybody read the paper yet: http://www.nature.com/ncomms/journal/v3/n2/full/ncomms1666.html?
It will be great if they can make the technology so we don't have spinning platters and head movements, basically make it like flash :)
Another no clue SE?
Ok, you got me - 19 years of MS SQL Server experience and 13 MVP awards aside, I've tried googling the spec of the Intel SSD you've specified and it doesn't appear to exist even on Intel's site itself.
A quick check on the P410i specs and you'll see that SAS connected drives operate up to 6Gbits/sec (per channel or drive in English); however, SATA connected drives, which is probably what you are testing with (please show me the spec otherwise), are connected at 3Gbit/sec, which is significantly short (max 264MiB/sec) of what a standard SSD is capable of - the majority are now SATA 3 (6 Gbits/sec), hence the capable 530MiBytes/sec throughput (per drive).
I hope you enjoyed my paper!
Anyway, if you would like to show us all just how almighty and experienced you are then why not blog the perfmon results and a proper spec including the correct model numbers of the drives you are using!
SQL Server? Your times don't sound right
@Philip Lewis: I've done a ton of testing on the OCZ Agility and IBIS drives and your numbers just don't stack up. Some figures can be found on my paper at http://www.reportingbrick.com.
I'd be interested in you posting the hardware specifications, how you are attaching the SAS disks and how the SSDs - note, the SSDs will likely be SATA connected and on say a HP controller will be limited to 1.5Gbits/second because of the SATA revision they use, so the overall transfer speed will be significantly less than the SSD can achieve.
Back to the article, those figures are way behind what OCZ have had out for a couple of years now.
HP's P410i controller takes SATA and SAS together
I did an interesting experiment on a DL360; the disk slots are SATA (regardless of protocol). Anyway, I've 4 SAS disks (10K, 300GB each) in RAID 0; in the spare slot I put an OCZ Agility 3 60GB drive in a caddy, slotted it into the server and created it as a logical disk etc. Ok, it's only SATA 1 so 1.5Gbits/second but I still put the RAID array to shame on physical IOps.
What I'm saying is that we have the facility of using and getting the perf from SSD's now using existing protocols, random IOPS with a sub-millisecond latency is what I'm after as a database specialist.
Is it a conspiracy that the controller is limited to 1.5Gbits for SATA drives? If it was SATA 2 I'd have 300MBytes per second per drive with IOPS sub-millisecond, SATA 3 600MBytes per second.... all on one drive! That just negates the need for so many 15K disks, so the vendors start losing money; ever wondered why the price of enterprise SSD drives to go in your kit is like 5x+ the price of commodity ones that actually outperform them?
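For anyone wondering where the 300MBytes and 600MBytes figures come from: SATA uses 8b/10b encoding on the wire, so every 10 bits transmitted carry 8 bits of data. A minimal sketch of the conversion follows (the function name is mine; real-world throughput will be lower again once protocol overhead is taken off).

```python
def sata_payload_mb_per_s(line_rate_gbit):
    """Theoretical payload bandwidth in MB/s (decimal megabytes) for a
    SATA link, accounting only for 8b/10b line encoding."""
    data_bits_per_s = line_rate_gbit * 1e9 * 8 / 10   # strip 8b/10b overhead
    return data_bits_per_s / 8 / 1e6                  # bits -> bytes -> MB

for name, rate in [("SATA 1", 1.5), ("SATA 2", 3.0), ("SATA 3", 6.0)]:
    print(f"{name}: {rate} Gbit/s line rate -> "
          f"{sata_payload_mb_per_s(rate):.0f} MB/s payload ceiling")
```

So the drive in the SATA 1 slot is capped at about 150MBytes per second no matter what the flash behind it can do.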
SATA already has hot plug written into the protocol
It's a great article by the way - very informative.
It bemuses me why we aren't moving away from SAS and to SATA 3; a lot of issues with SAS have already been addressed.
I thought SANs were a dying tech?
Certainly within the database space (Microsoft SQL Server) I'm seeing more and more businesses using PCI based NAND cards like FusionIO and OCZ VeloDrive; cost per IO is substantially smaller than the SAN delivered equivalent (IOs per second at a latency realistic for database use i.e. <= 3ms per IO).
Sounds like the storage vendors are fighting back and simply turning their smart "storage" arrays into "a server with dasd" :)
Performance gone - knackered in less than a fortnight
Had a sudden drop from 1ms per IO to 11ms per IO (60MB/sec to 5.5MB/sec); another 24 hours later and it's 15ms per IO (4MB/sec).
Going to keep it running, it lasted longer than I thought, but it proves that in a database setting with lots of write activity then wear rate is an issue, anyway, I'll blog next week sometime once I've all the data collated.
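Those latency and throughput figures hang together if you apply Little's law to the test settings. The sketch below assumes the IOMeter configuration from the post where the test was kicked off (8KB IOs, 8 outstanding) and that the queue stays full the whole time; the function name is hypothetical.

```python
def throughput_mb_per_s(latency_ms, outstanding_ios=8, io_size_kb=8):
    """Little's law: IOs in flight = IOps x latency, so
    IOps = outstanding IOs / latency. Throughput = IOps x IO size."""
    iops = outstanding_ios * 1000.0 / latency_ms
    return iops * io_size_kb / 1024.0

for latency in (1, 11, 15):
    print(f"{latency:>2}ms per IO -> {throughput_mb_per_s(latency):.1f} MB/sec")
```

That gives roughly 62, 5.7 and 4.2 MB/sec, which lines up closely with the 60, 5.5 and 4 observed, so the throughput drop is explained entirely by the per-IO latency going up.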
Using RAID 1 isn't going to double the life, RAID 1 is mirroring so both drives will wear (sorry ware :)) out equally, perhaps you were thinking RAID 0 but with that there is no redundancy.
I've a test running now I've stopped the drive timeouts I was getting, I'll leave it for a few weeks and see what the results are. Am capturing all the logical disk through perfmon so should see any degradation.
Also - it was two entirely different systems I was looking at, neither enterprise level writes which was my point!
I've kicked off a test using IOMeter for a 54GB file, 8KB 100% sequential write with 8 outstanding IOs on an OCZ Agility 3 60GB; it's on Windows 2008 R2 and I'm logging the logical disk each minute so I can see any degradation over time. I'll leave it running and come back next weekend and let you know the results.
Yer yer, my spelling is crap, but I more than make up technically for that in SQL Server ;)
Does anybody have any figures in this area? I'm actually doing a project on using commodity SSDs for my master's BI project.
It would be good if that is the case.
Just checked one of my client's servers, a call centre, just a medium sized business, and one of their databases has done 4TB of writes since the middle of April (5 months). Another has done 14.5TB since the middle of May (3 months); those aren't enterprise installations either. Another 17TB since the end of June....
SO, my point is that in a database environment ware is a consideration; the biggest worry I have is that there are absolutely no tests publicised in this area to see how long they live.
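To turn those totals into something comparable, here is a rough sketch that converts them into a daily write rate and then into days until a given write-endurance budget is used up. The write totals and durations are the ones quoted above; the 100TB endurance figure is purely a placeholder I've made up for illustration, not the rating of any real drive, and both function names are hypothetical.

```python
def tb_per_day(total_tb, months, days_per_month=30.4):
    """Average write volume per day over the observed period."""
    return total_tb / (months * days_per_month)

def days_to_exhaust(endurance_tb, total_tb, months):
    """Days until a given write-endurance budget is consumed at the
    observed rate (ignores write amplification and spare area)."""
    return endurance_tb / tb_per_day(total_tb, months)

PLACEHOLDER_ENDURANCE_TB = 100   # made-up figure, NOT a real drive rating

for total_tb, months in [(4, 5), (14.5, 3)]:
    rate_gb = tb_per_day(total_tb, months) * 1000
    days = days_to_exhaust(PLACEHOLDER_ENDURANCE_TB, total_tb, months)
    print(f"{total_tb}TB in {months} months = {rate_gb:.0f}GB/day, "
          f"{days:.0f} days to write {PLACEHOLDER_ENDURANCE_TB}TB")
```

Swap in whatever endurance figure you trust for your own drive; the point is only that the daily rate on even a medium sized database is large enough to make the question worth asking.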
raid0 on card is fine, also, ibis are fine
What's wrong with the 4 or 8 way RAID 0? If you're doing this properly you'd buy two cards and use software mirroring.
I've also done a lot of benchmarking on 2 x 240GB IBIS cards in that configuration and they work brilliantly; on my simple database workload I can very easily overload the cores so they can't keep up (see my blog on it and the other tests I've done so far on them: http://tinyurl.com/3d8ygeu).
I think the majority of IBIS problems come from motherboard incompatibility caused by the lack of BIOS memory available.