25 posts • joined 12 Sep 2008
Software needs Hardware - but good software can run on any hardware
(Disclaimer - I work for IBM in the SVC and Storwize development team)
Quite a few comments here, the first I would contend with is the statement above that all IBM can do is buy in technology like TMS and Storwize. Well, if you actually knew what you were talking about, you would see that the 1% of IBM's revenue last year that came from SVC and Storwize was 100% organic IBM development from the last 14 years of work from the team I work in.
Yes, we re-used the Storwize name, but the product range is based off of SVC software.
SVC is a software product and always has been - it's part of IBM's Tivoli group, the Storage Software Group - but all software needs some hardware to run on.
We've looked many times at running SVC software on anything, but the main reasons we don't do this are performance and reliability. You need to guarantee certain performance levels, certain response time aspects, and ensure you can report when something has gone wrong. For now, that's why SVC software runs on specific SystemX hardware, and the Storwize controllers.
At the end of the day, just like Printers, Disk Drives, Laptops, POS, and now x86 hardware, these all become commodity items, and a vendor who is geared up to produce millions of them, and has routes to market for the masses, is in a much better position than IBM to increase sales and make a profit.
Storage is still a growth area, and IBM has some great technology in SVC and XIV that have proved they can scale and grow with customers right from small 12 disk systems to many tens or hundreds of petabytes.
All Storwize product hardware is based on x86_64 hardware, but it's custom built planars to fit in the form factors needed, and has no SystemX components. XIV is a custom "storage rich server" that again has no SystemX components. Only SVC used a vanilla (with a little metal-bending) SystemX platform, and more often than not, some of the server components just got in our way - like the IMM service processors and the like. With the sale of SystemX, yes we could go to Lenovo in the future and get a standard server from them, or anyone else, or we could tender for a specific server planar that has just the bits we want, and none of the bits that get in the way, or cause additional development to work around.
SVC software has run on over 15 platforms in the last 10 years - including MIPS - so the actual base hardware is almost irrelevant to us.
All that said, the roadmap for SVC and Storwize is rosy, growth is meeting target and some of the exciting things we are working on at the moment will take the bar to the next level.
Apples and Oranges
Disclaimer: I work for IBM and was involved with the V7000 all SSD publish.
1. V7000 like 3Par is clusterable.
V7000 publish used a single system, but can linearly scale to 4 control enclosures
So 4x the 120K published number.
3Par used essentially equivalent to 2 control enclosures, hence double the performance.
2. V7000 = 18 drives
3Par - 32 drives
3. SPC-1 tests benefit from cache to usable capacity of storage ratio. Hence with the 1TB usable and 64GB cache in 3Par - much higher cache hit chance than V7000 with 3TB usable and 16GB cache...
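To put a number on point 3, here is a rough sketch of the cache-to-usable-capacity ratio for the two configurations. It only uses the rounded figures quoted above, and it assumes 1TB = 1000GB, which the post does not state.

```python
# Cache-to-usable-capacity ratio for the two SPC-1 configurations quoted above.
# Assumption: 1TB is treated as 1000GB; the post only gives rounded figures.

def cache_ratio(cache_gb: float, usable_gb: float) -> float:
    """Fraction of the usable capacity that could sit in controller cache."""
    return cache_gb / usable_gb

threepar = cache_ratio(64, 1000)   # 3Par: 64GB cache, 1TB usable
v7000 = cache_ratio(16, 3000)      # V7000: 16GB cache, 3TB usable

print(f"3Par : {threepar:.2%} of usable capacity fits in cache")
print(f"V7000: {v7000:.2%} of usable capacity fits in cache")
print(f"3Par has about {threepar / v7000:.0f}x the cache per usable GB")
```

On those figures 3Par has roughly 6.4% of its usable capacity covered by cache against about 0.5% for the V7000, around a 12x difference, which is why the cache hit chance is so much higher.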
If done apples to apples, I think you'd find they are both limited by the SSD performance and not the controllers.
As with all things performance - devil is in the detail!
One more for the Sun fan-boy masses...
Oracle's Z stuff used:
8x 512GB SSD
8x 73GB SSD
and 280x 15K RPM disks
Total SSD capacity = 4680GB
Total HDD IOPS = 280x 300 = 84,000 IOPs just from the HDD
So the SSD only contributed : 137,000 - 84,000 = 53,000 IOPs
Which based on 4680GB is 11.3 IOPs/GB
V7000 used 18x200GB = 3600GB
So is 33.33 IOPs/GB
Maths speaks for itself
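For anyone who wants to check the sums, a quick sketch using only the figures quoted in this post. The 300 IOPs per 15K RPM drive is the post's own estimate, and the 120K V7000 result is the published number mentioned in the earlier comment; the variable names are just for illustration.

```python
# Oracle configuration as quoted: two SSD sizes plus 280 spinning drives.
ssd_gb = 8 * 512 + 8 * 73            # total SSD capacity in GB
hdd_iops = 280 * 300                 # estimated contribution of the 15K RPM drives
oracle_total_iops = 137_000          # quoted result
oracle_ssd_iops = oracle_total_iops - hdd_iops
oracle_iops_per_gb = oracle_ssd_iops / ssd_gb

# V7000 configuration as quoted: 18 SSDs of 200GB, 120K published IOPs.
v7000_gb = 18 * 200
v7000_iops = 120_000
v7000_iops_per_gb = v7000_iops / v7000_gb

print(f"Oracle SSD capacity: {ssd_gb} GB, SSD share of IOPs: {oracle_ssd_iops}")
print(f"Oracle: {oracle_iops_per_gb:.1f} IOPs/GB")
print(f"V7000 : {v7000_iops_per_gb:.2f} IOPs/GB")
```

Using the 4,680GB total from the drive list, the Oracle SSD figure works out at about 11.3 IOPs/GB against 33.33 IOPs/GB for the V7000.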
MISSING THE POINT ME THINKS!
So I think everyone is missing the point of what this publish means.
1. V7000 as a 2U system can hold its own against flash only boxes
2. V7000 as a 2U midrange box with Easy Tier has a whole bunch more performance to give over and above any disks it contains...
See my post here, and feel free to comment :
sustained iops after 6-9 months
what do they sustain after they have been around the block a few times... consumer grade flash devices will fail and will degrade... there is a reason enterprise "class" ssd cost so much
Interesting... V7000 released 4Q2010, and saw a MASSIVE increase in sales in 4Q2011... with V7000 Unified... maybe it's all making sense (on the IBM numbers anyway)
To IO-IO Hmm... I was trying to work out which vendor you work for. Initial thoughts were an EMC or NetApp employee, but looking through your recent replies on el Reg, you've not only slagged off IBM but EMC, NetApp and also HP...
The only thing you have commented on positively, and with some technical know-how, is 3Par - which leads me to believe the large chip on your shoulder is something to do with the recent acquisition...
Now clearly you don't understand SVC, nor the advantage of moving virtualization OUTSIDE the array - and virtualizing heterogeneous environments with a "no disruption when that array has to be replaced" approach.
You make comments as to "why IBM developed SVC" when you have no factual knowledge of why we did it - also that SVC is designed as enterprise by a team who have worked on enterprise storage for over 20 years, and has reliability levels that rival the mainframe itself.
You also claim IBM has no storage portfolio, citing un-competitive products etc - where have you been sleeping the last 24 months... XIV and V7000? The most innovative gender disruptive storage products for some time...
Suggest you work out why that chip on your shoulder has grown so large and work on reducing it rather than swiping at everyone on here.
Chris, SVC and V7000 can use any extent size from 16MB through 8GB (powers of 2 in between)
devil in the detail
Chris, as always the devil is in the detail.
Pillar = 292x 15K RPM drives
V7000 = 240x 10K RPM drives
The difference in response curves will be down to the relative rotational latency (and throughput) of the drives.
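As a rough illustration of the rotational latency point: on average the platter has to turn half a revolution before the sector arrives under the head, so the average latency follows directly from the spindle speed. This sketch ignores seek time and transfer time, and the function name is just for illustration.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: time for half a revolution, in milliseconds."""
    seconds_per_revolution = 60 / rpm
    return seconds_per_revolution / 2 * 1000

for rpm in (15_000, 10_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.1f} ms average rotational latency")
```

That gives about 2ms for the 15K RPM drives and about 3ms for the 10K RPM drives, before any other differences between the two boxes are considered.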
There is OEM and there is OEM
Coward, you fail to understand what OEM means.
In OEM terms, you take a box, and stick your own badge on it.
Taking something that you design, get a 3rd party to manufacture, then run your own software on it and control your own destiny is not OEM in the way you suggest the LSI and NetApp relationships are/were.
Be sure to understand what it is you say before you leap to a misguided offense.
OEM no more
Where have you been - IBM is back in organic storage - The Storwize V7000 is not OEM'd - it's IBM - maybe the DS 4/5 of the past, true, but no more...
Good to catch up yesterday Chris, and nice article - my quote re virtualizing external storage was "we are happy for people to run external virtualized storage with production workloads, not just archive data like some other vendors"
At present there is no dedupe built into the V7000.
The GUI is "XIV style" but has been completely re-written to run browser side, rather than a java application.
I don't think we clarified yesterday, but the SATA drives are actually Nearline SAS - so a SATA spindle but dual ported SAS interface.
On page 3 you mention XIV 3 times, when I think you mean SVC.
snoop snoop dog
With the AMD snooping overhead - like over 50% performance when 4-way - a dual hex core... that's one hell of an overhead on snoops
what about Z
If HP bought up 3PAR and hence probably butchered their relationship with HDS, then they'd lose Z attach - therefore I don't see it happening.
Not strange at all.
The FusionIO card is a great card for high speed direct attached storage. If you have an application that needs non-shared storage, then we offer the ioDrive on our System X hardware.
If you want SAN based SSD storage, then we are using STEC due to the form factor as I mentioned.
Hey for that matter, we support TMS RAMSAN behind SVC if you want one of them (but check out the performance compared with SVC+SSD internally)
It's better to have all the bases covered. We can replace HDD with SSD in our DS range of products, we can provide you an SSD optimised controller like SVC or we can supply you some local high MB/s SSD in your servers... covering most of the bases wouldn't you say?
PS. Just to clarify, the 800,000 read iops is from the SSDs alone. (I've actually measured closer to 900,000) but this is not an SVC limit, SVC itself will happily sustain over 1,500,000 read MISS iops (4KB) so the SSD alone is only a fraction of what the new CF8 nodes can handle.
Software vs ASIC
Chris, re your assertions about "software solutions" and "most mature thin provisioning" - I'd question the 'advantage' of having to develop an ASIC to perform a function that is basically performing memory compares. The Intel SSE hardware that does direct memory compare functions at 64bit word size with almost no impact to other standard CPU processing is a much more sustainable and cost effective way of doing zero detection. It took us 20 lines of assembly to do what 3Par need an ASIC macro for. Given the speed of Nehalem Xeons, with 20+ GB/s memory bandwidth per die...
Similarly, since 3Par are, as I've said before, the "grand-daddy" of thin provisioning, what have they been doing for the last 5 years... SVC got "Space-efficient" this time last year, and now we can migrate thick to thin, inline strip out zeros... and lots more on the horizon... I really don't think ASIC based, offload hardware is the future of our industry; commodity CPU with offload software is much faster to develop and generally more agile.
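To show how little there is to zero detection in principle, here is a minimal sketch in Python. It is not the SVC code (that is the SSE assembly mentioned above, which is not reproduced here); the 4KB grain size and the function names are assumptions made purely for illustration.

```python
from typing import Dict, List

def zero_grains(data: bytes, grain_size: int = 4096) -> List[bool]:
    """Return one flag per grain: True if the grain holds only zero bytes."""
    zeros = bytes(grain_size)  # reference buffer of all zeros
    flags = []
    for offset in range(0, len(data), grain_size):
        grain = data[offset:offset + grain_size]
        # A plain memory compare against zeros; the last grain may be short.
        flags.append(grain == zeros[:len(grain)])
    return flags

def strip_zeros(data: bytes, grain_size: int = 4096) -> Dict[int, bytes]:
    """Keep only the non-zero grains, keyed by grain index (thick-to-thin style)."""
    kept = {}
    for index, is_zero in enumerate(zero_grains(data, grain_size)):
        if not is_zero:
            start = index * grain_size
            kept[index] = data[start:start + grain_size]
    return kept

# Three grains: all zero, one byte set, all zero.
volume = bytes(4096) + b"\x01" + bytes(4095) + bytes(4096)
print(zero_grains(volume))          # which grains are empty
print(sorted(strip_zeros(volume)))  # indices of grains that need real capacity
```

The whole job is a compare against zero per grain, which is the sort of work a general purpose CPU does at memory bandwidth.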
Re: Is it me...
Doesn't work like that - you still allocate the meta-data that says a block is allocated, but you don't use up the capacity for that block. Therefore the thin provisioning software knows that the "virtual" volume is actually 10GB allocated, but physically it is only using <1GB of real capacity. So when the user tried to write 5GB it would fail as the 'disk is full' at the 'virtual' level...
Re: Not just Intel... look at STEC BS
Disclaimer : I work for IBM, not STEC.
Before casting aspersions, maybe the "anon coward" here should do some homework. The limit in the SPC-1 pSeries SPC is not the STEC drive, but the SAS HBA and ESM units used to connect to the SSD. If you benchmark an STEC drive in the right enclosure / fabric you can SUSTAIN around 45,000 IOPs (read) and 17,000 IOPs (write) from a device when doing 4KB blocks.
SPC-1 uses between 8KB and 16KB blocks, and as we all know, SSD IOPs halve as you double the block size, therefore at say 12KB this would bring the number down to around 6,000 IOPs for SPC-1
Go do the maths before you start mud-slinging.
Not sure why everyone gets so worked up about MB/s - this is only 2x or maybe 2.5x a fast HDD... what are the IOPs numbers, then we can see if these really make a difference.
apples and oranges
While this is obviously a read only measurement, remember that Quicksilver press for 1M was a *sustained for hours* 70/30 mixed workload. Pure reads were doing over 4M iops, so it's not that much of a leap.
Granted, Quicksilver was a technology demo - but that's where products start...
RE: Even less endurance?
A couple of generations back, SLC was at over 1,000,000 raw writes per cell. Now we are at 100,000 - so yes there is a direct correspondence with the number of electrons per bit and how stable these are over time.
Maybe its not down to 1,000 but it will be lower than 10,000. With MLC drives today, performance is a lot lower than SLC - thus the number of writes you can do in 5 years reduces - however, the only reason that 'effective' lifetime can be 5 years is because of the over provisioning of the raw capacities - hence my comment re the need to over provision more. Look at some of the latest MLC devices - these have 50% effective capacity - vs much higher effective capacity on SLC devices.
I think we are in agreement however, SLC will become the standard requirement as dies shrink for the SSDs used to replace HDD, but MLC will continue to be driven by the consumer (mobile) requirements.
Even less endurance?
The problem with another die shrink on the underlying NAND is that this further reduces "writes per cell" - with every generation an order of magnitude is lost.
While this is great for the mobile industry, much larger capacities in less space - it's fine for iPods and phones where we tend to "write once" and read many - how many times do you delete an mp3 or a photo and then overwrite the blocks? Never?
In a disk drive however we are forever re-writing blocks. If endurance drops to <10,000 writes per cell for these 32nm NAND, then you have to over-provision more and more - thus defeating the purpose of adding the extra capacity. Today's 45nm SLC NAND is the only real option for an enterprise drive - I wouldn't even like to put an MLC device in my desktop or laptop until we have hybrid SLC and MLC - so the disk itself can move stagnant data onto the MLC portion.
Apples and Oranges
So how many 8K IO/s does it do?
What I've found with all flash devices is that the IOPs halve as you double the transfer size.
200K at 2K says 50K at 8K, etc...
This is a fairly fundamental aspect of all flash devices. Which is why they can only do about 5 or 6x a traditional HDD in MB/s.
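The rule of thumb can be sketched as follows: if IOPs halve every time the transfer size doubles, then IOPs times transfer size stays constant, which is just another way of saying the device is bandwidth limited. This models only that rule of thumb, not any particular device, and it assumes 1MB = 1000KB.

```python
def scaled_iops(base_iops: float, base_kb: float, target_kb: float) -> float:
    """IOPs at target_kb, assuming IOPs halve each time the transfer size doubles."""
    return base_iops * base_kb / target_kb

def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Throughput implied by an IOPs figure (assumes 1MB = 1000KB)."""
    return iops * block_kb / 1000

base_iops, base_kb = 200_000, 2   # the 200K at 2K example above
for kb in (2, 4, 8, 16):
    iops = scaled_iops(base_iops, base_kb, kb)
    print(f"{kb:>2}K: {iops:>9,.0f} IOPs = {throughput_mb_s(iops, kb):.0f} MB/s")
```

Under that rule the MB/s figure is the same at every transfer size, so the headline small-block IOPs number tells you little about what the device does at 8K.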
The ultimate question is mixed performance however, everyone can do amazing read only performance, but what happens when the workload is a much more typical 70/30 or 50/50 mix of reads and writes... that separates the men from the boys...
As for the question of RAID'ing Fusion, sure they appear to a system as a block device, so you can run software RAID in a core on the OS and get as much performance as your core can give you...
Devil is in the detail
While I knew this box was coming from Woody and co at TMS, and in principle the concept is similar at a high level to what we demo'd as Quicksilver, in this case the 1M comparison is not apples for apples.
Our 1M IOPs was run with a mixed 70/30 ratio of reads and writes at 4K blocks. We all know that writes are the problem child for flash, and in some cases mixed workloads are even worse.
The TMS spec sheets don't give anything other than 100% read numbers, and don't qualify at what transfer size.
If we take this at face value, and assume it's small blocks, 512byte or 1K, then the real number to compare this against Quicksilver would be 4.7Million IOPs. That is, if we had quoted pure read numbers for Quicksilver. That would however have needed more SVC nodes, and several more fully populated Power Systems 595 hosts...
I'd like to see the 'details' - as I'm interested what the RamSan can do with a more realistic real-life workload.
CX4 considered tier 0/1 - yeah in your dreams. A windows box running your top tier storage... hehe
Seriously though, not sure how many CX customers will be paying the $30K per SSD to put in a mid-range box. Maybe it's an admission that a big box like DMX isn't needed anymore?
Anyway, Chris's tone here is 'very register' and I counter some of his points over here :