59 posts • joined 1 Dec 2009
Good quality stuff here. Should be mandatory reading for every CEO.
Good job on delivering, guys. And good luck when it comes to flogging the stuff to the punters ;)
No, they're not used for single-byte IO. But take a DB app that's updating an account balance record; regardless of how large that record might be, that's just a few bytes of update. Add in RAID (or EC) and you're reading & writing a lot of disk blocks (4K on modern drives) just for that small update. SSD read/write units are much larger, so the problem is exacerbated.
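To put rough numbers on this, here is a minimal Python sketch of the classic read-modify-write sequence for a small update on parity RAID: read old data, read old parity, write new data, write new parity. It assumes the update sits inside whole, aligned units; the function name and the 16 KiB flash unit in the usage loop are illustrative assumptions, not figures for any particular drive or array.

```python
import math


def rmw_bytes(update_bytes: int, unit_bytes: int, parity_units: int = 1) -> int:
    """Back-end bytes moved for one small in-place update on parity RAID.

    Models read-modify-write: read the old data and old parity, then write
    the new data and new parity. Assumes the update is aligned to whole
    units (no straddling a unit boundary), which is a simplification.
    """
    units = math.ceil(update_bytes / unit_bytes)
    data_io = 2 * units * unit_bytes                   # read old data + write new data
    parity_io = 2 * parity_units * units * unit_bytes  # read + write each parity unit
    return data_io + parity_io


if __name__ == "__main__":
    update = 100  # a few bytes of account-balance update
    for unit in (4096, 16384):  # 4K disk sector; 16 KiB is an assumed flash unit size
        moved = rmw_bytes(update, unit)
        print(f"unit={unit:>6} B  moved={moved:>6} B  amplification={moved / update:.0f}x")
```

Passing `parity_units=2` gives the double-parity case; erasure codes with more parity fragments scale the same way.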
The crappiness of NAND is about the same as the crappiness of spinning rust. Error correction is our friend here.
How Controllers Maximize SSD Life
"That way there would be no need for extra hardware and software to cache random IO data and make it sequential before sending it to the parallel file system."
That's not the problem; it's reads that are the issue. With a parallel file system of N nodes and M drives, there are N*M places (roughly) where you can simultaneously write your data. That's hugely parallel in terms of bandwidth, and in fact this kind of system may have a lot more write bandwidth and be a lot better at writes than a narrower 1*M-based flash system. In terms of write latency, the average for a disk-based system is not that much slower than flash.
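A back-of-envelope model of the N*M versus 1*M comparison, as a Python sketch. The node counts, per-drive rates and link speeds are hypothetical, and the model ignores protection overhead and client-side limits; it only illustrates why write bandwidth scales with the number of independent write targets.

```python
def aggregate_write_mbps(nodes: int, drives_per_node: int,
                         drive_mbps: float, node_link_mbps: float) -> float:
    """Rough upper bound on parallel write bandwidth across N nodes x M drives.

    Each node absorbs writes no faster than the smaller of its drives'
    combined streaming rate and its network link; nodes add up because
    clients can write to all of them at once.
    """
    per_node = min(drives_per_node * drive_mbps, node_link_mbps)
    return nodes * per_node


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    parallel_fs = aggregate_write_mbps(nodes=10, drives_per_node=12,
                                       drive_mbps=150, node_link_mbps=1250)
    flash_array = aggregate_write_mbps(nodes=1, drives_per_node=24,
                                       drive_mbps=500, node_link_mbps=5000)
    print(f"parallel FS (N*M disks): {parallel_fs:.0f} MB/s")
    print(f"flash array (1*M SSDs):  {flash_array:.0f} MB/s")
```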
Caching writes and making them sequential (whatever that means) is just overhead with no return, and it's not normally done. Don't bother polluting the cache (we want that for reads), just get rid of the data as fast as possible. Only cache if your workload will regularly read shortly after write -- in which case, what are you doing using a highly parallelized file system as the back end?
Reads are different, since we now have to read the data from 1 disk or SSD (normally we only write to the one place), and the SSD has the advantage of no seek overhead or rotational read delay. If your workload is heavily biased to reading once data is written, a flash array may perform as well as a parallel filesystem. May, since we're now getting data off a shared resource (it's an array after all) with great latency but poorer potential bandwidth.
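A minimal Python sketch of why seek and rotation dominate a random read from disk. The average rotational delay is half a revolution, which follows from the spindle speed; the seek times and the SSD latency below are assumed values for illustration, not measurements.

```python
def hdd_read_ms(rpm: int, avg_seek_ms: float, transfer_ms: float = 0.0) -> float:
    """Average service time for one random read on a spinning disk.

    Average rotational delay is half a revolution: 0.5 * 60,000 ms / rpm.
    """
    rotational_ms = 0.5 * 60_000 / rpm
    return avg_seek_ms + rotational_ms + transfer_ms


if __name__ == "__main__":
    ssd_read_ms = 0.1  # assumed flash read latency, for comparison only
    for rpm, seek in ((7200, 8.5), (15000, 3.5)):  # assumed average seek times
        disk = hdd_read_ms(rpm, seek)
        print(f"{rpm} rpm: ~{disk:.1f} ms per random read vs ~{ssd_read_ms} ms on SSD")
```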
Plus the article kind of assumes a single writer/reader. With lots of servers doing writing and lots of servers then reading the written data (where caching is not going to help at all), parallelism of IO is incredibly important.
As is usual, understanding and properly sizing workloads is key.
(Disclosure; NetApp person)
Here, have a mug on me http://www.boredpanda.com/cunt-mug-university-of-north-texas/
I feel like I'm back in the 1950s, with valves and relays. Nostalgia.
Unless these things can seriously reduce latency in the IO path, they're boat anchors. Getting n times the bandwidth is easy; have n servers. Getting n/10 latency is hard.
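One way to see why: by Little's law, throughput equals outstanding requests divided by latency. A minimal Python sketch (server counts, queue depth and latency are hypothetical) shows that adding servers multiplies aggregate throughput while the latency each request sees stays exactly where it was.

```python
def throughput_ops(servers: int, outstanding_per_server: int, latency_s: float) -> float:
    """Little's law, L = lambda * W, rearranged to lambda = L / W, summed over servers."""
    return servers * outstanding_per_server / latency_s


if __name__ == "__main__":
    latency_s = 0.001  # hypothetical 1 ms per request, fixed by the IO path
    for servers in (1, 10):
        ops = throughput_ops(servers, outstanding_per_server=32, latency_s=latency_s)
        print(f"{servers:>2} server(s): {ops:,.0f} ops/s, "
              f"each request still takes {latency_s * 1000:.0f} ms")
```

The only lever that improves what a single request sees is W itself, and that is the hard part.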
To quote a colleague, if you've managed to get a file handle to an object, you're doing objects all wrong. Object protocols shouldn't support POSIX-like semantics. Files are for when you want to be stateful; objects are RESTful.
Anyhow, back to the question you were posed. There's a hashtag on Twitter for this; #QTWTAIN. It's short for "Questions To Which The Answer Is No".
Agreed. Richly deserved; a good guy and an excellent all-round technologist.
"Nice to meet you, EMC. I'm Michael Dell."
Since when does Gartner get the last word?
That top chart runs from 2012 to 2014 but right to left. My brain melted.
Full disclosure; NetApp employee, vendor of high quality & properly engineered UFAs (Unified Flash Arrays). And the company that first provided its customers with flash as a cache in front of boring old spinning rust, with properly engineered protection (in September 2009).
NetApp was awarded United States Patent 7,640,484 on December 29, 2009 for triple parity raid.
Here you'll find [the math behind triple parity RAID] explained in some detail.
There are a number of other assertions in this article that I disagree with. Unfortunately, they're largely technical in nature and require a lot more space (and maths) than this comment box. Basically, we'd argue that multiplying two very small numbers together to generate a vanishingly small number isn't the whole story (in fact, if you'll pardon the pun, it's only a small fraction of it). This [NetApp Technical Report from 2007] might give a flavour. Yes, it dates from 2007, but it's still relevant. Maths is like that; pretty timeless.
If you want to work it out for yourself, then this simple equation
Average access time = Hit time + Miss rate × Miss time
will help. It's fairly obvious that reducing the average access time can be achieved by any or all of
(a) reducing the miss rate by employing a bigger cache
(b) reducing the miss time by employing a faster cache
(c) reducing the hit time by making memory faster
Some solutions will change the variability of the access time; for example, a small fast cache (closer in speed to memory) reduces the variability, whereas a bigger but slow cache doesn't reduce it at all.
Working set sizes affect the miss rate. Write-back or write-through policies to cope with asymmetric write/read times also play a part -- flash is faster to read than to write.
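The equation is easy to play with. A minimal Python sketch, using hypothetical times in arbitrary units, that applies each of (a), (b) and (c) in turn to the same baseline:

```python
def average_access_time(hit_time: float, miss_rate: float, miss_time: float) -> float:
    """Average access time = Hit time + Miss rate x Miss time."""
    return hit_time + miss_rate * miss_time


if __name__ == "__main__":
    # Hypothetical baseline values, in arbitrary time units.
    baseline = dict(hit_time=1.0, miss_rate=0.2, miss_time=100.0)
    scenarios = {
        "baseline": {},
        "(a) bigger cache, lower miss rate": {"miss_rate": 0.1},
        "(b) faster cache, lower miss time": {"miss_time": 50.0},
        "(c) faster memory, lower hit time": {"hit_time": 0.5},
    }
    for name, change in scenarios.items():
        params = {**baseline, **change}
        print(f"{name:<36} {average_access_time(**params):6.2f}")
```

With these made-up numbers, halving the miss rate or the miss time buys far more than halving the hit time, because the miss term dominates; a different workload shifts that balance.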
And so on. I agree with the overall summary; get expert help.
I can't think of any other excuse.
I sold my beloved 7 earlier this year. For 7.5K. 1993 1700 Ford xflow with a Marina rear axle. Huge 4 into 1 racing exhaust, noise like cannon fire on the overrun.
Jeez. I must have been mad getting rid of the old girl.
It took them 10 years to work this out? Here's the blackboard version from 2004 http://i3.kym-cdn.com/photos/images/original/000/325/699/4fc.jpg
At an estimated 200W 24 by 7 it might be worth investing in the optional RAIH5 power generating system (Redundant Array of Inexpensive Hamsters).
I'm on holiday so this is going to be a bit abbreviated. I've a suntan to work on.
The comments about NFSv4 are well wide of the mark. But that's to be expected, since there are really only two file protocols left: CIFS (sorry, SMB) and NFS. Competition is good. And even MS has developed an NFS server.
More on topic, vendors play a huge part in open source. NetApp for many years, and now Primary Data, pay for nearly all (perhaps all) of the Linux NFS client developer community. And we (NetApp) contribute significant amounts of money to the Linux Foundation, amongst other open projects.
Off to toast myself under the Aegean sun. It's free, but not the beer that goes with it.
Personally, I always use a good old fashioned torch when burying a body.
I use a spade. Much faster.
The IDC figures don't include tape.
That's useful info, but you'll have to permit me to correct you on a few mistakes you've made here.
Try answering the phone when someone calls you, might do wonders for your business.
I can't see where I said I didn't answer calls (assuming I have a signal on this wonderful phone).
The little numbers that appear on app icons are also there for a reason.
If you have multiple folders (say in Outlook) then the little number only reflects your unread emails in the inbox. Everything else doesn't count. You don't do that? That's probably because you don't get 200+ emails a day.
Briefly hold your finger on the TouchID sensor.
That's useful info, and I've just tried it. But on my BB I can either press the physical mute button (it's neatly placed on the side of the phone, iPhone doesn't have one), press the mute on the mic on the headset I use (iPhone doesn't support this) or press the mute screen button (like the iPhone, but I don't need to unlock it first).
Thanks. Glad to be of service.
1. An integrated inbox. All your notification stuff -- calls, tweets, emails, notifications -- in one place. Every time I get a ping on the iPhone I have to go hunting for what caused it. The notification center thing is useless for more than one notification, and I shouldn't have to go clicking on icons to search stuff out. The BB does this.
2. When I'm on a call, don't make me unlock the phone to go on and off mute; it takes at least 6 screen taps, is distracting and takes too long. The BB has an external mute/unmute button.
3. The signal strength is way down compared with my Blackberry. I work from home, and there are now places in my home office where there's no signal.
There's more, particularly to do with the standard email app, but I don't want to drone on and on. I've asked my IT department to take it back as it's affecting my productivity. I'm spending too much time trying to work around these issues.
Yes, I want to use Blackberry. I've been moved to an iPhone, and as a biz tool, it's a complete dud.
I reckon IT departments the world over are run by six-year-olds who don't have to use what they choose for work either, and they pick the iPhone because it's cool and great at Facetime with the g/b friend or for Facebooking food they just ordered. Mail, you say? Didn't try it, and anyhow, who uses that? Yo!
My personal Galaxy has the same issues of usability, but I'm not trying to do anything serious with it, and I wouldn't pick it as a business tool either.
Yes, Blackberry has its issues, and Gawd knows it has singularly failed to stop the bleeding to rival devices, but it's a far, far better business tool than the polished turd I've been forced to use.
Gimme my Blackberry back.
According to the nice letter the Royal Mail put into the bag that contained a present from my sister, aftershave is considered "restricted goods". The parcel went from Westminster to Stansted, got flown to a sorting office in Belfast(!), was opened (including the card in an envelope), was declared safe, got flown back(!), sent to Edinburgh and arrived with me 2 weeks later with one of these attached: http://www.postoffice.co.uk/sites/default/files/Example-ID8000-Label.pdf
What for? Can anyone enlighten me?
Nice smells though.
"The Storage world has learned a major lesson since netapp and that is not to ignore the competition..." Prior to NetApp, "the storage world" was just big boxes of disks banged together with (if you were lucky) a RAID controller. And EMC didn't ignore us; they found it really hard to compete with us.
NetApp made the technologies by which others measure themselves, even EMC. That's a set of technologies that we've used over the years to deliver advantage for our customers. It includes file-based storage systems, storage virtualisation, snapshots, capacity & performance effective RAID, dedupe, thin provisioning, compression and more. And if you reran the tape again, I bet the outcome would be about the same.
I agree with you that Pure are yet another startup that aren't redefining this industry; they're mimicking NetApp technologies. The deck is stacked against them because it's a "me too" business (with added flash). While imitation may be the sincerest form of flattery, it rarely guarantees success.
Disclosure: NetApp employee.
These sound remarkably like what we already have in a different form factor; CPU/network/storage that looks like a disk drive (or SSD) brick rather than a server brick or a switch brick. Or am I missing something?
I want an impressive set of arguments before I'm convinced this is a flyer; a solid business use case, a good viable technology solution and a workable roadmap and plan. Cheaper, faster and more reliable would help too. I'm not getting any of that from the article.
As I'm in Atlanta for the launch of these, hopefully I'll find out directly what the value/seekret soss/hype is about.
Commodity to you, but useful to me, since I'd never heard of Veblen goods until today.
<quote from expert>
...with more modern architectures that often target NetApp deployments (ONTAP is more than 15 years old).
Oh dear, so it is. Almost as old as Linux, which appeared in 1991. All those "modern architecture" systems based on a 1991 Linux. Or BSD based systems; BSD is (gasp!) even older!
Yes, I understand that analysts have to say something, especially when they're not sure about the business or the technology or the economic outlook. But it would really help if their opinion was based on a more meaningful analysis. ONTAP 8 is a modern clustered scale out system quite unlike anything else on the market. Mr Blair, please try again.
(Standard NetApp employee disclosure)
(It's "cited", not "sited"; using the wrong word doesn't help your argument.)
We do have a flash strategy and a lot more varieties of flash (FlashCache -- big caches made of flash, FlashPools -- mixed pools of SSD and disk, EF540 -- all flash array) than Hitachi has; you just haven't been paying attention.
What we don't have is a recent benchmark with flash SSD, which is a good point.
One correction: NetApp haven't "given up" on tuning with flash. NetApp was one of the first to employ flash-based caches, in 2009 with FlashCache; that's 4 years ago. We still ship and use both caches and SSDs for performance in the latest ONTAP 8, along with the all-flash EF540 and the soon-to-be-delivered FlashRay.
It's interesting that you think that the startups you mention all want to be like NetApp when they grow up -- or perhaps that's what they told you? -- for it's certainly what they want you to believe.
The truth is they don't want to be like NetApp. No way. Not in a month of Sundays. That would mean putting in some hard work for the long term, and none of them are in it for the long term.
NetApp has got a whole set of experiences and innovation in this space built over decades of investment and learning that these new guys on the block just don't have. And won't have and don't want either, since their business model (as you note) is a desire to get bought and make the principals some money.
Of course the storage industry is looking at this. Chris, I told you so at SNW in October last year, but you wanted to grill me about the "bump in the wire" cache vendors... Ah well.
For posters here, don't get carried away by memory storage schemes and thinking that it's a solved problem, or even an easy problem to solve. It's not.
For anyone that's interested, here's the background and what's being done; http://snia.org/forums/sssi/nvmp, and in particular, this presentation; https://intel.activeevents.com/sf13/connect/fileDownload/session/461EB56CC073EA43BDFCEC22AE2D3C88/SF13_CLDS009_100.pdf
Next year's tech? Probably not. But it will come; byte addressed, persistent and cheap memory is just too attractive given what we have now. More information can be had by contacting the NVM group at SNIA.
I'm not surprised that the HDS box did as well as it did, given that both NetApp benchmarks were submitted in September 2011 (2 years ago) and didn't have the benefit of SSDs. The 6240 is no longer sold.
What do you mean by "file I/Os per second, instead of disk block"?
NFS operations are not "file I/Os" since a large chunk of them -- 72% as I pointed out earlier -- are not operations on the file at all. This sort of sloppy thinking leads to no more than the death of another kitten.
PS: I'm sure all the marketing suits at HDS are really delighted that the rebranding from BlueArc to HUS that they slaved over all those years ago has completely escaped you. Mind you, that's excusable. No kittens died for that mistake.
A little furry kitten dies in the benchmark labs every time someone quotes IOPS on a SPEC SFS.
SPEC SFS does not measure IOPS. SANs do IOPS, and the benchmark for them is SPC-1. NFS systems do NFS operations, and SPEC SFS measures them.
Only 28% of the operations in SPEC SFS are a READ or a WRITE operation (that is, accessing the data). The other 72% of the operations are on meta data (mainly directory information).
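To see what that split does to a headline number, here is a minimal Python sketch that computes the data-carrying fraction of an NFS operation mix. Only the 28% READ+WRITE total comes from the comment above; the per-operation breakdown and the reported ops/s figure are invented for illustration and are not the real SPEC SFS mix.

```python
DATA_OPS = {"READ", "WRITE"}  # operations that actually move file data


def data_op_fraction(op_mix: dict[str, float]) -> float:
    """Fraction of an NFS operation mix that reads or writes file data."""
    total = sum(op_mix.values())
    return sum(v for op, v in op_mix.items() if op in DATA_OPS) / total


if __name__ == "__main__":
    # Hypothetical mix: READ+WRITE total 28%; the rest of the split is invented.
    op_mix = {"READ": 20, "WRITE": 8, "GETATTR": 40, "LOOKUP": 32}
    frac = data_op_fraction(op_mix)
    reported_ops_per_sec = 100_000  # hypothetical SPEC SFS result
    print(f"data ops: {frac:.0%} -> {reported_ops_per_sec * frac:,.0f} of "
          f"{reported_ops_per_sec:,} ops/s touch file data")
```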
There's a big difference. Please, think of the little kitties.
Can't make it, which is a shame. Knieriemen owes me at least one drink, I'm sure.
I liked the capacity gag above. How about
"I only came here to get HAMRed"
"There's a flasher or two in here tonight"
"No thanks, I'm the nominated drive."
"Run boys, it's a raid..."
2000 servers with 6 disks each? They're crazy. 2000 servers with 6 disks between them? You're crazy.
Other than that, your story is a good one. For the pub.
<quote>If you just want to dump some data somewhere though, any old cheap storage will do.</quote>
Uh, no. Reality sucks, Mr pPPPP. But if you just want to lose your data, any old cheap storage will do.
<quote>and the Parallel NFS (pNFS) client is still in tech preview; the latter includes support for Microsoft's Direct I/O, which allows for data to be read from disk to application buffers without stopping at file buffers</quote>
I've spoken to the Linux developer of the NFS client that's part of the standard distributions, including RHEL6. The support is for Linux O_DIRECT in the pNFS code line, and has nothing to do with Microsoft's Direct I/O. That is a specific Windows OS feature for Windows device drivers.
Alex McDonald, CTO Office, NetApp.
Buy a decent NFS box...
There's a shedload of work taking place in Windows, and there's a fully fledged open source client available right now; download it from here http://www.citi.umich.edu/projects/nfsv4/windows/readme.html
Unlike FXP, it's secure. The client has to inform both the source and the target of its security credentials, and tell the source that it will be contacted by a specific target.
Terje Mathisen (a well known programming optimization guru) once said: "All programming is an exercise in caching."
The same applies here. All data management is an exercise in caching.
This isn't about tiering; it's about caching. Unless, of course, you've got a product that does only tiering, in which case it makes a lot of sense to confuse the two.
To clarify my comment;
He pointed out that SolidFire deduplication is global, working across all volumes, whereas "NetApp ASIS only dedupes on a per-volume basis in an array and not across volumes in an array." Alex McDonald from NetApp's office of the CTO confirmed this but said NetApp can have many, many LUNS in a volume.
Since you can have lots of things (LUNs or filesystems) inside a NetApp volume, and a volume is a virtual construct that can be very large indeed -- several 10s of TB -- then global dedupe really doesn't buy you much saved space if you're already deduping across that much data.
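A toy Python model of the argument: fingerprint blocks, then count what gets stored when deduplication is scoped per volume versus globally. The block contents, the SHA-256 fingerprint and the function names are illustrative assumptions, not how ONTAP or SolidFire actually implement dedupe.

```python
import hashlib


def fingerprints(blocks: list[bytes]) -> set[bytes]:
    """Content fingerprints of a volume's blocks (SHA-256, as an assumption)."""
    return {hashlib.sha256(b).digest() for b in blocks}


def blocks_stored_per_volume(volumes: list[list[bytes]]) -> int:
    """Dedupe within each volume only: duplicates across volumes are kept."""
    return sum(len(fingerprints(v)) for v in volumes)


def blocks_stored_global(volumes: list[list[bytes]]) -> int:
    """Dedupe across all volumes: each unique block is stored once."""
    return len(set().union(*(fingerprints(v) for v in volumes)))


if __name__ == "__main__":
    lun_a = [b"os-image"] * 3 + [b"app-a"]  # hypothetical LUN contents
    lun_b = [b"os-image"] * 2 + [b"app-b"]
    separate = [lun_a, lun_b]  # one LUN per volume
    shared = [lun_a + lun_b]   # both LUNs inside one large volume
    print("per-volume, separate volumes:", blocks_stored_per_volume(separate))
    print("global, separate volumes:    ", blocks_stored_global(separate))
    print("per-volume, shared volume:   ", blocks_stored_per_volume(shared))
```

Once both LUNs live in one large volume, per-volume dedupe stores the same number of blocks as global dedupe.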
Good on SolidFire though to recognize that we're the storage system of choice. ;-)
Alex McDonald of NetApp here.
Chris, my apologies; I promised you some reasoned arguments and background information as to why EMC/Isilon appear to be misunderstanding the specSFS benchmarks. Since you've published, I'm replying here.
Twomey of EMC makes one valid point; "Scale-out means different things to Isilon, [IBM] SONAS, [HP] IBRIX and NetApp." But this isn't about definitions or about what we each mean by scale-out or scale-up or scale-anything; it's about scale -- full stop -- and a benchmark which is tightly defined (and where we spanked EMC). The rest of his arguments are, as usual, diversionary nonsense. What's eating Twomey is the fact that NetApp's submission was smaller, cheaper and faster.
But I am surprised at Peglar, the Americas CTO (Chief Technology Officer) of Isilon, because he betrays a serious misunderstanding of the benchmark; I'd have expected him to be better informed. Here's what he should know.
The specSFS benchmark creates 120MB of dataset for every requested NFS operation per second of load. You can't control how much space the benchmark is going to use -- in fact, the usual complaint is how big the SFS dataset is. We (NetApp) chose a volume size of 12TB for each volume, giving 288TB in total. The main number to look at for the benchmark is the fileset size created, which was 176176GB (176TB) for the 24-node test. We could have created much bigger volumes and exported the capacity of the entire system, 777TB. That would not have made any difference to the results, since the fileset size created would *still* have been 176TB.
Isilon exported all of its usable capacity: 864TB. The benchmark dataset size for them was 128889GB (129TB).
So, on inspection, it took Isilon 3,360 10K rpm disk drives (plus 42TB of flash SSDs) to service 129TB of data. NetApp took 1,728 15k rpm disk drives (plus 12TB of flash cache) to service 176TB of data.
Now who's short stroking?
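The arithmetic behind that question, as a short Python sketch using only the fileset sizes and disk counts quoted above. It assumes the fileset is spread evenly across the drives and ignores the flash in both configurations.

```python
def gb_per_disk(fileset_gb: float, disks: int) -> float:
    """Benchmark fileset spread evenly over the disks, in GB per disk."""
    return fileset_gb / disks


if __name__ == "__main__":
    # Fileset sizes (GB) and disk counts as quoted above.
    submissions = {
        "Isilon": (128_889, 3_360),  # 10K rpm drives
        "NetApp": (176_176, 1_728),  # 15K rpm drives
    }
    for name, (fileset_gb, disks) in submissions.items():
        print(f"{name}: {gb_per_disk(fileset_gb, disks):.1f} GB of fileset per disk")
```

By this rough measure, each NetApp spindle was carrying about 2.7 times as much of the benchmark's data as each Isilon spindle.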
There are two uninformed arguments we hear about benchmarks all the time, and I thought Peglar would have understood them and why they aren't relevant.
Argument 1: If one doesn't touch every byte of the exported capacity then the system is being gamed, so as to short stroke the disks and gain an unfair advantage.
Response 1: There will never be any real-world workload that touches *every single byte* of all available capacity. That is not the way systems have been, or ever will be, used. Benchmarks model a realistic workload and measure systems under that load, not bizarre edge cases.
Argument 2: Creating LUNs that are smaller than the maximum capacity is creating short stroking and an unfair advantage.
Response 2: Modern filesystems no longer couple the data layout with the exported capacity. Thus, there is no performance advantage that is related to LUN size or the exported capacity. As long as the same amount of data is accessed across systems then the equal performance comparison is valid; or, as in the NetApp submission, where a *lot* more data is being accessed, the benchmark demonstrates it's a much better performer. If you are seeing a difference in performance that is coupled to exported capacity, you might want to consider a NetApp system that does not have such an antiquated data layout mechanism.
Summary: The total exported capacity is the combined capacity of the volumes created. It does not have any bearing on the performance obtained.
The argument Peglar makes would seem to indicate that Isilon may have one of those old, steam-driven data layouts. But, of course, an Isilon system doesn't, so why he's making the points he does is beyond me. There are only a couple of reasons that EMC/Isilon could present an invalid premise for an argument; (1) they don't understand the subject material, and lack experience in debating these issues, or (2) they fully understand the subject material and believe that the person they are trying to convince does not.
I'll let you guess as to which I think is the case.
Please contact me directly at alex mc at netapp dotwotsit com (squish & replace) so we can get the issues addressed.
Biting the hand that feeds IT © 1998–2017