* Posts by dikrek

106 posts • joined 16 Nov 2010

Pure Storage is betting its FlashArray farm on NVMe

dikrek
Boffin

So many Captain Obvious points...

Hi all, Dimitris from Nimble here.

My viewpoint was posted here:

http://www.theregister.co.uk/2016/11/25/nvme_storage_sea_change/

I did state that even without NVMe drives in the array, speeds would overall be improved if the client side adopts NVMe over Fabrics.

In general, this whole business with NVMe is all about vendors trying to get mindshare about a technology that everyone will have anyway. Trying to build differentiation where there is none.

It's pretty simple:

1. The NVMe protocol itself helps increase headroom a bit, which means arrays will get a bit more efficient

2. It shaves 15-20 microseconds of latency off the protocol stack (which isn't significant for NVMe SSDs - 100 microseconds vs 115 microseconds won't set the world on fire)

3. AFA controllers are already maxed out with current-gen SSDs.

Nimble's architecture is NVMe-ready, and so are other vendors'. It's not rocket science; it's more about waiting for the right bits to drop in price and for customers to be willing to adopt NVMe all the way to the client.

The more exciting tech is next-gen byte-addressable storage like 3D Xpoint and the like, sitting in a DIMM. Not everyone is ready for THAT tech... ;)

FYI, NVMe is positively glacial in every way compared to non-volatile memory in a DIMM. Nimble has been using byte-addressable NVDIMM-N instead of NVRAM for a while now...

Thx

D

0
0

Storage newbie: You need COTS to really rock that NVMe baby

dikrek
Boffin

NVMe over Fabrics is of course needed to fully realize NVMe benefits

Hi all, Dimitris from Nimble here (http://recoverymonkey.org). Clarification:

1. Putting NVMe drives and/or something like a 3D Xpoint DIMM in existing (modern) arrays can improve speeds, up to a point.

2. Implementing NVMe over Fabrics is necessary to unleash the total end-to-end performance all the way to the client.

3. Make sure speed doesn't come at the expense of enterprise features, especially things that mitigate risk like multi-level checksums, snaps, clones, replication, encryption etc. Many devices out in the market focus on speed so much that they ignore even basic creature comforts.

The challenge really is that most customers move cautiously and aren't always ready to adopt things that have barely been standardized, especially in low risk tolerance environments.

Thx

D

1
0

Software-defined traditional arrays could be left stranded by HCI

dikrek

It's not about HCI per se

It's interesting to examine why some people like the HCI model.

Part of the allure is having an entire stack supported by as few vendors as possible. Very few HCI vendors fit that description.

The other part is significantly lower OPEX. Again, not all HCI vendors shine there.

And for CAPEX - the better HCI vendors that fit both aforementioned criteria tend to be on the expensive side. So it's not about CAPEX savings.

It's also interesting that the complexity of certain old-school vendors has quite a bit to do with alternative solutions becoming more popular (not just HCI). Compared to modern storage systems, you may find that the difference in ease of consumption is minimal.

Be careful that just because you like the HCI dream you don't give up things that have kept you safe for decades.

Case in point: several HCI and SDS vendors don't do checksums! (Even VSAN and ScaleIO only recently started doing optional checksums).

That's like saying I like electric cars so much that, in order to achieve the dream of owning one, I'm willing to give up ABS brakes.

Or things like proper firmware updates for all the components of the stack. Again, many solutions completely ignore that. And that inability can significantly increase business risk.

More here:

http://recoverymonkey.org/2016/08/03/the-importance-of-ssd-firmware-updates/

Disclaimer: I work at Nimble but if I didn't there's only one HCI vendor I'd go to work for. One. Out of how many?

Thx

D

1
0

IO, IO, it's profiling we do: Nimble architect talks flash storage tests

dikrek
Boffin

Re: But Nimble's data is mostly SMB customers

Not at all true. Nimble's install base ranges all the way from huge to SMB. We have detailed data for all kinds of enterprise apps - Oracle, SQL, SAP, plus more exotic ones like HANA, SAS, Hadoop.

Plus the more pedestrian VDI, Sharepoint, Exchange, file services...

The interesting thing is that Nimble's and Pure's stats actually agree.

What differs is each company's reaction to the math and their ability to drill down in even more detail so that one knows how to automatically treat different kinds of I/O even within the same app.

For instance, DB log files always expect extremely fast response times. That exact same DB doing a huge table scan expects high throughput but the table scan is not nearly as latency-sensitive as being able to write immediately to the log.

Being able to differentiate between the two kinds of I/O is important. Being able to act upon this differentiation and actually prioritize each kind of I/O properly is even more important.

It's far more than just letting random I/O through like Nate's drawing of what 3Par does. It's about knowing, per application, how to properly deal with different I/O classes intelligently, and accurately alert with very little noise if there is a problem.

Thx

D

0
0
dikrek
Boffin

Allow me to clarify

Hi all, Dimitris here (recoverymonkey.org).

In case it wasn't clear enough from the article:

1. Pure is variable block, and their very own results show the bimodal nature of I/O (clearly split between small-block and large-block I/O), yet they keep talking about a 32K I/O size, which is the (correct) mathematical average of overall throughput (see the quick sketch after this list). Nimble's research shows that using the throughput average isn't the optimal way to deal with I/O, since latency sensitivity varies hugely between I/O sizes and apps.

2. Nimble arrays are of course variable block and in addition application-aware. Even the latency alerting takes into account whether I/O is latency-sensitive or not, which increases alerting accuracy by 10x.

3. Nimble's research shows that you need to treat different size and type I/O accordingly. For instance, large block sequential doesn't need to be given preferential latency treatment. Small block random I/O on the other hand is more latency-sensitive depending on the app. Nimble will automatically take into account app type, I/O size, latency sensitivity and randomness in order to intelligently prioritize latency-critical I/O in the event of contention to protect the user experience where it counts. This is an extremely valuable ability to have and helps automatically prevent latency spikes in the portion of I/O that actually cares about latency. See here: http://bit.ly/2cFg3AK

4. The ability to do research that shows detailed I/O characteristics by app instead of the whole array is what allows Nimble to be so prescriptive. The analysis we do is automated and cross-correlated across all our customers instead of having to ask someone in order to find out what apps an array has on it.
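To make point 1 above concrete, here's a minimal sketch (with made-up, purely illustrative I/O sizes and percentages - not Nimble's or Pure's actual telemetry) of how a bimodal small/large mix produces an average I/O size of roughly 32K even though hardly any individual I/O is 32K:

```python
# Hypothetical bimodal workload: most I/Os are small, a few are large.
io_mix = [
    (4_096,   0.90),   # 90% of I/Os are 4K (latency-sensitive, e.g. OLTP / log writes)
    (262_144, 0.10),   # 10% of I/Os are 256K (throughput-oriented, e.g. table scans)
]

total_bytes = sum(size * fraction for size, fraction in io_mix)
mean_io_size = total_bytes / sum(fraction for _, fraction in io_mix)

print(f"Mean I/O size: {mean_io_size / 1024:.1f} KiB")   # ~29.2 KiB, i.e. "about 32K"
# Yet almost no individual I/O is actually ~32K - the mean hides the bimodal split,
# and the latency-sensitive 4K I/Os dominate by count.
```

The mean is mathematically correct, but it describes almost none of the actual I/Os, which is exactly why sizing or alerting off that single number is misleading.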

Final but important point: This is a disagreement about math, not about the actual I/O prowess of the two vendors. Both have excellent performing systems. But my problem with the math is when it starts getting misleading for customers.

The other disagreement is talking about hero numbers at what looks like 100% reads. Sure they look good but if the performance drops to half when doing heavy writes then it's not that exciting, is it?

Thx

D

4
0

Mangstor tells IT managers: Hey SANshine, c'mon in, the fabric is fine

dikrek
Boffin

There's performance, then there's RAS and functionality...

Front-ending other arrays and adding a cache layer isn't, in and of itself, horrendously difficult.

There are a few ways to do it.

Often this involves making the hosts use the virtualizing array as their target.

If this is done, some things to consider are:

1. How is multipathing handled by the virtualizing system? And by the underlying array?

2. How are checksums handled? How is the virtualizing array able to ensure the underlying array has proper integrity?

3. How does the virtualization affect underlying functionality like replication, QoS, clones, snaps? Existing scripts? Existing orchestration software?

4. If the front-end array will provide only cache, how will flushing be handled? And how does that affect overall reliability?

5. What's the process for front-ending? Migration? Or just path switchover? (both techniques are used by various vendors, migration is obviously more painful).

6. If I want to remove the front-ending device, is that as easy as path switchover or more convoluted?

Thx

D (disclosure: Nimble Storage employee, http://recoverymonkey.org, and overall nerd).

0
0

SPC says up yours to DataCore

dikrek
Boffin

Re: Why use and array of any type anyway?

Ok Cheesy I'll bite.

In almost every storage-related post (I've been keeping track) you mention these behemoth server-based systems and how fast they are. We get it. You don't like centralized storage and believe server-based is faster. To get this out of the way: properly designed server-based storage can be extremely fast - but not necessarily less expensive.

However, enterprises care about more than just speed. Far more. Otherwise we'd all just use RAM disks.

Out of curiosity, does Windows-based storage do things like live, automated SSD firmware updates regardless of server vendor?

What about things like figuring out how to automatically stage firmware updates for different components including the server BIOS, HBAs, the OS itself, and other components?

Or analytics figuring out what patches are safe to apply vs what bugs other customers with similar setups to yours have hit in the field? (maybe a certain patch will break your specific config).

How about analytics figuring out what exact part of your infrastructure is broken? Based on advanced techniques like pattern matching of certain errors, machine learning and predictive analytics?

Does server-based storage provide comprehensive protection from things like misplaced writes and torn pages? (Hint: checksums alone don't do the job).

In addition, are you running NVMe over Fabrics? Because a fast Ethernet switch alone isn't enough to maintain low latency. Or are you doing SMB3 RDMA? And what if my application won't work on SMB3? Maybe it's an AIX server needing crazy fast speeds?

Are the servers perchance mirrored? (Since you need to be able to lose an entire server without issue). If mirrored, doesn't it follow that 50GB/s writes to the cluster will result in an extra 50GB/s of intra-cluster traffic? Isn't that wasteful? (server-based should really be triple mirrored BTW).
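A quick back-of-the-envelope on that mirroring question, assuming plain synchronous N-way replication (a sketch of the general principle, not any specific vendor's implementation):

```python
# Extra east-west (intra-cluster) traffic needed so every write lands on `copies` nodes.
# Illustrative only; assumes simple synchronous replication with no erasure coding.
def intra_cluster_write_traffic(client_write_gbs: float, copies: int) -> float:
    return client_write_gbs * (copies - 1)

print(intra_cluster_write_traffic(50, copies=2))   # 50 GB/s extra for 2-way mirroring
print(intra_cluster_write_traffic(50, copies=3))   # 100 GB/s extra for triple mirroring
```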

And if erasure coding is employed, how is latency kept in check? Not the best mechanism to protect NVMe drives at scale.

Honestly curious to know the answers, maybe server-based really is that cool these days.

Thx

D (disclaimer: Nimble Storage employee)

0
0

DataCore drops SPC-1 bombshell

dikrek
Boffin

Re: ByteMe, you're forgetting something Is SPC-1 still relevent with massive cache systems?

<sigh>

Huawei had a 68TB data set and 4TB cache, or a 17:1 dataset:cache ratio.

DataCore had about an 11TB data set and 2.5TB of cache, or a 4.4:1 dataset:cache ratio.

Not even close.

You probably work for DataCore and are understandably excited. It's a good number and does show that the engine can provide a lot of IOPS when unencumbered by back-end disk.

BTW: Here's why not all data in SPC-1 is hot.

In the spec here: http://www.storageperformance.org/specs/SPC-1_SPC-1E_v1.14.pdf

Look at page 32. Notice there's an "intensity multiplier" AND a "transfer address".

ASUs 1 and 2 do reads and writes, and are 90% of the data.

If you look at that page you'll see that there are 3 hotspot zones with clearly identified multipliers and limited transfer address ranges:

- ASU1 stream 2

- ASU1 stream 4

- ASU2 stream 2

For example, ASU1 stream 2 has a multiplier of .281 (28.1%) and an address range of .15-.2 (merely 5% of the total data in the ASU).

If you do the math you'll see that of the total capacity, 6.75% across those 3 streams is significantly hotter than the rest of the data. So if that 6.75% comfortably fits in cache in any of the systems in the SPC-1 results, there will be a performance boost.
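Here's a minimal sketch of that math, assuming ASUs 1 and 2 (the 90% of the data mentioned above) are split evenly at 45% of total capacity each, and that each of the three hotspot streams covers 5% of its ASU's address range:

```python
# Rough SPC-1 "hot data" estimate based on the hotspot streams listed above.
# Assumes ASU1 and ASU2 each hold 45% of total capacity (together the 90% mentioned
# above) and that each hotspot stream's address range spans 5% of its ASU.
asu_share = {"ASU1": 0.45, "ASU2": 0.45}

hotspot_streams = [
    ("ASU1 stream 2", "ASU1", 0.05),   # e.g. transfer address range 0.15-0.2
    ("ASU1 stream 4", "ASU1", 0.05),
    ("ASU2 stream 2", "ASU2", 0.05),
]

hot_fraction = sum(asu_share[asu] * span for _, asu, span in hotspot_streams)
print(f"Hot portion of total capacity: {hot_fraction:.2%}")   # 6.75%
```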

Not all the systems measured in SPC fit that percentage in cache, but many do.

DataCore went a step further and made cache so large that the entire test dataset is only 4.4x the size of the cache, which is unheard of in any enterprise deployment. How many of you will have just 11TB of data and 2.5TB RAM? Really? :)

And, last but not least, something anecdotal - all I will say is we had DataCore fronting some of our systems and the overall performance to the client was less than native.

The servers used weren't the behemoths used in the SPC-1 test of course. But this may mean something to some of you.

Thx

D

1
0

DataCore scores fastest ever SPC-1 response times. Yep, a benchmark

dikrek
Boffin

Re: NetApp comments -- nothin but FUD, dispelled here...

<sigh>

Go to the spec here: http://www.storageperformance.org/specs/SPC-1_SPC-1E_v1.14.pdf

Page 32.

Do a bit of math regarding relative capacities vs the intensity multiplier.

See, it's not uniform nor 100% hot at all. Different parts of the dataset are accessed different ways and certain parts will get more I/O than others.

I've been dealing with SPC-1 for years.

Not with NetApp any more but facts are facts.

Thx

D

0
0
dikrek
Boffin

Second article about this so soon?

Hi all, Dimitris from NetApp here.

Interesting that there's another article about this.

Plenty of comments in the other reg article:

http://forums.theregister.co.uk/forum/1/2016/01/07/datacores_benchmark_price_performance/

Indeed, latency is crucial, but Datacore's benchmark isn't nicely comparable with the rest of the results for 2 big reasons:

1. There's no controller HA, just drive mirroring. Controller HA is where a LOT of the performance is lost in normal arrays.

2. The amount of RAM is huge vs the "hot" data in the benchmark. SPC-1 has about 7% "hot" data. If the RAM is large enough to comfortably encompass most of the benchmark's hot data, then latencies can indeed look stellar; having to hit the actual media more would be more realistic.

Thx

D

6
1

Can NetApp's 4KB block writes really hold more data?

dikrek

Re: What's so special?

Indeed, there are architectures that don't even have the concept of a block as such, let alone variable. They don't have this issue of partially filled blocks.

Still doesn't address the fact that dedupe is within a volume domain and not global.

0
0

Nimbus Data CEO shoots from the hip at NetApp's SolidFire buy

dikrek

Re: Means little

Hi Jeremiah,

As someone who, in a previous life, spent time in SPs (Acxiom) and large enterprises (Halliburton, United Airlines) I can tell you that balance is absolutely required in large shops.

Maybe that's the disconnect. I'm thinking of the potential in multi-PB or even EB deployments.

Most large SPs want capacity and performance tier differentiation, not just performance QoS and the ability to provision via an API.

For example, they will charge a lot less for NL-SAS hybrid capacity than SSD-based capacity, partly because it costs them less in the first place, partly because customers expect it to cost less.

They also have needs for different kinds of density for different kinds of data.

The $/GB to store something on 10TB low-cost spinning disk is a tiny fraction of the cost of storing it on mirrored SSD.

This means that Service Providers naturally tend to have a variety of tiers instead of just one. Unless they're very very small, in which case a single box is enough. But then again those very small SPs also tend to be very cost-conscious.

SolidFire has had some success in that industry but the large Service Providers typically have much more than SolidFire. If anything, the percentage of capacity in the typical SP will be less SSD and much more of other stuff, as much as we'd all like that to not be the case :)

And finally, the overall number of SolidFire customers is very small.

Percentage-wise many of those customers may be Service Providers, but that doesn't mean significant penetration in that space.

D out.

1
0
dikrek

Re: Means little

I'm the furthest thing from a hater. I just find it interesting that every time someone explains about inefficiencies in the architecture, people complain that it's not important, and "look at all the business benefits".

It IS true that the people SolidFire is aimed at ALSO care about things like power/space/cooling/cost.

Ignoring that sort of requirement for that type of business is just silly.

Attacking people for calling this out just because they work at competitors is probably not very productive.

Go through every post of mine on The Register. You'll find hardly a mention of SolidFire from when I was at NetApp. Actually, there is one, where I thought the competitive material they had vs ONTAP was silly.

Check my blog too. No mention of it there either. I never pull any articles. It's all there. No mention of the purchase, no glowing article about what kind of problems it solves.

Now, start thinking about possible correlation. And it's not that I'm an ONTAP bigot. I was involved in the EF and FlashRay projects too, and it's funny how wrong people get the FlashRay story. It was a kickass product. Never given a chance. Out of respect I won't say more.

BTW - I can always post anonymously. Like most vendor peeps seem to do.

Or just stop commenting altogether, like my friends advise me to. "Who reads the comments anyway" :)

Thx

D

3
0
dikrek
Boffin

Re: Means little

Instead of focusing on the size of Nimbus, focus on what the man is actually saying.

Working for a competitor doesn't automatically render an argument meaningless. Logic is logic.

It is true that service providers value multiple things:

- automation

- ease of use

- predictability

But they also value:

- price/performance

- rack density

- low power and cooling requirements.

Hand-waving away the second set of requirements doesn't make sense.

A solution that is, best case, 40% usable:raw, and, best case, 4 TIMES less dense than some competitive products (some from the same parent company), grossly violates the second set of SP requirements.

It is what it is - customers need to carefully consider what's most important to them and act accordingly.

Thx

D (Nimble employee, ex NetApp, scribblings at recoverymonkey.org)

2
0

SolidFire's thin flash density eats more racks than HPE. What gives?

dikrek

Re: A seriously first world datacenter/business problem

Correct, it's not so much about the size of the SSDs as the overall efficiency of the solution. SolidFire is still not very dense due to the architectural decisions I mentioned in my first comment. A lot of expensive SSD capacity gets wasted in any server-based grid storage approach; ScaleIO and VSAN are similar. Erasure coding is the enemy of low latency, so server-based storage typically goes for 2-3 total copies of the data (ideally 3 copies to avoid exposure in the case of node failure, but SolidFire doesn't do that kind of protection at all).

There's no perfect platform, it's all about choosing what's best for your business and it's always down to some sort of compromise:

Price, performance, density, capabilities needed, RAS.

Thx

D

3
0
dikrek
Boffin

An inherently inefficient architecture

Due to SolidFire's mirroring requirement, the need to leave a full node's worth of capacity free (to withstand a node failure), and not wanting to run the remaining space more than 90% full, usable:raw for a 4-way config is closer to 30%.

Assuming the industry-standard 5:1 data reduction on that 30%, the final multiplier (Effective Capacity Ratio) is still very low (a quick sketch below).
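Here's a minimal sketch of that arithmetic, assuming a 4-node cluster, 2-way mirroring, one node's worth of capacity held in reserve, a 90% fill ceiling, and the 5:1 data reduction figure mentioned above:

```python
# Rough usable:raw and Effective Capacity Ratio estimate for a small mirrored cluster.
nodes = 4
mirror_copies = 2          # 2-way replication: every block stored twice
node_reserve = 1 / nodes   # keep one node's worth of capacity free for failures
fill_ceiling = 0.90        # don't run the remaining space more than 90% full
data_reduction = 5.0       # assumed dedupe+compression ratio on the usable space

usable_to_raw = (1 / mirror_copies) * (1 - node_reserve) * fill_ceiling
effective_ratio = usable_to_raw * data_reduction

print(f"usable:raw         ~{usable_to_raw:.0%}")          # ~34%, i.e. "closer to 30%"
print(f"effective capacity ~{effective_ratio:.2f}:1 raw")  # ~1.69:1
```

So even with healthy data reduction, the effective capacity ends up well under 2x the raw capacity.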

See here:

http://recoverymonkey.org/2016/05/19/the-importance-of-the-effective-capacity-ratio-in-modern-storage-systems/

This is all fairly basic math. SolidFire even has a sliding calculator on their website and you can verify the ratios easily using that.

See here:

http://www.solidfire.com/platform/hardware

There's the other wrinkle of metadata handling that makes it harder for SolidFire to use large SSDs since there's a RAM ratio that needs to be enforced, similar to XtremIO.

The value of SolidFire isn't dense flash storage. If you want that, look elsewhere.

Thx

D (Nimble employee & ex-NetApp)

10
0

HPE bolts hi-capacity SSD support onto StoreServ

dikrek
Boffin

Re: Purpose?

Ok Cheesy I'll bite.

In almost every storage-related post (I've been keeping track) you mention these behemoth server-based systems and how fast they are. We get it. You don't like centralized storage and believe server-based is faster. To get this out of the way: properly designed server-based storage can be extremely fast - but not necessarily less expensive.

However, enterprises care about more than speed. Far more.

Out of curiosity, does Windows-based storage do things like live, automated SSD firmware updates?

What about things like figuring out how to automatically stage firmware updates for different components including the server BIOS, HBAs, the OS itself, and other components?

Or analytics figuring out what patches are safe to apply vs what bugs other customers with similar setups to yours have hit in the field?

How about analytics figuring out what exact part of your infrastructure is broken? Based on advanced techniques like pattern matching of certain errors?

Does server-based storage provide comprehensive protection from things like misplaced writes and torn pages? (Hint: checksums alone don't do the job).

In addition, are you running NVMe over Fabrics? Because a fast Ethernet switch alone isn't enough to maintain low latency. Or are you doing SMB3 RDMA? And what if my application won't work on SMB3? Maybe it's an AIX server needing crazy fast speeds?

Are the servers perchance mirrored? (Since you need to be able to lose an entire server without issue). If mirrored, doesn't it follow that 50GB/s writes to the cluster will result in an extra 50GB/s of intra-cluster traffic? Isn't that wasteful?

And if erasure coding is employed, how is latency kept in check? Not the best mechanism to protect NVMe drives at scale.

Honestly curious to know the answers, maybe server-based really is that cool these days.

Thx

D (disclaimer: Nimble Storage employee)

4
1

NetApp shrinky-dinks ONTAP 9: Will support 4:1 data reduction

dikrek
Coat

Re: Beware of what is being included in the data reduction ratio

Oh, I resigned all right. Far too many bad decisions one after another. Far too much management fluff. Out of respect for my former colleagues I will not elaborate further. The market will show whether cancelling an extremely promising product like FlashRay and buying SolidFire instead was the correct decision, for example.

I had my pick of vendors to join and I joined Nimble. And it wasn't about the highest bidder or anything crass like that. I like having purpose and making a difference. And the products are amazing.

InfoSight is far, far more than Nimble markets it as, and the Nimble AFA systems have more innovation in them than several competitors combined - again, under-marketed.

Exciting times ahead!

Thx

D

1
3
dikrek

Beware of what is being included in the data reduction ratio

Hi all, Dimitris from Nimble (ex-NetApp).

Several vendors show a single efficiency number that is nebulous and includes all kinds of stuff, including thin provisioning and snapshots. Diving into the true savings often reveals a much less compelling story.

It is more appropriate to be able to show what efficiency is coming from where. For instance, this much saving from dedupe, this much from compression, this much from clones, stuff like that.

A guarantee needs to be clear regarding what precisely is counted in the data reduction.

I've written a vendor-agnostic article on this subject:

http://recoverymonkey.org/2016/05/19/the-importance-of-the-effective-capacity-ratio-in-modern-storage-systems/

Thx

D

1
3

Sick of storage vendors? Me too. Let's build the darn stuff ourselves

dikrek
Boffin

Re: Anyone can build something small

Folks, it's all doable, just remember that something as seemingly simple as automatically managed online drive firmware updates can be of paramount importance.

Especially in this age of SSDs, drive firmware is updated rapidly, and lots of corruption issues get resolved that way.

Not being aware of the issues is one problem. Not being able to update the firmware live is a different problem.

Find the release notes for firmware updates for some popular SSDs out there and you'll quickly see what I mean.

Thx

D

1
1

The kid is not VSAN: EMC buffs up ScaleIO for high-end types

dikrek

Re: Ah the good 'ol quick jab

Actually my point is checksums are kinda assumed in storage, taken for granted.

Most people won't even think to ask since it's unthinkable to not do it.

Yet EMC wasn't doing this until recently (for those 2 products only, on their other storage platforms they do plenty of checksumming).

It's not trashing them, it's more about reminding people not to take this stuff for granted.

Ask!

0
1
dikrek
Boffin

Only now getting checksums? After all this time?

Hi all, Dimitris from Nimble here.

Is it just me or is the addition of checksums in both VSAN 6.2 and the latest ScaleIO a bit glossed over?

Checksums are immensely important to storage - so important that nobody in their right mind should buy storage that doesn't do quite comprehensive detection and correction of insidious errors.

My sense of wonder stems from the fact that EMC and VMware have been happily selling ScaleIO and VSAN to customers without mentioning this gigantic omission up until now. And now it's quietly mentioned among many other things.

Interesting...

Thx

D

0
0

Back-to-the-future Nexsan resurrects its SATABeast

dikrek
Boffin

Nobody thinks of torque and vibration?

Hi All, Dimitris from NetApp here.

I'm shocked anyone thinks a top-loading shelf full of heavy SATA drives is a good idea. You pull the ENTIRE THING out in order to replace a SINGLE drive??

How safe is that?

Both for drive vibration (you're shaking 60 rotating drives at once) and for torque (such a system is HEAVY!).

There is a better way. Front-loading trays (imagine that). On our E-Series platform, the 60-drive shelf is divided into 5 slices, each with 12 drives.

Each slice is shock mounted, much lighter than all 60 drives, and slides out butter-smooth in order to replace a drive.

Thx

D

0
3

After all the sound and fury, when will VVOL start to rock?

dikrek
Boffin

Very few arrays can do VVOL at scale

Hi all, Dimitris from NetApp here (recoverymonkey.org).

This is why we allow 96000 LUNs in an 8-node ONTAP cluster, as of ONTAP 8.3.0 and up.

Thx

D

1
0

HPE beefs up entry MSA with a bit of flash

dikrek

Hi all, Dimitris from NetApp here.

Not understanding the comparison to the FAS2500.

The FAS runs ONTAP and has an utterly ridiculous number of features the MSA lacks.

A better comparison would be to the NetApp E2700, a device much more similar in scope to the MSA.

Of course, whoever provided the math (I doubt it was Chris that did it) might not have a very good case then ;)

Thx

D

1
1

NetApp hits back at Wikibon in cluster fluster bunfight

dikrek
Boffin

Re: Some extra detail

Hi Trevor, my response was too long to just paste, it merited a whole new blog entry:

http://recoverymonkey.org/2016/02/05/7-mode-to-clustered-ontap-transition/

Regarding your performance questions: Not sure if you are aware, but posting bakeoff numbers between competitors is actually illegal - it violates the vendors' EULAs.

We have to rely on audited benchmarks like SPC-1, hopefully more of the startups will participate in the future.

I suggest you read up on that benchmark, it's extremely intensive, and we show really good latency stability.

Though gaming SPC-1 is harder than with other benchmarks, it too can be gamed. In my blog I explain how to interpret the results. Typically, if the used data:RAM ratio is too low, something is up.

Which explains a certain insane number from a vendor running the benchmark on a single Windows server... :)

Take care folks

D

0
0
dikrek
Boffin

Re: queue the Ontap Fan boys

<sigh>

Flash makes everything CPU and pipe bound. If anything, CPU speed is becoming commoditized very rapidly... :)

And yes, cDOT 8.3.0 and up IS seriously fast. We have audited benchmarks on this, not just anonymous opinion. I understand that competitors don't like/can't come to terms with this. Such is the way the cookie crumbles. Look up the term "confirmation bias".

Regarding SolidFire: It's not about speed vs cDOT. As a platform, SolidFire is very differentiated not just vs cDOT but also vs the rest of the AFA competition.

The SolidFire value prop is very different from ONTAP's, and performance isn't one of the differentiators.

What are some of the SolidFire differentiators:

- very nicely implemented QoS system

- very easy to scale granularly

- each node can be a different speed/size, no need to keep the system homogeneous

- ridiculously easy to use at scale

- little performance impact regardless of what fails (even an entire node with 10x SSD failing at once)

- the ability to run the SolidFire code on certified customer-provided servers

- great OpenStack integration

All in all, a very nice addition to the portfolio. For most customers it will be very clear which of the three NetApp AFA platforms they want to settle on.

Thx

D

8
4
dikrek
Boffin

Some extra detail

Hi All, Dimitris from NetApp here.

It is important to note that Mr. Floyer’s entire analysis is based on certain very flawed assumptions. Here is some more detail on just a couple of the assumptions:

1. Time/cost of migrations:

a. The migration effort is far smaller than is stated in the article. NetApp has a tool (7MTT) that dramatically helps with automation, migration speed and complexity.

b. It is important to note that a move from 7-mode to a competitor would not have the luxury of using the 7MTT tool and would, indeed, be an expensive, laborious move (to a less functional and/or stable product).

c. With ONTAP 8.3.2, we are bringing to the market Copy Free Transition (CFT). Which does what the name suggests: It converts disk pools from 7-mode to cDOT without any data movement. This dramatically cuts the cost and time of conversions even more (we are talking about only a few hours to convert a massively large system).

d. NetApp competitors typically expect a complete forklift migration every 3-4 years, which would increase the TCO! Mr. Floyer should factor an extra refresh cycle in his calculations…

2. Low Latency Performance:

a. AFF (All-Flash FAS) with ONTAP 8.3.0 and up is massively faster than 7-mode or even cDOT with older ONTAP releases - on the order of up to 3-4x lower latency. cDOT running 8.3.0+ has been extensively optimized for flash.

b. As a result, sub-ms response times can be achieved with AFF. Yet Mr. Floyer’s article states ONTAP is not a proper vehicle for low latency applications and instead recommends competing platforms that in real life don’t perform consistently at sub-ms response times (in fact we beat those competitors in bakeoffs regularly).

c. AFF has an audited, published SPC-1 result using 8.3.0 code, showing extremely impressive, consistent, low latency performance for a tough workload that’s over 60% writes! See here for a comparative analysis: http://bit.ly/1EhAivY (and with 8.3.2, performance is significantly better than 8.3.0).

So what happens to Mr. Floyer's analysis once the cost and performance arguments are defeated?

Thx

D

9
4

Don’t get in a cluster fluster, Wikibon tells NetApp users

dikrek
Boffin

Some extra detail

Hi All, Dimitris from NetApp here.

It is important to note that Mr. Floyer’s entire analysis is based on certain very flawed assumptions. Here is some more detail on just a couple of the assumptions:

1. Time/cost of migrations:

a. The migration effort is far smaller than is stated in the article. NetApp has a tool (7MTT) that dramatically helps with automation, migration speed and complexity.

b. It is important to note that a move from 7-mode to a competitor would not have the luxury of using the 7MTT tool and would, indeed, be an expensive, laborious move (to a less functional and/or stable product).

c. With ONTAP 8.3.2, we are bringing to the market Copy Free Transition (CFT). Which does what the name suggests: It converts disk pools from 7-mode to cDOT without any data movement. This dramatically cuts the cost and time of conversions even more (we are talking about only a few hours to convert a massively large system).

d. NetApp competitors typically expect a complete forklift migration every 3-4 years, which would increase the TCO! Mr. Floyer should factor an extra refresh cycle in his calculations…

2. Low Latency Performance:

a. AFF (All-Flash FAS) with ONTAP 8.3.0 and up is massively faster than 7-mode or even cDOT with older ONTAP releases - on the order of up to 3-4x lower latency. cDOT running 8.3.0+ has been extensively optimized for flash.

b. As a result, sub-ms response times can be achieved with AFF. Yet Mr. Floyer’s article states ONTAP is not a proper vehicle for low latency applications and instead recommends competing platforms that in real life don’t perform consistently at sub-ms response times (in fact we beat those competitors in bakeoffs regularly).

c. AFF has an audited, published SPC-1 result using 8.3.0 code, showing extremely impressive, consistent, low latency performance for a tough workload that’s over 60% writes! See here for a comparative analysis: http://bit.ly/1EhAivY (and with 8.3.2, performance is significantly better than 8.3.0).

So what happens to Mr. Floyer's analysis once the cost and performance arguments are defeated?

Thx

D

2
0

HDS brings out all-flash A series array

dikrek
Boffin

Re: Thin provisioning is not a "saving"

Hi all, Dimitris from NetApp here.

Indeed, thin provisioning is not "true" savings. Many vendors claim 2:1 savings from thin provisioning alone.

Rob, agreed: It's easy to demonstrate 100:1 savings from that feature by massively overprovisioning.
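A quick sketch of how that's done (the numbers are hypothetical, purely to illustrate the overprovisioning effect):

```python
# How a "savings" ratio based on thin provisioning can be inflated at will.
# Hypothetical numbers purely for illustration.
def thin_provisioning_ratio(provisioned_tb: float, written_tb: float) -> float:
    """Ratio many GUIs report: logical space provisioned vs space actually consumed."""
    return provisioned_tb / written_tb

print(thin_provisioning_ratio(provisioned_tb=20,   written_tb=10))   # 2:1 "savings"
print(thin_provisioning_ratio(provisioned_tb=1000, written_tb=10))   # 100:1 by simply
# provisioning more - no data was actually reduced, nothing real was saved.
```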

It is what it is - that's the state of capacity reporting and marketing claims these days. Most arrays show savings including thin provisioning, PLUS compression and dedupe where available.

Check this out for some pointers on how to calculate savings and ignore what the GUI shows:

http://recoverymonkey.org/2015/06/15/calculating-the-true-cost-of-space-efficient-flash-solutions/

A lot depends on your perspective and what you're comparing the system to.

I've seen a NetApp system for Oracle run at 10000% (yes ten thousand percent) efficiency since that customer was using an insane number of DB clones.

If your existing system can't do fast, non-performance-impacting clones, then clearly comparing it to the NetApp system would mean NetApp would show as hugely efficient.

If, on the other hand, your system can also do the fancy clones, AND thin provisioning, then in order to compare efficiencies you need to compare other things...

Thx

D

1
0

DataCore’s benchmarks for SANsymphony-V hit a record high note

dikrek
Boffin

Re: Too little too late

@Crusty - your post was hilarious. Especially this nugget:

" the NetApp has no idea what it's filing and dies a slow painful death in hash calculation hell. Heaven forbid you have two blocks which are accessed often which have a hash collision"

That's right, that's _exactly_ the reason NetApp gear is chosen for Exabyte-class deployments. All the hash collisions help tremendously with large scale installations... :)

http://recoverymonkey.org/2015/10/01/proper-testing-vs-real-world-testing/

Thx

D

0
0
dikrek
Boffin

It helps if one understands how SPC-1 works

Hi all, Dimitris from NetApp here.

The "hot" data in SPC-1 is about 7% of the total capacity used. Having a lot of cache really helps if a tiny amount of capacity is used.

Ideally, a large enough data set needs to be used to make this realistic. Problem is, this isn't really enforced...

In addition, this was a SINGLE server. There's no true controller failover. Making something able to fail over under high load is one of the hardest things in storage land. A single server with a ton of RAM performing fast is not really hard to do. No special software needed. A vanilla OS plus really fast SSD and lots of RAM is all you need.

Failover implies some sort of nonvolatile write cache mirrored to other nodes, and THAT is what takes a lot of the potential performance away from enterprise arrays. The tradeoff is reliability.

For some instruction on how to interpret SPC-1 numbers, check here:

http://recoverymonkey.org/2015/04/22/netapp-posts-spc-1-top-ten-performance-results-for-its-high-end-systems-tier-1-meets-high-functionality-and-high-performance/

http://recoverymonkey.org/2015/01/27/netapp-posts-top-ten-spc-1-price-performance-results-for-the-new-ef560-all-flash-array/

Ignore the pro-NetApp message if you like, but there's actual math behind all this. You can use the math to do your own comparisons.

But for me, the fact there's no controller failover makes this not really comparable to any other result in the list.

Thx

D

3
0

Our storage reporter has breaking news about Data Fabrics. Chris?

dikrek

You don't NEED to move your data

The beauty of the NetApp solution is that it doesn't force you to move your data.

http://www.netapp.com/us/solutions/cloud/private-storage-cloud/

Using this scheme, you could burst into multiple cloud providers for compute, yet your data resides in a colo facility that has fast links to the various cloud providers. No need to move around vast amounts of data. Many people like this approach, and use cloud for what it's really good at - rapidly spinning up VMs.

Conversely, if you DO want to move some data into the hyperscale cloud providers, NetApp lets you keep it in its native usable format without needing to do a recovery first. You could then do things like SnapMirror between ONTAP VMs in, say, Azure and AWS, and keep the data in its native format WITHOUT needing backup software and WITHOUT needing to do a whole restore...

It's all about providing choices. NetApp currently provides far more choices when it comes to cloud deployments than any other storage vendor. You can go in as deep or as shallow as you like, and if you decide you don't feel comfortable in the end, repatriating the data is an easy process.

In addition, there is ALWAYS lock-in no matter how you engineer something. It's either the storage vendor, or the cloud vendor, or the backup tool vendor, or the application vendor, or the operating system...

Even with totally free tools, the lock-in becomes the free tools. It's not just a cost challenge.

The trick is in figuring out what level of lock-in is acceptable for your business and whether said lock-in actually helps your business long-term more than it creates challenges.

Thx

D

0
0
dikrek

Re: Not repackaging

We're adding the SnapMirror engine to AltaVault (and more) as well, plus the VMs won't be unclustered any more.

And there is ALWAYS lock-in. You just have to choose what you want to be locked into, and whether that serves your business needs best.

Seems to me you missed some of the tidbits in the videos.

For instance, being able to drag an AltaVault AWS instance (so, a backup and recovery appliance) onto an Azure ONTAP instance (so, a storage OS appliance) in a GUI and do a seamless recovery - the amount of automation is staggering.

No other vendor offers that complete flash-to-disk-to-multi-cloud-and-back story, plus the automation and backup.

Thx

D

1
0
dikrek
Megaphone

Not repackaging

This is no repackaging. There's all kinds of new software doing this behind the scenes; we are implementing the same replication protocol across the entire product line...

See this

https://youtu.be/HgArpF3W73Y?t=3038

and this

https://youtu.be/UluLv_YXx-o

Thx

D

0
0
dikrek

Document link

Folks, Dimitris from NetApp here (recoverymonkey.org).

Here's the link to the paper:

http://www.netapponcloud.com/hubfs/Data-Fabric/datafabric-wp.pdf

As you can see we are thinking big.

Data management and solving business problems is where the action is.

Doesn't hurt that the widgets themselves are awesome, either :)

Thx

D

3
5

NetApp slims down latest controller, beefs up channel efforts

dikrek
Trollface

Back your arguments with facts, otherwise they're pointless

Mr. MityDK,

Dimitris from NetApp here. Sucks for all flash? We win performance PoCs against competitors all the time. Imagine if we didn't suck! :)

We also have had zero (yes zero) SSDs worn out since we started shipping flash in arrays many years ago. Imagine if we didn't suck at flash, how much more reliable we could make them ;)

Data management is also far beyond anything from any other vendor (a traditional ONTAP strength).

Flexibility - also beyond anybody else's gear by a long shot.

BTW: If you're working for a competitor, it's gentlemanly etiquette to disclose affiliation.

If you're on the customer side - then ask for a demo and see for yourself just how much it "sucks". You might be surprised to learn that not everything you see on the Internet is true, especially if coming from competitors (ignoring for a moment the irony).

Pricing per raw TB means nothing anyway, since that number is without efficiencies factored in. It also includes the fastest controller, software and support costs. Ask for a quote and see just how "expensive" it really is :)

Thx

D

6
4

Why the USS NetApp is a doomed ship

dikrek

Nothing special about it??

Seriously, nothing special about it?

http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/

An enterprise storage product that is mature in serving all major protocols (and FYI, retrofitting enterprise NAS on other systems is insanely hard, which is why nobody's done it).

With no downtime for pretty much any operation, including things that would mean major downtime for other systems.

With the best application integration tools on the planet.

The best heterogeneous fabric management system in the world (OCI).

Amazing automation (WFA).

Great performance.

Insane scalability.

Technology that literally keeps the lights on (part of the control chain of many power distribution systems).

Or deployed in life or death situations. By the most paranoid organizations in the world.

That's the storage foundation behind the largest companies in the world.

That's nothing special?

I'd love to see what you consider special. Must really be something.

Thx

D

3
0
dikrek

Re: Beware of Confirmation Bias

Hi Trevor - what I'm trying to understand is why you even wrote the article.

What purpose does it serve?

Maybe I don't understand what you do for a living. Perchance you could explain it.

You mention all this research you do - are you an analyst? Have you attended analyst briefings at NetApp? We do them all the time.

Thx

D

1
0
dikrek

Beware of Confirmation Bias

Hi Trevor, Dimitris from NetApp here.

Some friendly advice: look up the term "Confirmation Bias".

It can affect us all - the trick is sensing when it happens.

It's just storage, not religion.

Can it solve most business problems for a variety of enterprise sizes more successfully or not? That's the big question.

Learning about the whole portfolio and how it interoperates might prove especially illuminating.

Thx

D

6
0

SolidFire pulls off gloves for unholy storage ding-dong. Ding-ding!

dikrek

Re: Inline Compression

ONTAP 8.3.1 compression is totally different (very high performance - check http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/), plus AFF has a ton of extra optimizations (including about 70% usable:raw with ADP, to address another comment).

Tech evolves. Most people leaving a storage company have maybe 6 months before their knowledge becomes obsolete.

Maybe focus on the pertinent question:

Which current technology solves more customer problems reliably?

Thx

D

0
0
dikrek
Happy

The big picture is usually more important

Hi all, Dimitris from NetApp here.

It's really easy to point to features the competition doesn't have. For instance, SolidFire has weak FC and zero NAS capability.

That's stuff that's pretty hard to implement.

In the grand scheme of things, if the inline data efficiencies in ONTAP plus frequent dedupe (say every 5 minutes) are enough, the comparison becomes a mostly religious one.

Ultimately data reduction is just one way to save costs, and there are many places where costs can be saved.

This might help:

http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/

Thx

D

3
2

NetApp cackles as cheaper FlashRay lurches out of the door

dikrek

Which is why this isn't just about price!

This isn't just pricing.

We have addressed the objective points you're making:

1. With 8.3.1, the architecture the customer sees is only as complex as they need it to be. You should ask to see a demo of the GUI and offer constructive criticism after that happens. It is of similar simplicity to much less featured systems.

2. See #1 - managing this way easier than before.

3. OPM v2 is the only monitoring tool most customers will need. And it's free to use. The admin GUI now also has plenty of performance info so most customers won't even need OPM (which does much more than competitor monitoring tools).

4. I beg to differ

5. Sales team arrogance? I've heard horror stories about competitor sales team arrogance. I wonder who all those people are.

6. Many customers moved to purpose-built systems because the enterprise systems were not as easy to use, inexpensive and fast as some startups. Now that we've changed all this... :)

Thx

D

0
1
dikrek

Re: Plenty of new stuff was announced!

Hyperbole much? :)

A slight downturn in NetApp sales can equal the combined sales of all the startup flash vendors.

Think about that for a second. That's how big we are.

Those same companies don't have a share price to watch fall, don't disclose financials, and are burning through VC funds with alacrity. I have news for you: VC funding isn't unlimited.

We have great products and the best flash solution in the marketplace.

#1 Storage Operating System

#1 Storage for Service Providers

#1 Converged Infrastructure

#1 Storage Provider to the Federal Government

#2 In the storage market

You're obviously right, these are the clear hallmarks of a sinking ship ;)

Thx

D

6
1
dikrek
Happy

Plenty of new stuff was announced!

Hi all, Dimitris from NetApp here.

Lots of stuff is new as of June 23rd, check here: http://recoverymonkey.org/2015/06/24/netapp-enterprise-grade-flash/

We understand this upsets competitors and people stuck in old thinking modes, but such is the way the cookie crumbles.

No, we will not apologize for causing angst to competitors ;)

Thx

D

6
3

NetApp's customers resisting Clustered ONTAP transition

dikrek

Re: Here's the bit I don't quite follow ..

you honestly think that's how people buy stuff?

0
0
dikrek

Re: The limitations you mention are either not there or rapidly going away

With 8.3, 7MTT is free for customers to use. You don't NEED migration services, but maybe your account team felt it would help. Point them my way if you want. Actually, feel free to contact me. Plus I can't discuss any roadmap items in this type of forum.

EMC had to migrate XtremIO customers for free because they had TWO destructive upgrades in 1 year for a new product. Plus they can do it since there's a tiny number of XtremIO systems deployed vs ONTAP, VMAX or VNX. Not a lot of resources needed to do it.

They won't migrate the huge numbers of VMAX or VNX customers for free...

But there are some new developments and we will be making migrations a lot less expensive going forward.

8.3 has more performance stats in the GUI than previous releases and future releases will bring even more. In the meantime you can use OPM (free).

I double-checked and the Exchange integration requiring RDMs or physical LUNs has to do with how Microsoft does Exchange VSS integration vs how SQL does it (VDS). It does require Microsoft to make some changes.

At the moment you can't flip a switch and convert a FAS to a MetroCluster - MetroCluster is far more than just sync replication and needs some extra hardware in order to be set up.

About perfstats - that tool goes far deeper than any normal performance monitoring program. For general performance stuff it's not really needed, but I have to admit our support seems to want one for every case since it captures so much stuff. I can talk to you about that once you reach out.

Thx

D

0
0
dikrek
Stop

The limitations you mention are either not there or rapidly going away

Hi all, Dimitris from NetApp here (recoverymonkey.org).

Technology evolves at a rapid clip - and ONTAP more rapidly than most think.

Think about it - we have needed a disruptive upgrade only TWICE since 1992 (TradVols->FlexVols, and 7mode->cDOT). Yet we allow full hardware re-use and don't force customers to buy all new stuff. It's not like a 7-mode system can't be re-initialized and join a cDOT cluster... :)

Other vendors force major migrations and hardware swaps upon customers every 3-5 years, and XtremIO needed TWO destructive upgrades in the past year alone. And most startups are too new - will their architecture stand the test of time well enough to need two or fewer disruptive upgrades over 20 years? Really?

Puts things in perspective a bit.

To address your points:

1. 7MTT is absolutely available for use by customers now, no PS or transition team needed.

2. E-Series is indeed not designed to make heavy use of the fancy capabilities - it's more an easy, fast, reliable I/O engine.

3. Look at the built-in GUI in ONTAP 8.3. Stats are there. We also DO provide performance stats via AutoSupport.

4. Exchange SnapManager needing RDMs: This is a Microsoft-imposed limitation. For SQL for instance we can use SMSQL even if the SQL VM is running on an NFS data store... :)

For the rest of the "limitations" - there are best practices and then there are true limitations. Don't confuse the two please.

ONTAP is still the most flexible storage OS by far. Does it do EVERYTHING? Nothing does. But it deals with more data center problems than any other storage OS extant today. Simple fact.

Thx

D

0
0

Gartner: Dell nowhere to be seen as storage SSD sales go flat

dikrek
Stop

Re: Figures Don't Add up

Hi all, Dimitris from NetApp (http://recoverymonkey.org).

This is an accounting issue. How does one track AFA sales? For Pure it's easy, everything is AFA.

From EMC, XtremIO is always an AFA. Easy to count.

For NetApp, when this report was run only the EF was counted as an AFA, and AFF (All-Flash FAS) wasn't counted since until now there wasn't a strict all-flash FAS model that can't also take HDDs. The reality is NetApp sells huge amounts of SSD... as do many other vendors.

For example, for HDS the story is similar: I bet they sell a shedload of flash yet aren't high on the chart.

Gotta love it.

Thx

D

1
0

NetApp CTO Jay Kidd resigns and retires from the industry

dikrek
Stop

Dream on

Hi all, Dimitris from NetApp.

You can of course choose to believe whatever you see anonymously posted on the Internet.

Making claims is easy. "XtremIO is shelved!" "Pure is going out of business!"

Substantiating the claims is far harder.

Thx

D

0
0
