Hypervisor kid Jeff Ready: Converged to the core, and NO VMware

Scale Computing CEO Jeff Ready reckons its hyperconverged HC3 software is better than anyone else's because it's integrated into the hypervisor's core and not just another VM. Ready briefed El Reg at a March meeting in London. Scale's HC3 product does not use VMware, being centred on KVM instead. Because storage access is …

Anonymous Coward

Seriously? Did he really say that? With a straight face?

"Dedupe is looked upon as being unnecessary, except in specific use cases like VDI, as drives are getting so large you won't need to dedupe"

OK... I'll just park this amazing vision right next to Thomas Watson's famous "I think there is a world market for maybe five computers."

Anonymous Coward

Re: Seriously? Did he really say that? With a straight face?

Was talking to a Scale guy; he mentioned that they had shown off their dedupe stuff at a show just a couple of weeks ago - must be an older quote from Ready.

Silver badge

Re: Seriously? Did he really say that? With a straight face?

"Except for VDI" ... I think the point being made is that dedupe is not the *default* case, but rather that it should be applied only where it *truly* has value. In many cases you may have a bunch of nodes in a virtualization stack that have logical relationships but a lack of data commonality - dedupe in these cases can be a latency issue. In VDI, you get *spectacular* value from applying dedupe. In a cluster of SharePoint servers you *may* get value from dedupe, and you may not. In an application mid tier, again, it very much depends on the circumstances: stateless mid-tier app servers will have massive data commonality, and as long as one puts the logs somewhere *else*, your value in dedupe is high. Applying dedupe to back-end DB instances will have almost zero real value, and can in fact be of negative value.
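To put a number on "data commonality": a quick-and-dirty way to gauge whether a workload would benefit from dedupe is to hash fixed-size blocks and compare unique vs. total block counts. A toy Python sketch (the function name and workloads are made up for illustration, not any vendor's tooling):

```python
import hashlib

BLOCK = 4096

def estimate_dedupe_ratio(data: bytes, block_size: int = BLOCK) -> float:
    """Total blocks divided by unique blocks: ~1.0 means dedupe buys
    you essentially nothing; large values mean lots of commonality."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# VDI-style: 100 identical "golden image" blocks -> ratio 100.0
print(estimate_dedupe_ratio(b"A" * BLOCK * 100))
# DB-style: 100 distinct pages -> ratio 1.0, dedupe is pure overhead
print(estimate_dedupe_ratio(b"".join(
    i.to_bytes(4, "big") + b"\x00" * (BLOCK - 4) for i in range(100))))
```

Run that against a VDI datastore and a DB volume and the "it depends on the workload" point makes itself.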

<sorry - guess what I was having to explain to the Management folks recently>

Silver badge

Re: Seriously? Did he really said that? With a straight face?

Ok... because there are bad implementations of dedupe out there (lots of them... NetApp being among the worst I've seen), there will always be comments like this.

Let's talk a little about block storage. There are many different levels of lookup for blocks in a storage subsystem. If you look at a traditional VMware case, there are at least 6 translations, possibly up to 20, for each block access across a network. Adding Fibre Channel in between aggravates the issue quite badly. It adds a lot of latency based on its 1978-era design (this is not an exaggeration; the SCSI protocol dates from 1978). There are many more problems which come into play as well.

Every block-oriented storage system which supports any form of resiliency through replication (which is no longer optional) has to perform hashing on every single block received, and those hashes must be stored in a database for data protection. For 512-4096 byte blocks, chances are a CRC-32 is suitable for data protection, and for deduplication with a "lazy write cache" it is also suitable. However, in the case of NetApp, for example, which is severely broken by design, everything is immediate and there's no special storage for lazy or scheduled dedup.
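For the curious, the ingest-side hash index is conceptually tiny - something like this Python toy (CRC-32 via zlib; the function name and structures are made up for illustration):

```python
import zlib

BLOCK_SIZE = 4096

def index_blocks(volume: bytes) -> dict[int, list[int]]:
    """Hash every block on ingest and record it in a hash index.
    CRC-32 collides easily, so entries are only *candidates* for
    dedup; a later pass must compare bytes before merging anything."""
    index: dict[int, list[int]] = {}
    for offset in range(0, len(volume), BLOCK_SIZE):
        block = volume[offset:offset + BLOCK_SIZE]
        index.setdefault(zlib.crc32(block), []).append(offset)
    return index

# two identical "x" blocks land in one hash bucket; "y" gets its own
vol = (b"x" * BLOCK_SIZE) + (b"y" * BLOCK_SIZE) + (b"x" * BLOCK_SIZE)
idx = index_blocks(vol)
```

The point is that the hash work is already being paid for by the resiliency path; keeping the index around for dedup costs almost nothing extra.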

In a proper dedup system, a write to a block which has two or more references (even if the hash matches) will decrease the reference count, and a new block will be written to high-performance storage (NVMe, for example) with a single reference. If there was only one reference, then the block is altered in place and the hash is updated.
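That write path is just copy-on-write over refcounted blocks. A minimal Python sketch of what I mean (class and method names invented for illustration; a real system keys everything on the hash index too):

```python
class BlockStore:
    """Toy refcounted write path: shared blocks get copied on write,
    singly-referenced blocks are rewritten in place."""
    def __init__(self):
        self.phys = {}      # phys_id -> (data, refcount)
        self.logical = {}   # lba -> phys_id
        self.next_id = 0

    def write(self, lba: int, data: bytes) -> None:
        old = self.logical.get(lba)
        if old is not None:
            stored, refs = self.phys[old]
            if refs > 1:
                self.phys[old] = (stored, refs - 1)  # drop our reference
            else:
                # sole owner: alter in place (a real system would
                # also update the hash index here)
                self.phys[old] = (data, 1)
                return
        self.phys[self.next_id] = (data, 1)          # fresh block, one ref
        self.logical[lba] = self.next_id
        self.next_id += 1

    def share(self, lba_a: int, lba_b: int) -> None:
        """Point an unmapped lba_b at lba_a's physical block
        (this is what a dedup pass produces)."""
        pid = self.logical[lba_a]
        data, refs = self.phys[pid]
        self.phys[pid] = (data, refs + 1)
        self.logical[lba_b] = pid

s = BlockStore()
s.write(0, b"aaa")
s.share(0, 1)        # lbas 0 and 1 now share one physical block
s.write(1, b"bbb")   # COW: lba 1 breaks off onto its own block
```

Note that the hot write path never does a byte-for-byte compare; it only touches refcounts.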

Then dedup runs "off-peak", meaning (for example) that if the CPU is under 70%, the new blocks stored on disk are compared 1:1 against other blocks with matching hashes, references are updated, and only a single copy of the data itself is maintained. In addition, during this phase, it is possible to lazily compress blocks and migrate blocks which are going stale to cold storage (even off-site) or, heaven forbid, FC SAN storage.
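The off-peak pass itself is not rocket science either - group by hash, confirm byte-for-byte, repoint, free. A self-contained Python sketch (function name and data layout invented for illustration):

```python
import zlib

def offpeak_dedup(logical: dict[int, int], phys: dict[int, bytes]) -> None:
    """Off-peak pass: group physical blocks by CRC-32, confirm
    duplicates byte-for-byte (CRC-32 alone can collide), then
    repoint logical addresses at one keeper copy and free the rest."""
    by_hash: dict[int, list[int]] = {}
    for pid, data in phys.items():
        by_hash.setdefault(zlib.crc32(data), []).append(pid)
    for pids in by_hash.values():
        keeper = pids[0]
        for pid in pids[1:]:
            if phys[pid] != phys[keeper]:
                continue                   # hash collision, not a dup
            for lba, mapped in logical.items():
                if mapped == pid:
                    logical[lba] = keeper  # repoint to the keeper
            del phys[pid]                  # reclaim the duplicate

phys = {0: b"a" * 4096, 1: b"a" * 4096, 2: b"b" * 4096}
logical = {10: 0, 11: 1, 12: 2}
offpeak_dedup(logical, phys)   # lba 11 now shares lba 10's block
```

Because none of this sits on the foreground write path, the latency cost to running workloads is effectively nil - which is the whole point.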

Dedup should have absolutely ZERO impact on performance when implemented by engineers who actually have half a brain.

The disadvantage to the system described above is that dedup won't be sexy at trade shows since it might take minutes, hours or more to see the return from the dedup operation.

As for databases, if you're running a mainstream SAN (EMC, Hitachi, 3PAR, NetApp), you're absolutely right: you should avoid dedup as much as possible. None of those companies employ the "real brains behind their storage" anymore, and they haven't had decent algorithm designers on staff in years. They take a system which works and layer shit upon shit upon shit to sell it. There will be problems using any GOOD storage technologies on those systems.

For databases and most modern instances, you should move away from block-oriented storage systems and focus instead on file servers with proper networking involved. In this case, I would recommend a Gluster cluster (even if you have to run it as VMs) with pNFS, or Hyper-V with Windows Storage Spaces Direct. These days, most of the problems with latency and performance are related to forcing too many translations between the guest VM and the physical disk. There's also the disgusting SCSI command-queuing illness, which orders file read and write operations impressively stupidly, since NCQ at each point it's processed has no idea what the block structure of the actual disk is. pNFS and SMBv3 are far better suited to modern VM storage than FC and iSCSI can ever be.

That said, there are some scale-out iSCSI solutions which aren't absolutely awful. But scale-out is technically impossible to achieve over FC or NVMe.

P.S. - Dedup in my experience (I write file systems and design hard drive controllers for personal entertainment) shows consistently higher performance and lower latency than the alternative because of the simplicity involved in caching.

P.P.S. - I've been experimenting with technology which is better than dedup, as it instruments guest VMs with a block cache that eliminates all zero-block reads and writes at the guest. It improves storage performance more than most other methods... sadly, VMware closes its APIs for storage development, so I have to depend on VMware thin volumes or FC in between to implement that technology.
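The zero-block idea is simple enough to sketch: track which blocks are known to be all zeroes, swallow zero writes, and synthesize zero reads without touching the backing store. A toy Python version (all names invented; the real thing sits in the guest I/O path, not a dict):

```python
BLOCK = 4096
ZERO = bytes(BLOCK)

class ZeroCache:
    """Guest-side cache sketch: zero-block I/O never reaches storage."""
    def __init__(self):
        self.store: dict[int, bytes] = {}  # stand-in for the datastore
        self.zeroed: set[int] = set()      # blocks known to be all-zero
        self.ios = 0                       # backend I/Os actually issued

    def write(self, lba: int, data: bytes) -> None:
        if data == ZERO:
            self.zeroed.add(lba)           # swallow the zero write
            self.store.pop(lba, None)
            return
        self.zeroed.discard(lba)
        self.ios += 1
        self.store[lba] = data

    def read(self, lba: int) -> bytes:
        if lba in self.zeroed or lba not in self.store:
            return ZERO                    # synthesized, no backend read
        self.ios += 1
        return self.store[lba]
```

Given how much of a freshly provisioned guest image is zeroes, skipping that traffic entirely beats deduplicating it after the fact.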

P.P.P.S. - I simply don't see this company doing anything special other than trying to define a new buzz term for something which is nothing new. Implementing code in the KVM kernel is the same as Microsoft implementing SMB3 in Hyper-V; it's just old hat.

Anonymous Coward

After what these guys did to their storage customers...

Even if their HCI offering was the "bee's knees" I wouldn't give them a second look!

Their storage offering was abysmal at best, and to top that off they decided they didn't want to do/support it anymore, so they told their customers "sorry, we can't help you" and then came back to life as HC3/HCI! What kind of crap is that!!

"Serial Entrepreneur" seems about right...I don't like this game, let's play another!

...Oh, and "..dedupe is looked upon as unnecessary..." my backside! He says this for two reasons:

1. Dedupe on HCI platforms is HARD, and it is SLOW - ask our buddies at Nutanix!

2. Because it is HARD and it is SLOW, Scale can't figure out how to do it so they have abandoned the effort!

Just my $.02

Silver badge

Re: After what these guys did to their storage customers...

Dedupe on HCI is easy if you're not using VMware, as VMware doesn't properly support pNFS; it does a bastardized form of it called multipathing.

The solution to this is to sell your soul to VMware, get access to the NDDK, and implement a native storage driver which can implement pNFS on its own. There's absolutely no value in doing this and no one should ever bother trying.

There's the alternative which is to attempt to get iSCSI up and running in a scale-out environment. Due to limitations in vSwitch, this isn't an option since multicast iSCSI isn't supported in VMware's initiator and anycast isn't profitable in this case.

FC is out, if for no other reason than that FC is storage for people who still need mommy to wipe their bottoms for them. FC is so simple-stupid, a monkey can run an FC SAN (until they can't... but consultants are cheap, right?), and what makes it so simple-stupid is that FC doesn't support scale-out AT ALL, though MPIO could scale all the way to two controllers.

So, then there's the question of value. Where's the value in dedup on a VMware HCI platform? That's a tricky one, since due to the nature of VMware's broken connectivity options for storage, you can't scale out the system connectivity to begin with. You also can't extend the VMware kernel to support it, because even with access to the NDDK, there's no one who actually knows it well enough to program with it, and if you look at VMware's open source code for their NVMe driver on GitHub, you'll see that you probably don't want to use their code as a starting point. It's pretty good... kinda... but I'm tempted to write a few exploits for the rewards now.

Oh, then there's the insane cost and license problem behind the VAAI NAS SDK from VMware. I almost choked when they sent me a mail saying "$5000 and we basically can tell you what to do with your code"... for a 13 page document (guessing the size). So, you can't even properly support NFS to begin with. And no, I would never ever ever agree to the terms of that contract they sent me and there's less chance I would consider paying $5000 for a document that should not even be required.

So, back to dedup... you can dedup... in HCI... no problem! The problem is, how can you possibly get VMware to actually use the dedup and replicated data?

Then there's Windows Server 2016 which ships with scale-out storage, networking and compute all on one disc and all designed from the ground up for.... scale-out.

There's OpenStack which works absolutely awesome on Gluster with scaleout and networking.

So, as for what you're saying - "dedup on HCI is hard and slow" - this is absolutely not true. Dedup and scale-out on VMware is damn near impossible, but it's a stock component of all the other systems, and see a post I made earlier about slow. Slow is not a requirement. It just takes companies with real storage programmers, not just hacks that slap shit together using config files.

