NetApp adds in-line dedupe to all-flash FAS arrays

NetApp has updated its Clustered Data ONTAP OS to support in-line deduplication and 3.8TB SSDs. In-line deduplication means data is deduplicated as it lands on the array, rather than being stored in full and then having post-process deduplication applied later. In-line deduplication is more space- …

  1. Anonymous Coward

    CPU resources?

    "In-line deduplication is more space-efficient and requires CPU resources for dedupe to be supplied as data lands on the array."

    So what performance impact will this have on the array?

  2. Paul Hargreaves
    FAIL

    Listening to the Tech ONTAP podcast, it's something that can still be turned off, so it's still an afterthought.

    I'll bet the release notes / documentation for it give a whole long list of reasons to stay away.

    Done properly, dedupe speeds things up rather than slowing things down, since you get more out of each deduplicated cache, and you also avoid I/Os because you can discard duplicates.

    1. Anonymous Coward

      Isn't that the truth! Do yourselves a favor and read the release notes, because some of us have been burned one too many times when sales <> deployment reality. Our experience has been that there are too many gotchas and last-minute surprises with this system that automagically tend to surface just before implementation. It has been a frustrating battle for us for the last 14 months.

    2. Anonymous Coward

      Wat? Any "feature" in the IO path has the potential to slow things down. If you know your data is non-dedupable, why leave inline dedupe on and waste CPU cycles generating and comparing hashes? At least you have the option to turn it off if you so choose, unlike on most all-flash arrays.
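
      For a rough sense of the cycles in question, here is a back-of-the-envelope sketch in Python that fingerprints 4 KiB blocks the way an inline engine might. The block size, hash algorithm and block count are assumptions for illustration, not any vendor's actual numbers.

        import hashlib, os, time

        BLOCK_SIZE = 4096          # assumed 4 KiB blocks, a common dedupe granularity
        NUM_BLOCKS = 100_000       # roughly 400 MB of random (i.e. unique) data

        blocks = [os.urandom(BLOCK_SIZE) for _ in range(NUM_BLOCKS)]

        start = time.perf_counter()
        for b in blocks:
            hashlib.sha256(b).digest()      # fingerprint each block as it "lands"
        elapsed = time.perf_counter() - start

        mb = NUM_BLOCKS * BLOCK_SIZE / 1e6
        print(f"hashed {mb:.0f} MB in {elapsed:.2f} s -> {mb / elapsed:.0f} MB/s on one core")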

    3. danisaacs

      Hi Paul, miss you!

      Not an afterthought, a primary objective. And while inline does reduce disk IO, it also increases CPU cycles. This is why Pure, for example, will disable it under heavy load. To do it "properly" you would need a more or less fixed metadata-to-capacity ratio. This means your DRAM would be the limiting factor in your ability to scale up capacity, which is what Dell/EMC face with X-Bricks.

      In its current iteration, Data ONTAP's inline dedupe is recommended for use in conjunction with post-process dedupe to provide optimal savings. Inline is most helpful for some VDI use cases that were an issue with post-process only, primarily OS patching for persistent desktops.

      The guidelines are simple: Don't use it for databases (they don't dedupe), use it for VDI.

      Cheers, mate!

      1. Paul Hargreaves

        Hi Dan, miss you (and the rest of the gang) too :-)

        According to http://www.netapp.com/us/technology/storage-efficiency/feature-story-compression.aspx, most workloads benefit from dedupe. Not everything, of course, but a significant portion.

        Any time you can offload from the back end or save space is a benefit.

        I agree, most implementations of dedupe are pants. If a feature is worth using then it should always be on. Being under load isn't an excuse, since a properly purchased system should be as busy as possible.

        In an ideal world, everything would be in-line and the idea of a 'post process' for anything would be eliminated, since doing the work post-process just means doing the same work twice, along with all the other implementation joys such as having blocks locked in place by other features.

        As you rightly note, memory can be a limiting factor if all the hash metadata is held in main memory. Fortunately, that is solvable as well, either by using SSDs to hold that metadata or by using scale-out memory solutions. I can understand why not SSDs if you're in a race towards 0, but most workloads don't need it...

  3. Jim 59

    Dedupe

    Dedupe is still alive? It never seems to have lived up to its promise somehow, not since those heady days of 2008, before Data Domain was gobbled up by EMC.

    Dedupe is nirvana, in theory. By now it was supposed to be everywhere: even in ZFS and Linux. One of the problems is that encrypted data can't be deduped, and encryption is becoming the norm in some areas. Corporate desktops (or at least laptops) are now usually encrypted. So is data held in the "cloud".

    What happens to a 100TB array if somebody dumps 20TB of encrypted data on there? Do the inline ingesters just dumbly thrash themselves to death trying to dedupe/un-dedupe it, or do they know what they're dealing with and somehow skip those blocks? Compressed data is almost as bad, e.g. nearly all media files.

    It seems that non-compressed, non-encrypted data will soon be restricted, perhaps, to internal corporate office servers. And who wants to buy a fancy DD SSD array system for that mundane stuff?

    1. Paul Hargreaves

      Re: Dedupe

      If encrypted or pre-compressed (and unique) data hits a dedupe device of any type, it'll not dedupe.

      Backups of said data will still dedupe well.

      If it's properly in-line there's no thrashing: they would (sensibly) generate a hash of each block as it arrives and go 'new' (store), 'new' (store), 'new' (store).
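
      Something like this minimal Python sketch of a hash-based inline write path (illustrative only; the structures and hash choice are assumptions, not how ONTAP or any particular array actually does it):

        import hashlib

        block_store = {}    # fingerprint -> physical block, stored once
        ref_count = {}      # fingerprint -> number of logical references

        def inline_write(block: bytes) -> str:
            """Hash the incoming block; only store it if it has never been seen."""
            fp = hashlib.sha256(block).hexdigest()
            if fp in block_store:
                ref_count[fp] += 1          # duplicate: no back-end write needed
            else:
                block_store[fp] = block     # 'new' -> store it once
                ref_count[fp] = 1
            return fp                       # the logical map just records the fingerprint

        # Encrypted or pre-compressed data is effectively all-unique, so every block
        # takes the 'new' branch: you still pay for the hash, but nothing thrashes.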

      But, reality check. You encrypt the (physical) disks, not the apps. I've yet to come across any customer who runs server VMs with guest-level encryption. I'm sure there must be some people out there who do per-VM encryption, and they'll already have the restricted product lists they're used to working with.

      1. Jim 59

        Re: Dedupe

        "Backups of said data will still dedupe well."

        Don't use a dedupe array as a backup target, tempting as it will seem. Yes, you would save tons of space. But for backups, you actually *want* multiple physical copies of the data, not one physical copy and many logical copies (which is what deduped data is). If those physical blocks die, you could lose not just a single backup, but every generation of that backup within the deduped domain.
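
        A toy Python illustration of that shared-fate problem (hypothetical structures, purely to show the point):

          import hashlib

          physical = {}                                    # fingerprint -> block data
          generations = {"mon": [], "tue": [], "wed": []}  # three backup generations

          block = b"payroll database page 42" * 100
          fp = hashlib.sha256(block).hexdigest()
          physical.setdefault(fp, block)
          for gen in generations.values():
              gen.append(fp)                 # every generation references the same copy

          del physical[fp]                   # that single physical copy dies...

          for name, fps in generations.items():
              ok = all(f in physical for f in fps)
              print(name, "restorable" if ok else "lost")  # ...and all generations are lost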

        I guess another reason that dedupe isn't widespread is that storage is just so cheap now. Even for primary enterprise storage, what you pay for is the speed rather than the capacity. "Big data" might be a better candidate.

        1. Hapkido

          Re: Dedupe

          "...If those physical blocks die"

          Maybe there's still a case to be made for tape backup?

          Even better: as you said, "multiple physical copies of the data", so why not have those physical copies on different media? Disk-only backup (including traditional backup, CDP-style, replication, etc.) is not the only option, and it's expensive and often unnecessary. Having one full (backup) copy plus logical copies on disk is then maybe a reasonable option, with additional full copies kept elsewhere.

          While on the subject, why keep old backup data on disk at all?

        2. SPGoetze

          Re: Dedupe

          What you'd usually do on a NetApp system is leave dedupe on, keep your snapshots, and *SnapMirror* that to a different physical site. That way you have another physical set of copies of your data, not on the same spindles but in another location, where you'd also be protected against site disasters (e.g. prolonged power outages, floods, fire, ...).

    2. SPGoetze

      Re: Dedupe

      Just FYI, we have an 84% dedupe rate on our NFS datastores.

      >14TB logical stored on ~3.5TB physical storage.

      Dedupe is very much alive with us...

  4. Anonymous Coward

    Doesn't really matter if it took them ages to offer inline dedupe...

    NetApp is just happy to be in the news...

    And that SnapProtect vs IntelliSnap nonsense is a waste of news space, too.

    The underlying technology has always been IntelliSnap. It's just that NetApp marketing likes to confuse customers. Once they realise that customers aren't that dumb, they revert to the original name.

    Soon we'll see "NetApp Data Cluster Ontap Mode FAS".

    1. SPGoetze

      They were the first to offer dedupe... also for a long time the only ones to offer it for primary storage.

      Who cares if it wasn't inline!

      That's only important if you want to reduce write amplification on flash systems.

      And they did use 'always-on deduplication' before, if you check the relevant Tech Reports (e.g. on Horizon View on an AFF system). Yes, there was some more write amplification, but again, who cares, since it was covered by the NetApp warranty anyway (and they hadn't yet had a single SSD fail because of aging either).

  5. Jim 59

    In the home

    I'd love a dedupe home NAS, so I can save space storing all those mp4 files... oh wait, no. Well then my massive FLAC archive, surely... no, can't dedupe that either. Well, what about that massive archive of ISO images? I can... no, hang on, er... mp3, flv, encrypted backups: no, no, and no... jpegs no, gif no...

  6. louisPTC

    Yes, dedupe is alive and well; in fact, it is flourishing. In the flash array market, all of the top 5 vendors offer it in one form or another. A few vendors, like HDS, even offer effective primary storage deduplication for disk and hybrid systems on their NAS units.

    Don't confuse what Data Domain has (backup-optimized dedupe) with what is shipping in primary flash arrays (random-access optimized dedupe). They are completely different technologies. Backup dedupe is optimized for sequential IO, specialized backup formats, and high rates of duplicate data. Primary dedupe operates on blocks of a fixed size and is optimized for random access in mixed workloads. Under these workloads, primary dedupe products can see zero performance impact or even a performance improvement. It all depends on the underlying hardware and, even more importantly, on the dedupe software capabilities.
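
    To make the fixed-block, random-access style concrete, here is a minimal Python sketch; the data structures and names are assumptions for illustration, not any shipping product's design:

      import hashlib

      store = {}           # fingerprint -> physical block (fixed-size 4 KiB blocks assumed)
      volume_map = {}      # (volume, logical block number) -> fingerprint

      def write(volume: str, lbn: int, data: bytes) -> None:
          fp = hashlib.sha256(data).hexdigest()
          store.setdefault(fp, data)          # store the block only if it is new
          volume_map[(volume, lbn)] = fp

      def read(volume: str, lbn: int) -> bytes:
          # Random access is one map lookup plus one store lookup; there is no
          # backup stream to replay, which is the point of the distinction above.
          return store[volume_map[(volume, lbn)]]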

  7. Anonymous Coward

    Maybe this De-Dupe thing will be at the door of the NetApp offices? Re-org drum roll pls.
