Does anyone really want to embed dedupe code?

Which storage OEMs will embed Ocarina or Permabit deduplication code in their products? Deduplication, or data reduction as it's becoming known, involves scanning data, whether block or file, and replacing repeated block groups or chunks with pointers. Typically this is done with backed-up file data, where there is lots of …
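As a minimal, hypothetical sketch of that idea (fixed-size chunks and SHA-256 fingerprints; shipping products typically use variable-size chunking and guard against hash collisions, which this toy ignores):

```python
import hashlib

CHUNK_SIZE = 4096   # fixed-size chunks; real products often use variable-size chunking

def dedupe(data: bytes):
    """Store each unique chunk once; repeats become pointers to the stored copy."""
    store = {}      # fingerprint -> stored chunk
    pointers = []   # the original stream, expressed as an ordered list of fingerprints
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # only the first occurrence is kept
        pointers.append(fp)
    return store, pointers

def rehydrate(store, pointers) -> bytes:
    """Rebuild the original stream by following the pointers."""
    return b"".join(store[fp] for fp in pointers)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # a stream with repeated blocks
store, ptrs = dedupe(data)
assert rehydrate(store, ptrs) == data
print(f"{len(ptrs)} chunks referenced, {len(store)} actually stored")
```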

COMMENTS

This topic is closed for new posts.
  1. David Halko
    Go

    Oracle/Sun Solaris ZFS is the Gold Standard

    Chris Mellor asks, "Which storage OEMs will embed Ocarina or Permabit deduplication code in their products?"

    The answer is... who knows!

    De-dup and compression source code is free and available on the internet, and no one in their right mind (unless on a religious crusade or on a severe catch-up plan for their own proprietary code) would develop a new system to leverage in an embedded storage product when a half-decade of open source development has already achieved so much.

    Oracle/Sun Solaris ZFS is the Gold Standard now for embedded systems.

    1. Matt Bryant Silver badge
      Happy

      RE: Oracle/Sun Solaris ZFS is the Gold Standard

      I don't know, but maybe actually having a secure future and not being subject to a NetApp lawsuit might have some bearing on why the rest of the world is ignoring ZFS.....

  2. Anton Ivanov
    Flame

    Data reduction is not a matter of ethos, it is a matter of performance

    As far as VM storage is concerned, de-duping drastically reduces the effective IOPS load. The "common part of the VM" + "differences" can actually fit into tier-1 storage or even cache. As a result, performance is much greater compared to a non-deduped install.
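    A rough back-of-envelope illustration of that effect (all numbers below are invented for the example):

```python
# VMs cloned from a common template share most of their blocks, so after dedupe
# the shared working set can sit in tier-1 storage or cache and only the per-VM
# differences still generate disk I/O. All figures are assumptions.

vms = 100
blocks_per_vm = 250_000    # ~1 GB working set at 4 KB blocks (assumed)
shared_fraction = 0.9      # fraction of blocks common to the template (assumed)

without_dedupe = vms * blocks_per_vm                 # every VM reads its own copy
shared = int(blocks_per_vm * shared_fraction)        # read once, then served from cache
unique = vms * (blocks_per_vm - shared)              # per-VM differences
with_dedupe = shared + unique

print(f"disk reads without dedupe: {without_dedupe:,}")
print(f"disk reads with dedupe:    {with_dedupe:,}")
print(f"reduction: {1 - with_dedupe / without_dedupe:.0%}")
```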

  3. Matt Bryant Silver badge
    Pirate

    Microsoft?

    So, what is the Great Satan of Computing doing about de-dupe? It would knock a big hole in NetApp's Windows-centric market if M$ came out with a proper de-dupe capability in the next gen of Windows Server. Who'd bother with clever second-gen de-dupe if M$ just bundles it in with the OS? After all, the majority of today's de-dupe, especially in backups, is in Windows servers and desktop files. That would leave the rest fighting over the UNIX/Linux and mainframe market.

  4. random_graph

    Ocarina & Permabit

    Great coverage on the respective issues and possible outcomes, Chris.

    Permabit and Ocarina are both thinking along the right lines; dedupe is a killer technology for primary storage, and it will increasingly be an embedded feature.

    Permabit's comments about Ocarina seem a bit out of place, though: "...but our technology is mature and being delivered..."

    - Permabit was formed 9 years ago to make a better disk storage system. Dedupe was a feature added onto their object storage system <1 year ago. Now, in the latest redo, they're throwing out the product to try to create success on the feature. Ocarina, on the other hand, started life focused 100% on data reduction for primary storage.

    - Permabit's dedupe is <1 year old, and it's never shipped integrated with anything other than Permabit's storage box. Ocarina has been shipping for 2 years, and *every* delivery was integrated with someone else's storage system (Ocarina doesn't store the data).

    - Ocarina delivers dedupe *and* compression (a big portfolio of algorithms at that), with proven results on over 1500 file types, including pre-compressed data and specialized data sets in different industries. Furthermore, the end-to-end strategy that Ocarina is talking about is really a next-generation architecture. Permabit's feature-now-product has a long way to go in technical sophistication to catch up to Ocarina.

    1. Super Fast Jellyfish

      Maybe I've missed something...

      ... but if, as a home user, I copy from disc C to D (even if they are virtual) to protect from bad sectors, is the de-dupe going to point to the original data and lose the effectiveness of a disc backup?

      1. Jim Bob

        "Maybe I've missed something..."

        Dedupe isn't designed for operation in a non-redundant environment. It relies heavily on there not being any bad blocks, so it is only of use on RAIDed storage arrays that perform scrubbing to ensure bad blocks are identified before checksums are compromised.
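        A toy sketch of why a single bad block hurts more once data is deduped (file and chunk names are invented):

```python
# After dedupe, many files may reference the same stored chunk, so one bad
# block can damage every file that points to it. RAID plus periodic scrubbing
# is what keeps that single stored copy trustworthy.

files = {
    "report_2009.doc": ["hdr", "boilerplate", "body1"],
    "report_2010.doc": ["hdr", "boilerplate", "body2"],
    "template.doc":    ["hdr", "boilerplate"],
}

bad_chunk = "boilerplate"   # suppose this one stored chunk sits on a bad sector

damaged = [name for name, chunks in files.items() if bad_chunk in chunks]
print(f"one bad chunk damages {len(damaged)} of {len(files)} files: {damaged}")
```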

    2. Chris Mellor 1

      From Permabit's Marketing VP

      (Sent to and posted by me, Chris):-

      random_graph

      I'm not sure where you are getting your information about Permabit, but you're clearly misinformed. Permabit has been developing deduplication technology since 2000 and in fact we have many patents in this area. Our products have been shipping with deduplication (and compression) since 2004 and are implemented and deployed in Fortune 1000 companies.

      Our OEM partners recognized the power, scalability, and performance of our deduplication technology and asked us to create Albireo so they could integrate it into their own product portfolios too. They wanted (and we delivered) a deduplication technology that could completely integrate into their storage software and work seamlessly with their existing features such as thin provisioning, snapshotting, and yes, their own embedded compression technology. What they asked us for (and what industry analysts have confirmed) was an embedded product that could deliver:

      1) support for any storage: block, file, or unified platforms

      2) support petabyte scalability

      3) fast performance with zero impact on users' access to data

      4) a solution that was completely out of the read path (no readers required)

      5) a solution that gave the flexibility of deploying as an inline, post-process, or parallel solution (or any combination of the three)

      Regards,

      Mike Ivanov

      VP Marketing, Permabit

      ---------------------------------------------------------

  5. Steven Hollis

    Old Tech Rebadged

    Dedupe, disk compression, whatever you want to call it: it is old school.

    Back in the day, MS-DOS had DriveSpace and Novell had Stacker.

    All compression can ever be is replacing a common string with a marker to an index of common strings.
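    A toy illustration of that substitution idea (the dictionary and marker byte below are invented for the example; DriveSpace and Stacker used LZ-family coding, a far more refined take on the same principle):

```python
index = [b"the quick brown "]   # index of common strings (assumed known in advance)
MARK = b"\x01"                  # marker byte, assumed never to occur in the data

def compress(data: bytes) -> bytes:
    # replace each common string with a marker pointing into the index
    for i, s in enumerate(index):
        data = data.replace(s, MARK + bytes([i]))
    return data

def decompress(data: bytes) -> bytes:
    # follow each marker back into the index to restore the original string
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == MARK:
            out += index[data[i + 1]]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

text = b"the quick brown fox jumps over the quick brown dog"
packed = compress(text)
assert decompress(packed) == text
print(len(text), "->", len(packed), "bytes")
```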

  6. David Halko
    Go

    RE: Maybe I've missed something - ZFS can protect against it

    Super Fast Jellyfish asks, "I copy from disc C to D (even if they are virtual) to protect from bad sectors, is the de-dupe going to point to the original data and lose the effectiveness of a disc backup?"

    If your desire is to protect against bit-rot, copying data between volumes is no longer needed under ZFS; there is a "copies" property to perform this action automatically.

    If you want to protect from bad sectors with ZFS when running dedup on a single drive, you can run with the "copies" property and always keep multiple copies of your disk blocks. With ZFS checksumming your data, it will detect a bad block and fix it dynamically upon discovery (or upon the next scheduled scrub), instead of subjecting the user and/or business to "bit rot" data loss, as other operating systems do.
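    As a conceptual sketch of that self-healing behaviour (an illustration of the principle only, not how ZFS is actually implemented):

```python
import hashlib

# Conceptual model of a "keep N copies plus a checksum" block store: reads verify
# the checksum, serve the data from a surviving good copy, and repair any copy
# that has rotted. Block names and the class itself are invented for the example.

class CopiesStore:
    def __init__(self, copies: int = 2):
        self.copies = copies
        self.blocks = {}   # block id -> (checksum, list of stored copies)

    def write(self, bid: str, data: bytes) -> None:
        csum = hashlib.sha256(data).digest()
        self.blocks[bid] = (csum, [bytes(data) for _ in range(self.copies)])

    def read(self, bid: str) -> bytes:
        csum, stored = self.blocks[bid]
        good = next(c for c in stored if hashlib.sha256(c).digest() == csum)
        self.blocks[bid] = (csum, [good] * self.copies)   # heal any bad copy
        return good

store = CopiesStore(copies=2)
store.write("blk0", b"important data")
store.blocks["blk0"][1][0] = b"import@nt dat@"   # simulate bit rot in one copy
assert store.read("blk0") == b"important data"   # still readable, and repaired
```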

    If you have multiple disks and are running a redundant physical disk setup, the copies property is not required to protect against or correct bit rot; rather, the spare disk[s] will manage it under ZFS.

    The next item: you can apply compression to mitigate the extra disk space required by the "copies" property or by redundancy at the physical disk layer. This can also improve performance by reducing the quantity of data being read off the disk, when there is ample CPU capacity.

    What is really nice is that dedup sits on top of this entire infrastructure to increase performance and reduce disk usage. On virtual servers, this will enable more secure data, faster performance, and a massive reduction in disk space requirements.

    If there is a problem caused by user error, then with unlimited rolling snapshots you can even recover without a backup, provided the physical media has not been compromised.

    If there is an outstanding legal question from NetApp over ZFS, it will apply to the non-production-quality Linux BTRFS as well. Oracle, the sponsor of both the mature ZFS and the immature BTRFS, is big enough to purchase NetApp anyway.

    ZFS is the Gold Standard in storage and reliability.

    I am uncertain how Ocarina & Permabit deal with bit-rot on the disks, or whether de-duped data on non-ZFS infrastructures, with (or without) multiple disks, can be protected against it.

This topic is closed for new posts.
