IBM demonstrates dedication to deduplication replication

IBM is adding replication to its ProtecTIER deduplication product, which it acquired by buying Diligent in April 2008. ProtecTIER products will be able to replicate deduplicated data to a remote site, reducing any need customers might have to ship tapes to a remote site for disaster recovery. Transmitting deduplicated …

COMMENTS

This topic is closed for new posts.

Thursday 30th July 2009 12:44 GMT Kebabbert

ZFS offers deduplication for free

soon. It's on the way. :o)

Thursday 30th July 2009 13:42 GMT Michael C

Dedupe new data?

Um, with replication, you're primarily sending newly created blocks... There really should not be a large amount of data to dedupe in NEW data! Yes, there's some, but mostly that's from people flinging files around in e-mail and copying them to multiple personal folders scattered all around the company's servers. That is handled by simply configuring workspaces and restricting direct delivery of internal attachments within the mail system... That's not only free, but it also reduces the storage burden on the mail system and the throughput load on those servers, saving money too.

We have about 3,000 servers here. We're currently testing three different SAN vendors' technologies that support storage virtualization, dedupe, and replication. Thus far, we've discovered that thanks to our architecture, excluding OS and application volumes and considering only data, we can only save about 6% using deduplication. The cost of the dedupe licensing in each case exceeds the cost of even an additional 10% storage.

Since we have a vast system imaging and deployment methodology, we don't back up or store server boot and app drives on SAN, only data. Our servers are not "recovered", they're simply re-imaged if they crash, on new hardware or a VM box, so making OS backups is unimportant. Also, a large number of our systems are non-Windows, and virtualized in shared binary VM machines (the ultimate form of dedupe).
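The cost argument in the comment above can be sketched as simple arithmetic. This is only an illustration: the 6% saving and the "more than 10% of storage cost" licence figure come from the comment, while the capacity and price per TB are hypothetical placeholders chosen to make the numbers concrete.

```python
def dedupe_net_saving(raw_tb, cost_per_tb, savings_fraction, license_cost):
    """Return the storage cost avoided minus the licence cost.

    A negative result means the licence costs more than the storage it saves.
    """
    storage_avoided = raw_tb * savings_fraction * cost_per_tb
    return storage_avoided - license_cost


# Hypothetical environment: these two figures are NOT from the comment.
raw_tb = 1000.0       # assumed data capacity in TB
cost_per_tb = 1000.0  # assumed cost per TB, arbitrary currency units

# From the comment: licensing exceeds the cost of an extra 10% of storage,
# so 10% of the storage bill is a lower bound on the licence cost.
license_cost = 0.10 * raw_tb * cost_per_tb

# From the comment: about 6% of data capacity saved by deduplication.
net = dedupe_net_saving(raw_tb, cost_per_tb, 0.06, license_cost)
print(f"net saving: {net:,.0f}")
```

Under these assumptions the net figure is negative, which matches the comment's conclusion: the licence would have to cost less than 6% of the storage bill before dedupe paid for itself.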

Thursday 30th July 2009 13:46 GMT Craig McAllister

@Kebabbert

Question: Does ZFS offer memory-based inline deduplication for free at >900MB/sec? Is it hash based?

C

Friday 31st July 2009 10:29 GMT Kebabbert

Craig McAllister

I have no clue. ZFS dedup was recently announced in a talk. That's all I've heard. I think that dedup has been integrated into the ZFS code now.

But I do know that the more drives you use, the higher the bandwidth. If you use 46 SATA 7200 rpm drives, you reach 2-3 GB/sec read speeds. That is >900MB/sec. But I don't know how dedup will affect that. I guess if you have a fast enough CPU it should be no problem, as ZFS uses no hardware RAID controller cards. Everything is done on the CPU.
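As a rough sanity check on the figures in the comment above, the aggregate read speed can be divided across the drives. This sketch assumes throughput scales linearly with drive count (as the comment suggests) and that 1 GB is 1000 MB; the function name is made up for illustration.

```python
def per_drive_mb_per_sec(total_gb_per_sec, drive_count):
    """Average read throughput per drive, assuming linear scaling."""
    return total_gb_per_sec * 1000.0 / drive_count


# Figures from the comment: 46 drives, 2-3 GB/sec aggregate reads.
low = per_drive_mb_per_sec(2.0, 46)
high = per_drive_mb_per_sec(3.0, 46)
print(f"{low:.1f} to {high:.1f} MB/sec per drive")
```

That works out to roughly 43 to 65 MB/sec per drive. It says nothing about how deduplication would change the picture.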

Monday 3rd August 2009 23:48 GMT Craig McAllister

@Kebabbert

That was my point, really.

Diligent (ProtecTIER) does >900MB/sec dedupe today, off two boxes of commodity hardware clustered together. It scales hugely (1PB, off the shelf) because it's less limited by memory scaling problems than "traditional" (if there is such a thing) hash-based dedupe algorithms are.

Yes, ZFS will do dedupe for free (if you consider storage I/O, processor and RAM to be free). Diligent isn't free, but it's more effective than the mooted ZFS dedupe will be anyhow.

Forgive the slightly combative way of asking the question, I've just finished writing a whitepaper on this stuff, and comparing hash-based dedupe to Diligent's fingerprinting approach is sort of like comparing the ark to the Ark Royal.

Cheers

C
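For readers unfamiliar with the memory scaling point in the comment above, here is a minimal sketch of generic hash-based deduplication. It is not Diligent's method or the ZFS implementation, neither of which is described in this thread. The fixed 8 KiB chunk size, the use of SHA-256, and the 40 bytes per index entry (32-byte digest plus an 8-byte location) are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed fixed chunk size in bytes


def dedupe(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns (store, recipe): store maps digest -> chunk bytes, and recipe is
    the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


def index_bytes(unique_capacity_bytes, chunk_size=CHUNK_SIZE, entry_bytes=40):
    """Estimate index size for a given amount of unique data.

    entry_bytes is an assumed per-chunk cost (digest plus location).
    """
    chunks = -(-unique_capacity_bytes // chunk_size)  # ceiling division
    return chunks * entry_bytes


if __name__ == "__main__":
    sample = b"A" * (CHUNK_SIZE * 3) + b"B" * CHUNK_SIZE
    store, recipe = dedupe(sample)
    print(len(recipe), "chunks written,", len(store), "unique chunks stored")
    print(index_bytes(10**15), "index bytes for 1 PB of unique data")
```

Under these assumptions, 1 PB (10^15 bytes) of unique data needs an index of about 4.9 TB, far more than fits in RAM on commodity hardware. That is the kind of scaling limit the comment attributes to hash-based approaches; real products reduce it with larger chunks, smaller entries, or on-disk indexes.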


The Register Biting the hand that feeds IT
