4 posts • joined 18 Jun 2010
Hot, hot space!
If this indeed has any truth, it's another validation for a hot, hot space!
Answers to Math Questions
Here's a reply that answers all the math questions asked above:
Question: Albireo only requires 0.1 bytes of RAM per block of data indexed. So that would be four fifths of a bit then? Or are we talking qubits here?
Permabit Answer: Albireo requires, on average, 0.1 bytes of RAM per block being indexed because Albireo uses a hybrid memory/disk index. Only a small portion of the index is retained in memory at any given time; however, Albireo maintains extremely high performance because more than 99% of the time a deduplication request can be fulfilled without having to retrieve any portion of the disk index.
Question: Is Albireo using black magic?
The math is giving me a headache. Peglar says 4 trillion items of metadata are needed to track the 4K blocks in 4PB of data; I think that should be 1 trillion, but it doesn't change his overall point, especially since he also said you'd only need to use 8 bytes of data per 4K block for indexing, while I think most systems would need more than 8 bytes.
To wit, Permabit says "in a system such as Albireo, the percentage of overhead is a bit over 1 per cent of the disk for 4K blocks" -- which means they're using 40+ bytes per 4K block.
However, in the next breath, Permabit says "Albireo only requires 0.1 bytes of RAM per block of data indexed" -- and 0.1 is less than 40.
I'm assuming they mean they store 40+ bytes per block *on disk* while using an in-memory index that takes only 0.1 bytes per block.
Problem I have with that is 0.1 bytes is not even a bit. I don't even understand the meaning of "0.1 bytes" and I don't see how it's possible to index *anything* using less than a bit. The wizards at Permabit *seem* to be dabbling in black magic.
Permabit Answer: Albireo only requires, on average, 0.1 bytes per block of data indexed because Albireo uses a hybrid memory/disk index. For the portions of the index that are currently in memory, Albireo uses around 4 bytes per entry; however, only a small portion of the index is required to be in memory during normal operation. Based on sophisticated data modelling that Permabit has developed over the past ten years, Albireo is able to maintain extremely high performance because more than 99% of the time a deduplication request can be fulfilled without having to retrieve any portion of the disk index.
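For anyone who wants to check the figures quoted in this exchange, here is a minimal Python sketch of the arithmetic. It assumes binary units (4PB = 4 x 1024^5 bytes, 4K = 4096 bytes) and assumes the 0.1-byte average is simply the resident share of entries multiplied by the roughly 4 bytes each one costs in memory. The function names are illustrative only and are not part of any Permabit API.

```python
# Back-of-the-envelope check of the numbers quoted in the thread.
# Assumptions (not from Permabit): binary units, and a simple
# "average = resident fraction * bytes per resident entry" model.

BLOCK_SIZE = 4 * 1024        # one 4K block, in bytes
DATA_SIZE = 4 * 1024**5      # 4PB of data, in bytes


def block_count(data_bytes, block_bytes=BLOCK_SIZE):
    """Number of blocks that need an index entry."""
    return data_bytes // block_bytes


def resident_fraction(avg_ram_per_block, ram_per_resident_entry):
    """Share of index entries held in RAM under the simple averaging model."""
    return avg_ram_per_block / ram_per_resident_entry


blocks = block_count(DATA_SIZE)              # about 1.1 trillion, not 4 trillion
disk_bytes_per_block = 0.01 * BLOCK_SIZE     # "a bit over 1 per cent" of a 4K block
bits_per_block = 0.1 * 8                     # 0.1 bytes expressed in bits
fraction = resident_fraction(0.1, 4)         # implied share of entries in memory

print(f"blocks to index:          {blocks:,}")
print(f"on-disk bytes per block:  {disk_bytes_per_block:.2f}")
print(f"average RAM bits/block:   {bits_per_block:.1f}")
print(f"implied resident entries: {fraction:.1%}")
```

Under those assumptions the 0.1-byte figure is an average over all blocks, not the size of any single index entry: roughly one entry in forty would be in memory at about 4 bytes each.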
Question: His retort doesn't make Permabit any more viable in a big data environment. For 4PB of data, they need 100GB of RAM PER NODE, just for storing the dedup hash tables? What kind of crack is he smoking that he thinks that's a scalable solution? It might be slightly more scalable than Isilon believes it is, but it's still nowhere near the realm of viable.
Permabit Answer: No, Albireo would only require 100 GB across the entire grid. Deduplication does not require that the entire index be stored local to each node; this is another misconception that Peglar presented in his original interview.
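To make the grid-wide versus per-node distinction concrete, here is a rough Python sketch. It assumes binary units and, purely for illustration, that the in-memory index is spread evenly across nodes; the answer above says only that the full index need not be local to each node. The node counts are hypothetical.

```python
# Rough sizing of the in-memory index for 4PB of 4K blocks.
# Assumptions (not from Permabit): binary units, an even split of the
# index across nodes, and the node counts used in the loop below.

GIB = 1024**3


def index_ram_bytes(data_bytes, block_bytes, ram_per_block):
    """Total RAM for the index across the whole grid, in bytes."""
    return (data_bytes // block_bytes) * ram_per_block


def per_node_ram_bytes(total_bytes, nodes):
    """RAM per node if the index is split evenly (an assumption)."""
    return total_bytes / nodes


total = index_ram_bytes(4 * 1024**5, 4 * 1024, 0.1)
print(f"grid-wide index RAM: {total / GIB:.1f} GiB")

for nodes in (1, 4, 16):  # hypothetical grid sizes
    share = per_node_ram_bytes(total, nodes)
    print(f"{nodes:>2} node(s): {share / GIB:.1f} GiB each")
```

The grid-wide total comes out at roughly 100 GB, matching the figure in the answer, and the per-node share shrinks as nodes are added.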
I'm not sure where you are getting your information about Permabit, but you're clearly misinformed. Permabit has been developing deduplication technology since 2000 and in fact we have many patents in this area. Our products have been shipping with deduplication (and compression) since 2004 and are implemented and deployed in Fortune 1000 companies.
Our OEM partners recognized the power, scalability, and performance of our deduplication technology and asked us to create Albireo so they could implement it into their own product portfolios too. They wanted (and we delivered) a deduplication technology that can completely integrate into their storage software and work seamlessly with their existing features such as thin provisioning, snapshotting, and yes, their own embedded compression technology. What they asked us for (and what industry analysts have confirmed) was an embedded product that could deliver:
1) support for any storage: block, file, or unified platforms
2) support for petabyte scalability
3) fast performance with zero impact on users' access to data
4) a solution that was completely out of the read path (no readers required)
5) a solution that gave the flexibility of deploying as an inline, post-process, or parallel solution (or any combination of the three)
VP Marketing, Permabit
Permabit Albireo = Primary Storage
Hi Chris - Point of clarification. Albireo is targeted at primary storage and this announcement regarding VTL has no bearing on a potential relationship between Permabit and HDS. All primary storage vendors are focused on improving their competitive position and maintaining margins. That is what Albireo delivers to primary storage vendors.