Hardware-accelerated file storage supplier BlueArc is to sell Ocarina deduplication hardware integrated with its Titan 3000 product. BlueArc's Titan 3000 is a network-attached storage (NAS) product offering very fast access, up to 4PB of capacity, and tiered storage embracing fast Fibre Channel drives, bulk storage SATA drives, …
Reducing size of AV?
Ocarina claims that this is a lossless format, but I could find no technical information in a quick search. They simply say that they "optimize" based on the type of content, with "initial space savings range from 40% for complex image files to well over 70% for common office file mixes." Of course, they don't qualify that by saying how much data (and thus how much de-dupe) was used to generate those numbers, or give any other details for that matter.
They do specifically mention that they break a file down into the file level, object level, and chunk level, and then "optimize" and "remove redundant information" from each of those levels. Without seeing any details or any tests, it does prompt the question -- is the reconstituted file an exact duplicate of its original form, or is it a restructured file which may *appear* to be the same (for example, the same internal components, but rearranged within the file without affecting the overall data)? The latter may not be bad for some people, but I would most definitely want an exact duplicate of what I put into the system.
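The exact-duplicate question can at least be tested from outside the box: hash a file before it goes in, hash it again after it comes back out, and compare. A minimal Python sketch of that check follows. The function names and the sample byte strings are hypothetical, and nothing here reflects how Ocarina's product works internally.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def is_exact_copy(original: bytes, restored: bytes) -> bool:
    """True only if the restored bytes are bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(restored)


# Hypothetical file made of a header and two internal objects.
original = b"HEADER" + b"object-A" + b"object-B"

# Same components, same total length, but rearranged inside the file.
rearranged = b"HEADER" + b"object-B" + b"object-A"

print(is_exact_copy(original, bytes(original)))  # identical bytes
print(is_exact_copy(original, rearranged))       # same parts, different order
```

A restructured file of the kind described above would fail this check even though it has the same length and the same internal components.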
And I have to say, I still don't understand this big push for de-duplication. I understand the desire to reduce the size of data sets, but de-dupe involves significant processing power, and it massively increases your risk. If you're running a standard, non-deduped system and you lose a file, you've lost one file. If you're running an "optimized" deduped system, and you lose the file that includes a portion of data deduped from 100 files, you've just lost 100 files. Yes, we all know about proper backups, RAID, etc. But de-dupe seems like too much of a risk to me.
How is this supposed to work?
I thought the reason other dedupe offerings don't offer much benefit for media files (JPG, MPG, etc) is because these formats don't usually contain any redundant information. How can you dedupe data that already contains little or no duplication?
Of course, any dedupe product should be able to optimise multiple copies of the same media file (assuming they are verbatim copies). But if the files are different and, thanks to their already optimised (compressed, perhaps lossy) formats, contain no duplication, how can this work? I'm not saying it doesn't work, just that I'd like some details on how.
And do you really want to dedupe media files anyway? They're the very definition of "streaming media", and are usually written and read sequentially (i.e. at high speed). Surely any dedupe process will, by definition, reorder the underlying block structure and "de-sequentialise" the block layout?
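To make the point concrete, here is a rough Python sketch of plain fixed-size chunk dedupe. It is an illustration of the generic technique only, not of Ocarina's method: the 4 KB chunk size is an arbitrary assumption, and random bytes from `os.urandom` stand in for already-compressed media, which looks similarly high-entropy at the block level.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # assumed chunk size, chosen only for illustration


def unique_chunk_bytes(files, chunk_size=CHUNK_SIZE):
    """Bytes that must be stored once every repeated chunk is kept only once."""
    seen = set()
    stored = 0
    for data in files:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return stored


def savings(files):
    """Fraction of raw capacity saved by chunk-level dedupe."""
    raw = sum(len(f) for f in files)
    return 1 - unique_chunk_bytes(files) / raw


# Random bytes as a stand-in for two different compressed media files.
media_a = os.urandom(64 * 1024)
media_b = os.urandom(64 * 1024)

print(savings([media_a, media_a]))  # two verbatim copies of one file
print(savings([media_a, media_b]))  # two different files
```

Two verbatim copies save half the space, while two different high-entropy files share no chunks and save essentially nothing. Whatever Ocarina does to get 40% on image files, it has to be something other than this.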
ROFL - I just watched the "tech demo". I'm still in shock. Good luck with this one... :)
PS/ Chris C - I think you've misunderstood how dedupe works. There should be no increased risk due to multiple files sharing the same blocks. If one file is "lost" then only that file is affected, and the other 99 files sharing the same blocks are still intact. If, however, a disk corruption destroys a "block" of data, and that block was used by 100 deduplicated files, then yes, you are in a world of pain. That's why you only run dedupe on systems with niceties such as dual-parity RAID, media and parity error detection, lost write protection, etc.
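The difference between losing a file and losing a block can be shown with a toy reference-counted block store. This is a simplified Python sketch under stated assumptions: the `DedupeStore` class, its method names, and the 4-byte block size are all hypothetical, and real products add the RAID and error-detection layers mentioned above.

```python
import hashlib


class DedupeStore:
    """Toy content-addressed store: files are lists of shared block digests."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of file references
        self.files = {}     # file name -> ordered list of digests

    def write(self, name, data, block_size=4):
        if name in self.files:
            self.delete(name)  # release the old blocks before overwriting
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                self.refcount[digest] = 0
            self.refcount[digest] += 1
            digests.append(digest)
        self.files[name] = digests

    def delete(self, name):
        """Drop one file; a block is freed only when no file references it."""
        for digest in self.files.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                del self.blocks[digest]
                del self.refcount[digest]

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def corrupt_block(self, digest):
        """Simulate disk corruption by flipping every bit in one stored block."""
        self.blocks[digest] = bytes(b ^ 0xFF for b in self.blocks[digest])

    def damaged_files(self):
        """Names of files with at least one block that no longer matches its digest."""
        return [
            name for name, digests in self.files.items()
            if any(hashlib.sha256(self.blocks[d]).hexdigest() != d for d in digests)
        ]


store = DedupeStore()
for i in range(100):
    # Every file starts with the same 4-byte block, followed by a unique one.
    store.write(f"file{i}", b"SHAR" + f"{i:04d}".encode())

shared = store.files["file0"][0]
store.delete("file0")                # lose one file
print(store.read("file1"))           # the other files are untouched
store.corrupt_block(shared)          # lose one shared block
print(len(store.damaged_files()))    # every remaining file is now damaged
```

Deleting a file only decrements reference counts, so the other 99 files stay readable. Corrupting the one shared block damages all of them at once, which is the case the RAID and error-detection features are there to cover.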