Privately-owned Arkeia thinks it's in prime position to back up to the cloud because its dedupe technology is better than anybody else's. Arkeia says its dedupe is up to twice as good as Symantec's Backup Exec 2010. It has a slide (pictured) showing it providing a better than 50 per cent size reduction in Excel documents and …
Let me see if I understand this
They not only want people to backup to a potentially unreliable medium, they think that ensuring that there's only one copy of each piece of backed-up data on that medium is a selling point?
I'll pass, thanks. If I back up anything to a cloud I want as many copies, in as many places, as I can comfortably and securely manage.
Let me see if I can help to understand this :)
I agree with you that you should always retain at least 2 copies of backups in different locations. But there is no benefit in keeping redundant data on the same backup storage (whether it is disk or tape).
For example, if you back up the user folders of 20 sales people who all use the same ppt presentation with only the first slide modified (sales person name + company it was presented to), you might have hundreds of occurrences of "almost" the same file. Block-level dedupe lets you dedupe within all these files and across all the backed-up machines (for the products able to dedupe across all backed-up machines).
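To make the sales-deck example concrete, here is a minimal sketch of block-level dedupe. It is an illustration only, not how Arkeia or any other product does it: the fixed 4 KB block size, the SHA-256 fingerprints and the names `dedupe_store` and `restore` are all assumptions of mine, and the "decks" are synthetic byte strings whose shared part happens to line up on block boundaries (real files often need variable-size blocks to get the same effect).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products may use variable-size blocks


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedupe_store(files):
    """Store each unique block once; keep a per-file list of block digests."""
    store = {}    # digest -> block bytes (each unique block kept once)
    recipes = {}  # file name -> ordered digests needed to rebuild the file
    for name, data in files.items():
        digests = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def restore(name, store, recipes):
    """Rebuild a file from its recipe."""
    return b"".join(store[d] for d in recipes[name])


# Synthetic stand-in for 20 decks: a unique first "slide" plus 50 shared blocks.
shared_body = b"".join(
    hashlib.sha256(str(j).encode()).digest() * (BLOCK_SIZE // 32) for j in range(50)
)
files = {}
for i in range(20):
    first_slide = f"salesperson-{i}".encode().ljust(BLOCK_SIZE, b"\0")
    files[f"user{i}/deck.ppt"] = first_slide + shared_body

store, recipes = dedupe_store(files)
raw_bytes = sum(len(data) for data in files.values())
stored_bytes = sum(len(block) for block in store.values())
print(f"raw: {raw_bytes} bytes, after dedupe: {stored_bytes} bytes")
```

Every file can still be restored in full; the saving comes only from not storing the shared blocks 20 times over on the same backup storage.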
How long is a piece of...
Common-or-garden PKZIP can reduce a typical .xls by 50% or more. 7-zip might do even better.
For .vmdk files, compressibility will depend on whether all space is allocated at creation, or not. If yes, then there will be acres of blank space which can be very easily compressed. Now, if they'd taken already-compressed .jpg, .mp3 or .avi files into account, the results might look less promising.
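Anyone can check the general point for themselves. The sketch below uses Python's standard `zlib` module; the zero-filled buffer is only a stand-in for unused preallocated space in a disk image, and the random buffer is only a stand-in for already-compressed media, so the numbers are indicative rather than measurements of real .vmdk or .jpg files.

```python
import os
import zlib


def ratio(data):
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


blank = bytes(1_000_000)             # all zeros: stand-in for unused preallocated space
random_like = os.urandom(1_000_000)  # high-entropy: stand-in for already-compressed media

blank_ratio = ratio(blank)
random_ratio = ratio(random_like)
print(f"blank space compresses to {blank_ratio:.2%} of its size")
print(f"high-entropy data compresses to {random_ratio:.2%} of its size")
```

Which of the two a vendor's test set resembles makes a large difference to the headline figure.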
For years, backup software and hardware vendors have made unrealistic claims about media capacity by quoting hypothetical compressed values instead of actual figures. This leads buyers who are unaware of the hype to specify backup solutions that are inadequate for the task in hand. For example, a "500GB" tape drive typically will NOT back up a full 500GB disk. Sometimes, not even a half-full disk. The consequences are automated backups that eventually fail to run properly, and the need to replace costly but under-spec'd backup equipment ahead of its life expectancy.
This claim might be genuine, and I'm prepared to give it the benefit of the doubt. But, I'm wise to the backup industry's reputation for hype.
NEVER dedupe unstructured data...
...because the doc or xls file already contains pre-compressed content. Further dedupe just increases performance requirements, while the realized storage savings will typically be low and will not justify the investment.
Native Format Optimization which preserves the file format can achieve 40-75% data reduction but without the disadvantages of dedupe (single point of failure, performance) and is the best way to tackle unstructured data - dedupe was made for backup, not for unstructured data.
And, I want to understand: If the PPT file is deduped and even 50% of space is saved because not 2 but 1 copy is saved, how the hell is this generating savings in the network? The single file is still X MB big and still pushed as a whole through the line...
Savings in network bandwidth are achieved by source-side backup solutions (such as Arkeia or Avamar). Only unique blocks not previously saved are sent to the backup server, hence the reduction in bandwidth usage.
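Roughly, the exchange looks like the sketch below. This is my own simplified model, not the actual Arkeia or Avamar protocol: the `BackupServer` class, the `backup` function, the fixed 4 KB blocks and the SHA-256 fingerprints are all hypothetical, and the small cost of sending the fingerprints themselves is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


def blocks_of(data, size=BLOCK_SIZE):
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BackupServer:
    """Hypothetical server that remembers every block it has received."""

    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        """Tell the client which fingerprints the server has not seen yet."""
        return {d for d in digests if d not in self.blocks}

    def upload(self, digest, block):
        self.blocks[digest] = block


def backup(data, server):
    """Back up one file; return the block bytes actually sent to the server."""
    pairs = [(hashlib.sha256(b).hexdigest(), b) for b in blocks_of(data)]
    need = server.missing([d for d, _ in pairs])
    sent = 0
    for digest, block in pairs:
        if digest in need:
            server.upload(digest, block)
            need.discard(digest)  # do not resend a block repeated within the file
            sent += len(block)
    return sent


# Two synthetic "decks" that differ only in their first block.
body = b"".join(
    hashlib.sha256(str(j).encode()).digest() * (BLOCK_SIZE // 32) for j in range(50)
)
deck_a = b"slide for customer A".ljust(BLOCK_SIZE, b"\0") + body
deck_b = b"slide for customer B".ljust(BLOCK_SIZE, b"\0") + body

server = BackupServer()
first_sent = backup(deck_a, server)
second_sent = backup(deck_b, server)
print(f"first deck: {first_sent} bytes sent, second deck: {second_sent} bytes sent")
```

The second deck is the same size on the client's disk, but only its one new block crosses the line, because the comparison happens before the data is sent rather than after it arrives.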
My 2 cents ;)