14 posts • joined 4 Sep 2008
EMC / Data Domain is NOT faster than VTL
"Data Domain says that overall resource use on the media server is reduced by 20 to 40 per cent because of a reduced data copy overhead."
Data copy overhead? This is a patently stupid and nonsensical statement by EMC. There is no such thing as copy overhead - the whole backup process is a data movement process. The data moves through the media server whether it's raw or deduped. Deduplication on the media server creates ADDITIONAL overhead, just as it does with CommVault deduplication. This is basic backup physics. It's also the very same argument that the Avamar sales team at EMC makes when competing against CommVault's dedupe strategy. (They used to claim that you had to reduce data on the host to gain efficiency.)
Here are the basics:
The whole point of using a solution like this is to reduce your backup window... so why would you add an additional process that INCREASES your backup window?
Inline deduplication is nowhere close to being as efficient as going straight to VTL. Many enterprise customers have tested it, and inline dedupe (whether from EMC/Data Domain or others) loses every time in head-to-head tests. Why? Because the inline process creates REAL overhead and slows down the backup job. EMC has been getting killed in enterprise accounts for this (and for the lack of global dedupe and lack of FC connectivity... but that's a different story). So what did EMC do to fix the problem? They split up the dedupe process between the media server and the appliance with beefy processors. The improvement they get is not from the design, but from the chips... and even then it's not as fast as writing at full speed to a VTL. The design is still flawed.
Any process that moves data between point A and point B is fastest without ADDITIONAL processes (like dedupe) in the way. If your backup job takes X amount of time to complete when writing directly to a VTL, any additional processes (no matter the speed of the processor) will incrementally add to the time to complete your backup job.
It's the very worst combination of what CommVault does with media server dedupe combined with the very worst of data domain - inline dedupe.
This new version is a band-aid to mask the problems of inline dedupe, which is why EMC is touting improved speeds without comparing them to the speed of writing to multiple VTLs with global dedupe. This is how they are losing in the enterprise. It's why they always test these for customers using small data sets. When the customer wants to test a real full backup set, they balk.
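The argument above (any extra step adds to the time of a job that takes X when written straight to a VTL) can be put as simple arithmetic. This is a minimal sketch of that serial model only; the function name and all the numbers are made up for illustration, and it assumes the dedupe step does not overlap with the data transfer.

```python
def backup_hours(data_tb: float, write_tb_per_hour: float, extra_hours: float = 0.0) -> float:
    """Hours to move data_tb at a given write rate, plus any serial extra step."""
    if write_tb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return data_tb / write_tb_per_hour + extra_hours


# Hypothetical job: 20 TB written at 4 TB/h.
direct_to_vtl = backup_hours(20.0, 4.0)
# Same job with 1.5 h of assumed inline-dedupe overhead added in series.
with_inline_dedupe = backup_hours(20.0, 4.0, extra_hours=1.5)

print(direct_to_vtl, with_inline_dedupe)
```

Under this model the backup window can only grow as steps are added; how much it grows in practice depends on the real overhead, which only a test with a full backup set would show.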
Why no mention of FalconStor?
FalconStor has the reputation as the industry's fastest enterprise VTL/dedupe. Combined with the fact that you can put it on top of anyone else's storage (whereas Data Domain's enterprise solution is NOT a gateway and you are stuck with their junk disk), why wouldn't they be a part of the conversation? EMC uses a pared-down version of FalconStor's VTL.
Who is Naive?
Geoff Mitchell said: "Constraining one's choice of storage by a protocol is at best naive and worst an indictment of a storage administrator not looking for the optimal alternatives for his business."
Right - that's an argument against buying EqualLogic. If you bought an EqualLogic box, you would be constraining yourself to iSCSI which would be naive if you were truly enterprise.
Chris is 100% right
Holy smokes, you guys are morons. iSCSI has NEVER been an enterprise-class protocol - it is solidly an SMB protocol. Now, that doesn't mean that an enterprise couldn't use it (especially at the edge), but you will not see an enterprise-class data center be 100% iSCSI. In fact, I would challenge any of you to name an enterprise-class data center that is 100% iSCSI. (And NO BS - I mean enterprise-class.)
Don't waste your time, they don't exist.
Now... some of you talk about other vendors that are multi-protocol but you somehow confuse that with being enterprise-class. The reason those vendors offer both iSCSI and FC is to reach down the food chain of requirements, not to rise up. This is common sense and well known.
EqualLogic is iSCSI only because it cannot integrate FC and was never meant to. It would take a complete platform overhaul to integrate FC.
What about infrastructure costs
This all sounds good until you work through a potential implementation. Infrastructure costs - specifically bandwidth - make the cloud computing model unworkable for mid-sized to enterprise-sized companies. When you factor that in, it would actually cost you more to outsource your server and storage infrastructure than it would to keep it in house. Companies that run global virtual desktops to a centralized location (their own data center) can easily reap the benefits of centralized management, control bandwidth costs and manage their own security much better than any cloud computing model being offered yet.
NetApp way ahead
>"A couple of points, firstly you must buy into the Netapp filesystem, essentially making your array a slave to Netapp's NAS implementation."
Yes, of course... and the vendors who only perform the disk functions whine about this because they want to control the WHOLE environment and resist any open storage platforms. It's a weak argument if the configuration meets or exceeds the end-user expectations.
>"Just because Netapp are pitching this as a supported solution, I don't see any of the other vendors clamouring to sign joint support agreements around this. Who carries the can if it all goes tits up."
Yes, of course... why would EMC certify NetApp as a gateway when EMC wants to own your whole environment and make you buy just EMC? In an open environment, the end-user has to exercise some authority over the role each vendor's piece plays in their environment. If EMC (or any vendor) balks at just being disk to a NetApp gateway, the end-user needs to walk away from EMC and work with a vendor that will cooperate... and trust me... EMC will cooperate if they think they are out of the solution.
>"This only works for certain usage cases"
Give me a scenario where it doesn't work.
NetApp is way ahead on this
ASIS is VERY different from Data Domain. Since NetApp's gateway can sit in front of EMC, HDS, IBM and HP disk (since when is EMC/HDS 'cheap and dirty disk'????)... AS PRIMARY DISK, there is a real deduplication advantage.
Data Domain dedupes ONLY backup data and usually to their own disk (yes they sell a gateway, but just try and buy one... you'd have better luck selling your Lehman stock). Data Domain is a one trick pony. Once people wise up that deduping your primary storage eliminates the need to dedupe your backup data, they'll be a bust.
Riverbed looks like a 'me too' product... we'll have to wait to see the specs
I'll respond to the others here shortly... but I wanted to post this quickie in response to the Earthquake scenario:
"It's very hard to be sure of what's gone wrong. Quake? Brief network outage? Power outage? Remember the news stories after Loma Prieta, 9/11, New Orleans? For hours no-one had a clue about what was really happening. Having a DR site pick up while the real problem is that someone tripped over the network cables (figuratively speaking) can cause much more grief than it cures. You can automate failover *within* a data centre, it is rarely safe to do it across geographical distances."
You are 100% correct about local failover being your first line of defense. However, I also believe that a properly configured DR architecture using CDP is just fine even for those instances where someone "trips over a cable". You can fail back just as easily as you fail over, so why wouldn't you? If your clients don't see any interruption, isn't that where the value of DR is? Again, this is a 'backup vs. CDP' argument, and a tape backup system wouldn't be of any use in this scenario.
Just for a real-world example, we have offices in Miami. When evacuations begin for a hurricane, we usually have a 48-hour window to stage the failover (essentially babysit it) and test it before it actually goes live. If we were to have a fire in the building, it would fail over at the first moment of disruption (which is tied into our security).
"What happens if a Jumbo crashes onto your data centre/office? Where is your data then?"
A good DR strategy incorporates replication to a location far enough away to not be a factor in your risk profile (i.e. if you are in a flood zone, someplace outside the flood zone, for power failure, you want to replicate to a location in a different grid, etc)
"What if the net goes down? What if you have a major natural disaster? Let's take October 17, 1989, Loma Prieta, California. That earthquake knocked a lot of telecommunication lines down. What do you do if you need to do a restore but your local backup is messed up and the net is down? There will always be a need for tapes for local and off-site backup."
First, in 1989, your best option was tape backup. Second, if telecommunications are down, you have an increased risk of no power as well, so attempting local recovery is futile (think of your Exchange servers, for instance... after the 4+ hours it would take to get your offsite tapes back and restored, with no telecommunications you have no email and no business gets done). As I stated above, your DR site should be outside your risk profile. Earthquake hits, automated failover to your DR site picks up near-instantaneously and you are still in business. Also, with FalconStor CDP, you will have a local copy of the data for whatever window you want (24 hours, 1 week, 1 month - you choose).
"So does CDP know enough to "bundle" associated writes?"
I'll have to get an answer for you on this, since I'm not sure what you mean by "bundle" associated writes. I can tell you that I have never seen an FS corruption when doing a restore on our servers.
One other note...
I'm not claiming that you never need backup or tape. I think most people would argue that tape is an excellent archiving media. I would argue that the methodology for conventional backup is out of date and doesn't reflect the increasing demands for data restoration within 2 hours to a point in time less than 8 hours old.
I know some manufacturing facilities that are not tech driven and they can produce widgets whether their servers are up or down. 24 hours for RPO and RTO is not a big deal to them so conventional backup meets their business requirements. Many businesses, however, rely on their servers and applications to be functioning and have a low tolerance for an outage of greater than 4 hours.
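To make the RPO/RTO point concrete, here is a minimal sketch of checking a strategy against a business requirement. The 24-hour figures for conventional backup and the 2-hour restore / 8-hour point-in-time requirement come from the text above; the class, the function name and the CDP figures are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    rpo_hours: float  # worst-case age of the newest recoverable copy
    rto_hours: float  # worst-case time to be back in service


def meets(strategy: Strategy, required_rpo_hours: float, required_rto_hours: float) -> bool:
    """True if the strategy satisfies both the RPO and the RTO requirement."""
    return (strategy.rpo_hours <= required_rpo_hours
            and strategy.rto_hours <= required_rto_hours)


nightly_tape = Strategy("conventional nightly backup", rpo_hours=24, rto_hours=24)
cdp = Strategy("CDP with replication", rpo_hours=0.1, rto_hours=0.5)  # assumed values

# Widget factory that tolerates a day of loss and a day of downtime.
print(meets(nightly_tape, required_rpo_hours=24, required_rto_hours=24))
# Business that needs a restore within 2 hours to a point less than 8 hours old.
print(meets(nightly_tape, required_rpo_hours=8, required_rto_hours=2))
print(meets(cdp, required_rpo_hours=8, required_rto_hours=2))
```

The same strategy passes or fails depending only on the requirement, which is the whole point: start from the RPO and RTO, then pick the tool.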
One last point...
Tape is NOT a DR solution and was never intended to be - it is a backup and archiving medium.
Apples and Oranges
"If you had a disc with a high rate of changes to individual blocks in a relatively small file, a system that stores every block change will store more data than a system that stores the whole file at the end of the day.
If it's truly a single instance of each block, then it's not much different to mirroring and you wouldn't be able to wind back. If it's an instance of each overwritten block, then you can wind back a file to any point, however it will use shit loads of storage when applied to active files."
Your scenario completely ignores the requirements for the recovery point objective and recovery time objective - which would answer your question. You are comparing apples to oranges.
The scenario you suggest (many block changes with a presumed restore at any single point) would indeed require a larger amount of storage... but it's also not a scenario that could be addressed with traditional backup and tape. If the requirement is to recover from any point in time - tape is certainly not the answer and CDP, regardless of storage consumption, is the ONLY efficient option.
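The trade-off in that exchange can be sketched with a toy journal that keeps every block write. It is an illustration only, not how FalconStor or any other product is implemented; the class name, block sizes and data are made up.

```python
class BlockJournal:
    """Toy volume that records every block write so any point can be restored."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.base = [b"\0" * block_size for _ in range(num_blocks)]
        self.log = []  # (block_index, data) tuples in write order

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.log.append((index, data))

    def restore(self, point: int) -> list:
        """Return the blocks as they were after the first `point` writes."""
        blocks = list(self.base)
        for index, data in self.log[:point]:
            blocks[index] = data
        return blocks

    def journal_bytes(self) -> int:
        return len(self.log) * self.block_size


journal = BlockJournal(num_blocks=2, block_size=4)
for data in (b"aaaa", b"bbbb", b"cccc"):
    journal.write(0, data)  # the same block is overwritten three times

volume_bytes = 2 * 4                     # size of one end-of-day copy
print(journal.journal_bytes(), volume_bytes)
print(journal.restore(2)[0])             # state after the second write
```

The journal ends up larger than a single end-of-day copy, as the quoted poster says, but the end-of-day copy has one recovery point while the journal has one per write.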
...And more responses to you bricks...
"What about offsite backups?"
-What about them? Use replication.
"What about multiple retention items?"
-What about them? CDP allows you to pick retention levels
"Running that much disk would represent a large amount of power and heat in your datacentre"
-Not if you use disk using MAID technology (like Nexsan)
"the consequences of an array going bang where this tech is used doesn't really bear thinking about"
-Sure... if you have 3 days to restore from tape and then rebuild the RAID
"Maybe I'm missing something, but why is this any different from what all the storage vendors have been doing for years?"
-For starters, those examples are REALLY expensive disks for backup or CDP. FalconStor is disk agnostic, and tier 2 disk like that from Nexsan is not very expensive at all
Most of you are dumber than bricks...
I can't believe the collective lack of intelligence on this comment board.
From the top...
"How do you handle offsite storage for disaster recovery?"
-With IP replication like nearly every other DR solution. CDP does not limit your recovery options.
"You do still need backups, especially off-server ones"
-Huh? No you don't - CDP is the backup and recovery tool and you can use it locally or remotely.
"This will require shit loads of storage"
-Re-read the article - it essentially saves and replicates a single instance of each block and uses MUCH less capacity than tape which typically does full and incremental backups
"tape still has a role for a rotated off-site DR scheme"
-Your DR is dependent on tape? How many days until you are fully restored? Good luck with that... and keep your resume up to date.
"So what is the difference between this and RAID 1?"
-Umm... For starters, replication and single instance blocks.
"However, if you're also looking at compliance and other "archiving" needs, then you'll need that infinite amount of storage everyone forgets about."
-Archiving is not backup you dolt! If you don't understand the difference, it's time for you to go back to "Storage 101" class.
"So where is the CDP, and how do you recover that if there is a problem?"
-Hopefully your DR scheme has you replicating offsite. In the remote site, you can get near-instantaneous restores (try doing that with tape)
"Just changes your backup software supplier from X to FalconStor"
-Yeah... and eliminates annual backup software maintenance while giving you a ZERO backup window and near-instantaneous restores. But go ahead and keep using 30 year-old backup technology.
"Using a disk-image produced by any software regardless how magical is totally useless for DR because it does not know if applications have left the data in a consistent state at that particular moment."
-A good CDP solution is application and database aware - clearly you have no idea what you are talking about.
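The "single instance of each block" reply above can be illustrated with a small content-hash sketch. This is a generic technique shown under assumptions, not FalconStor's actual method; the function names, the 8-byte block size and the sample data are invented, and the comparison is against repeated full copies rather than a real full-plus-incremental tape rotation.

```python
import hashlib


def single_instance_bytes(backups, block_size):
    """Bytes stored when each unique block is kept only once across all backups."""
    seen = set()
    total = 0
    for data in backups:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                total += len(block)
    return total


def full_copy_bytes(backups):
    """Bytes stored when every backup is kept as a complete copy."""
    return sum(len(data) for data in backups)


day1 = b"A" * 8 + b"B" * 8 + b"C" * 8
day2 = b"A" * 8 + b"X" * 8 + b"C" * 8  # only the middle block changed

print(single_instance_bytes([day1, day2], block_size=8))
print(full_copy_bytes([day1, day2]))
```

Only the changed block costs extra space on day two, which is where the capacity saving comes from.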
Umm How about the controller?
What most of you are missing in your drive-type pissing match is that the controller and RAID type will have a much more dramatic impact on throughput than the drive type. It's no wonder you see some people have success with SATA and others don't.
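A rough way to see the RAID effect is the common write-penalty rule of thumb (2 back-end I/Os per write for RAID 1/10, 4 for RAID 5, 6 for RAID 6). The sketch below applies it; the drive counts and per-drive IOPS are made-up numbers, the function name is hypothetical, and it ignores controller cache, which is exactly the part that can change the picture further.

```python
# Conventional back-end I/Os per host write for each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(drives: int, iops_per_drive: float, raid: str, write_fraction: float) -> float:
    """Rule-of-thumb host IOPS for a RAID group at a given read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = drives * iops_per_drive
    return raw / ((1.0 - write_fraction) + write_fraction * WRITE_PENALTY[raid])


# Hypothetical all-write workload: slower drives on RAID 10 vs faster drives on RAID 5.
slow_drives_raid10 = effective_iops(8, 80, "raid10", write_fraction=1.0)
fast_drives_raid5 = effective_iops(8, 150, "raid5", write_fraction=1.0)
print(slow_drives_raid10, fast_drives_raid5)
```

With these assumed figures the slower drives come out ahead, which would explain why two shops with the same drive type can report very different results.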
The Register ranks #1 for storage news
Don't know if you saw this or not - pretty cool: