Continuous data protection could render dedupe, virtual tape libraries (VTL) and backup software redundant. Er, run that past me again. Alexander Delcayre, FalconStor's technical director, says that the company's Continuous Data Protection (CDP) product is block-level, not file-level. It captures every write I/O a server makes …
.. my brain hurts, and I agree.. I think I'm dreaming..!
sure you still need backups..
You do still need backups, especially off-server ones, or are you telling me you'll never have a catastrophic RAID or server hardware failure?
CDP is a nice idea and I'd use it for quick recoveries of data (NetApp and other storage vendors have had snapshot technology for many years), but I wouldn't get rid of backups, full stop.
How do you handle offsite storage for disaster recovery? Run 25 miles of fibre to another disk array at the offsite facility?
So every changed block on a protected disk is stored in perpetuity?
So instead of writing a block, we write the block and then store a copy of the block on a spike. 2 block writes, 2 blocks on the spike, plus the changed one on the disk.
This will require shit loads of storage.
It might be tricky choosing an exact point in time to return to as well.
So, you need cheap storage for the 'spike', a write once FIFO store where the medium is cheap and lengthy... Like tape?
CDP does not replace Tape
We use CDP (not FalconStor's) and it is an impressive system; however, tape still has a role in a rotated off-site DR scheme, something that disk-based CDP generally does not provide. So: CDP for your regular backups and ease of restoring individual files and bare-metal restores. Tape backup (either of the server, or of the CDP store) for full off-site DR.
pah, old hat
try one that coalesces the blocks too, and provides an encryption mechanism
Plan 9 ftw.
People have been telling us it's irrelevant but the ideas are creeping into other products all over the industry.
An even dumber question...
I'll see your dumb question and raise it
"can you prove it?"
I remain skeptical but am open to evidence.
Paris because she's a mush-for-brain.
Brain of mush? Only partially.
This sounds really nice - and if all you are looking for is data protection (on hardware failure or something), it should be fine. However, if you're also looking at compliance and other "archiving" needs, then you'll need that infinite amount of storage everyone forgets about.
It also puts an overhead on the system (since every write has to go through their software stack and go to an additional storage device), all the time rather than just during your backup window. It'll happily corrupt your extra copy of the data too, if your app/OS starts throwing bad data at the storage device.
Yes, there is a place for this, but it only covers a subset of what your backup solution is there to do, at a cost that for some uses will be unacceptable.
Final example. If you've got an OLTP DB server, running this is either going to end up with a mirror of the data (e.g. just the most recent data for each block), or effectively a block level transaction log of the DB - which will be absolutely humungous.
"Bare-Metal" Restore from tape...
The usual "bare-metal" restore from tapes is in fact not bare-metal :
->install base OS from ghost
-> install backup server agent
->(reboot, never hurts)
->restore from tape
Which is more or less exactly what you'll have to do with CDP or disk to disk backup.
Real difference is that tapes cannot beat disk to disk restore for speed...yet...
old tech in new wrapper
So what is the difference between this and RAID 1?
So where is the CDP, and how do you recover that if there is a problem?
If I have a problem that wipes out my virtual servers and their data centre, and the CDP is in the same data centre I'm still stuffed aren't I?
"render [..] backup software redundant ?"
Err - no. Just changes your backup software supplier from X to FalconStor (they hope). Some of it sounds nice (although nothing leaps out as particularly new), but very wishy-washy - difficult to tell more without any details to speak of...
Still, they must be happy to get a free sales pitch on an IT website :)
Another useless tech to be enjoyed by the clueless
Using a disk-image produced by any software regardless how magical is totally useless for DR because it does not know if applications have left the data in a consistent state at that particular moment.
This of course does not prevent thousands of idiot pseudosysadmins around the world from relying on apps like Norton Ghost for backup. They will meet this app with glee and deploy it everywhere to produce some more unviable backups, and money for the investors in the company producing the software.
At the same time, the people who have working DR will continue to rely on software that interfaces (even if it is a crude start/stop) into their applications and ensures that the data is in a consistent state before backing it up.
Nothing new to see here... Move along...
Paris, as she probably knows more about keeping her data safe than a sysadmin using disk-level imaging for backup.
Nothing new here...
Got regulatory concerns?
Two more words.
I always forget my raid numbers, but I guess 1 is mirroring.
The difference is that in mirroring a written block gets written to both disks, thus disk 2 is a copy of disk 1.
In this system, the old block data (to be overwritten) is stored in a big stash somewhere, when that block is written again that block is also stored in the big stash, thus the big stash has every version of the block not just the last.
How the heck you know what point you can stop restoring at god knows. I suppose you could build in a stop marker, ie stop the apps, then write a block marking a safe restore point. But then that's the point where you could just do a conventional backup.
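For what it's worth, the "big stash plus stop marker" idea can be sketched in a few lines. This is a toy model only, not FalconStor's implementation: the `BlockJournal` class, its method names and the marker name are all made up for illustration, and blocks are plain strings held in memory.

```python
class BlockJournal:
    """Toy model: keep every overwritten version of each block (hypothetical)."""

    def __init__(self):
        self.disk = {}      # block number -> current contents
        self.journal = []   # (sequence, block number, previous contents or None)
        self.markers = {}   # marker name -> sequence number when it was set
        self.seq = 0

    def write(self, block, data):
        # Stash the old contents before overwriting them.
        self.journal.append((self.seq, block, self.disk.get(block)))
        self.disk[block] = data
        self.seq += 1

    def mark(self, name):
        # Called once the apps are stopped or quiesced: a known safe point.
        self.markers[name] = self.seq

    def rollback(self, name):
        target = self.markers[name]
        # Undo writes newest-first until the disk is back at the marker.
        while self.journal and self.journal[-1][0] >= target:
            _, block, old = self.journal.pop()
            if old is None:
                self.disk.pop(block, None)   # block did not exist before
            else:
                self.disk[block] = old
        self.seq = target


journal = BlockJournal()
journal.write(0, "a")
journal.write(1, "b")
journal.mark("apps-quiesced")
journal.write(0, "c")              # "a" goes into the stash
journal.rollback("apps-quiesced")
print(journal.disk)                # {0: 'a', 1: 'b'}
```

Unlike a mirror, the stash keeps the old "a" after block 0 is overwritten, which is what makes the wind-back possible. Without the marker there is no way to tell from the blocks alone which sequence number is safe to stop at.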
As an aside, VMS used to have file level versioning configurable to any number of old versions. Effectively when a file was opened for writing a copy was made with a ";version_number" on the end of the file, letting you roll back without any whacky software.
"...Tell me I'm dreaming. FalconStor is weaving a data protection reality distortion field and my brain is mush - or is it? ..."
You're dreaming. ;)
What about offsite backups? (Maybe array level replication, but that doesn't allow for companies with only one datacentre)
What about multiple retention items? Keep a snapshot once a month for a year, or more. Weekly backups only retained for a month etc.
Running that much disk would represent a large amount of power and heat in your datacentre, even if you could somehow migrate the older block changes to not-always-on disks.
Lots of tape costs a lot less than lots of disk and you would still require a lot of disk.
Also the consequences of an array going bang where this tech is used doesn't really bear thinking about.
I'm not saying this wouldn't be useful, it would definitely be good for small systems or OS disks etc., but it's not a magic bullet.
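The retention scheme asked about above (a monthly snapshot kept for a year, weeklies kept for a month) is easy enough to state as a rule. A minimal sketch follows, assuming monthlies are taken on the 1st and weeklies on Sundays; the function name and those two conventions are assumptions for illustration, not any product's policy engine.

```python
from datetime import date


def should_keep(snapshot_day, today):
    """Decide whether a snapshot survives pruning (hypothetical policy)."""
    age_days = (today - snapshot_day).days
    if age_days < 0:
        return False  # snapshot dated in the future: treat as invalid
    # Monthly snapshot (assumed: taken on the 1st), kept for a year.
    if snapshot_day.day == 1 and age_days <= 365:
        return True
    # Weekly snapshot (assumed: taken on Sunday), kept for a month.
    if snapshot_day.weekday() == 6 and age_days <= 31:
        return True
    return False


today = date(2024, 6, 15)
print(should_keep(date(2024, 1, 1), today))   # True: monthly, under a year old
print(should_keep(date(2024, 4, 7), today))   # False: a weekly older than a month
```

Whether a continuous journal can be thinned out to match a rule like this depends on the product; the sketch only shows what the rule itself looks like.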
Why is this a story?
EMC SRDF? Timefinder?
Hitachi TrueCopy? Universal Replicator?
and more, many more. All can be used locally (Sync mode) or remotely (Async, over IP links) for DR.
Maybe I'm missing something, but why is this any different from what all the storage vendors have been doing for years?
Besides, just put ZFS on your disks, and take regular snapshots, and you don't even have the overhead of copying the blocks.
Most of you are dumber than bricks...
I can't believe the collective lack of intelligence on this comment board.
From the top...
"How do you handle offsite storage for disaster recovery?"
-With IP replication like nearly every other DR solution. CDP does not limit your recovery options.
"You do still need backups, especially off-server ones"
-Huh? No you don't - CDP is the backup and recovery tool and you can use it locally or remotely.
"This will require shit loads of storage"
-Re-read the article - it essentially saves and replicates a single instance of each block and uses MUCH less capacity than tape which typically does full and incremental backups
"tape still has a role for a rotated off-site DR scheme"
Your DR is dependent on tape? How many days until you are fully restored? -Good luck with that... and keep your resume up to date.
"So what is the difference between this and RAID 1?"
-Umm... For starters, replication and single instance blocks.
"However, if you're also looking at compliance and other "archiving" needs, then you'll need that infinite amount of storage everyone forgets about."
-Archiving is not backup you dolt! If you don't understand the difference, it's time for you to go back to "Storage 101" class.
"So where is the CDP, and how do you recover that if there is a problem?"
-Hopefully your DR scheme has you replicating offsite. In the remote site, you can get near-instantaneous restores (try doing that with tape).
"Just changes your backup software supplier from X to FalconStor"
-Yeah... and eliminates annual backup software maintenance while giving you a ZERO backup window and near-instantaneous restores. But go ahead and keep using 30 year-old backup technology.
"Using a disk-image produced by any software regardless how magical is totally useless for DR because it does not know if applications have left the data in a consistent state at that particular moment."
-A good CDP solution is application and database aware - clearly you have no idea what you are talking about.
@Gary A : All bow down to his genius, grovel at his majestic feet.
>"This will require shit loads of storage"
>-Re-read the article - it essentially saves and replicates a single instance of
>each block and uses MUCH less capacity than tape which typically does full
>and incremental backups
It depends on the rate at which blocks are overwritten versus the incremental backup period.
If you had a disc with a high rate of changes to individual blocks in a relatively small file, a system that stores every block change will store more data than a system that stores the whole file at the end of the day.
If it's truly a single instance of each block, then it's not much different to mirroring and you wouldn't be able to wind back. If it's an instance of each overwritten block, then you can wind back a file to any point, however it will use shit loads of storage when applied to active files.
Actually re-reading the article, it claims that you can restore a file to any point, therefore the route to all points must be stored. When the points change often there is a lot to be stored.
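To put rough numbers on that, here is a back-of-envelope sketch. The 4 KiB block size and both workloads are invented for illustration, and the helper names are hypothetical; it just compares "keep every block write" with "copy the whole file once a day".

```python
BLOCK_SIZE = 4096  # bytes per block (assumed)


def journal_bytes(block_writes_per_day, days=1):
    """Space used if every block write is kept (CDP-style journal)."""
    return block_writes_per_day * days * BLOCK_SIZE


def nightly_copy_bytes(file_blocks, days=1):
    """Space used if the whole file is copied once per day."""
    return file_blocks * days * BLOCK_SIZE


# Small, busy file: 100 blocks, 5,000 block writes a day.
print(journal_bytes(5_000), nightly_copy_bytes(100))        # 20480000 409600
# Large, quiet file: 1,000,000 blocks, 1,000 block writes a day.
print(journal_bytes(1_000), nightly_copy_bytes(1_000_000))  # 4096000 4096000000
```

With these made-up figures the journal is 50 times bigger for the small busy file and 1,000 times smaller for the large quiet one, so which approach uses more storage really does come down to the overwrite rate versus the backup period.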
...And more responses to you bricks...
"What about offsite backups?"
-What about them? Use replication.
"What about multiple retention items?"
-What about them? CDP allows you to pick retention levels
"Running that much disk would represent a large amount of power and heat in your datacentre"
-Not if you use disk using MAID technology (like Nexsan)
"the consequences of an array going bang where this tech is used doesn't really bear thinking about"
-Sure... if you have 3 days to restore from tape and then rebuild the RAID
"Maybe I'm missing something, but why is this any different from what all the storage vendors have been doing for years?"
-For starters, those examples are REALLY expensive disks for backup or CDP. FalconStor is disk agnostic and tier 2 disk like that from Nexsan is not very expensive at all
When did DEC invent "StorageWorks Virtual Replicator" ?
In the late 1990s, that's when. And it did basically this.
What's old is new, every time there's a new bunch of college kids making the transition to IT manager or Accenture consultant or whatever... most branches of engineering try to learn from what's gone before and try to NOT reinvent the wheel, the world of IT tries to reinvent a new wheel with allegedly-new USPs (and repeat disadvantages) every few days. Strange.
Apples and Oranges
"If you had a disc with a high rate of changes to individual blocks in a relatively small file, a system that stores every block change will store more data than a system that stores the whole file at the end of the day.
If it's truly a single instance of each block, then it's not much different to mirroring and you wouldn't be able to wind back. If it's an instance of each overwritten block, then you can wind back a file to any point, however it will use shit loads of storage when applied to active files."
Your scenario completely ignores the requirements for the recovery point objective and recovery time objective - which would answer your question. You are comparing apples to oranges.
The scenario you suggest (many block changes with a presumed restore at any single point) would indeed require a larger amount of storage... but it's also not a scenario that could be addressed with traditional backup and tape. If the requirement is to recover from any point in time - tape is certainly not the answer and CDP, regardless of storage consumption, is the ONLY efficient option.
What happens if a Jumbo crashes onto your data centre/office?
Where is your data then?
Of course you still need backups, duh! This is just a way of providing quick restores whilst using the minimum space to take a backup - journal or not - it is a backup. Sounds like a single point of disaster if you ask me.
Few questions Gary
What if the net goes down?
What if you have a major natural disaster?
Let's take October 17, 1989, Loma Prieta, California. That earthquake knocked a lot of telecommunication lines down. What do you do if you need to do a restore but your local backup is messed up and the net is down? There will always be a need for tapes for local and off-site backup.
Just a nit...
"It captures every write I/O a server makes"...
Um, that'd be just an "O".
So does CDP know enough to "bundle" associated writes? I mean, if I delete a file from a block, I've got (e.g.,) a write to the file system journal to log the delete, I've got a write to the file system to remove the directory entry, and I've got a write to the block to update the file header to put it on the free space list. If not all of those writes get coordinated when attempting to restore a (possibly different) file in that same block, then wouldn't you get some FS corruption?
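One way such "bundling" could work, sketched purely as an illustration: tag each journalled write with a group and only allow restores to points where no group is half-applied. Nothing here is known FalconStor behaviour; the journal layout `(sequence, group, is_last_write_of_group)` and the function name are assumptions.

```python
def consistent_restore_point(journal, requested_seq):
    """Return the latest sequence <= requested_seq with no half-applied group.

    journal: ordered list of (seq, group, is_last) tuples (hypothetical format).
    Returns None if the only safe point is before the first journalled write.
    """
    open_groups = set()
    last_safe = None
    for seq, group, is_last in journal:
        if seq > requested_seq:
            break
        if is_last:
            open_groups.discard(group)   # group is now fully on disk
        else:
            open_groups.add(group)       # group still has writes to come
        if not open_groups:
            last_safe = seq
    return last_safe


# A file delete as three related writes, followed by an unrelated two-write update.
journal = [
    (10, "delete", False),  # file system journal entry
    (11, "delete", False),  # directory entry removed
    (12, "delete", True),   # free-space list updated
    (13, "other", False),
    (14, "other", True),
]
print(consistent_restore_point(journal, 13))  # 12
```

Asking for point 13 gets you 12, because 13 falls in the middle of the second group. The hard part in practice is knowing which writes belong together, which a pure block-level tap cannot see without help from the file system or application.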
"What happens if a Jumbo crashes onto your data centre/office? Where is your data then?"
A good DR strategy incorporates replication to a location far enough away to not be a factor in your risk profile (e.g. if you are in a flood zone, someplace outside the flood zone; for power failure, you want to replicate to a location in a different grid, etc.).
"What if the net goes down? What if you have a major natural disaster? Let's take October 17, 1989, Loma Prieta, California. That earthquake knocked a lot of telecommunication lines down. What do you do if you need to do a restore but your local backup is messed up and the net is down? There will always be a need for tapes for local and off-site backup."
First, in 1989, your best option was tape backup. Secondly, if telecommunications are down you have an increased risk of no power as well so attempting local recovery is futile (think of your Exchange servers for instance... after the 4+ hours it would take to get your offsite tapes back and restored, with no telecommunications you have no email and no business gets done). As I stated above, your DR site should be outside your risk profile. Earthquake hits, automated failover to your DR site picks up near-instantaneously and you are still in business. Also, with FalconStor CDP, you will have a local copy of the data for whatever window you want (24 hours, 1 week, 1 month - you choose).
"So does CDP know enough to "bundle" associated writes?"
I'll have to get an answer for you on this since I'm not sure what you mean by "bundle" associated writes. I can tell you that I have never seen a FS corruption when doing a restore on our servers.
One other note...
I'm not claiming that you never need backup or tape. I think most people would argue that tape is an excellent archiving media. I would argue that the methodology for conventional backup is out of date and doesn't reflect the increasing demands for data restoration within 2 hours to a point in time less than 8 hours old.
I know some manufacturing facilities that are not tech driven and they can produce widgets whether their servers are up or down. 24 hours for RPO and RTO is not a big deal to them so conventional backup meets their business requirements. Many businesses, however, rely on their servers and applications to be functioning and have a low tolerance for an outage of greater than 4 hours.
One last point...
Tape is NOT a DR solution and was never intended to be - it is a backup and archiving medium.
attack of the fanboy
This thread was interesting until I hit Gary A's evangelism.
Riddle me this Gary A
How much effort is Oracle to recover? Wouldn't the database at the DR site think the DB is in an inconsistent state if what is in memory is not the same as on disk? If you create a new index and pin it into memory, how does the DR server know? You change a package and pin it as well; again, how does Oracle know?
Now, how much "Extra Disk" would it need? Redo and archive logs change quite quickly so one might think that "Extra Disk" would add up quite quickly.
I think it is a nice technology, but when you are talking Oracle I would think Oracle does CDP better with either RAC or the cheaper Data Guard. Both would keep the database consistent at the other location. You obviously would have the Oracle licenses for your DR site anyway, so why disk level and not transaction level? The extra gear would also be a consideration. 2 sites, 2 SANs, 2 Oracle server licenses by default. Now we need a minimum of 2 CDP Connectors, 2 CDP Gateway appliances and more disk at both locations. How much extra cost and complexity are you adding? How many extra resources will you require? Cooling/Power/Staffing. Although FalconStor has come out with an interesting solution for DR, it's not the answer to everything. So dude please chill, us BCDR heathens probably get enough flak at our day-2-day low level IT jobs without being chastised for how stupid we are by the likes of you.
@Gary A : Apples and Oranges
I wasn't comparing them, I was observing the technical issues and differences.
It is the article that compares them.
>The scenario you suggest (many block changes with a presumed restore at
>any single point) would indeed require a larger amount of storage...
Yes, and that's how the article describes this system.
It could be different, there could be some clever way of merging changes to save storage, beyond every changed block. But I believe you're still looking at lots of storage for active file systems.
>but it's also not a scenario that could be addressed with traditional backup
No, it's not, but then traditional backup doesn't try to do that.
>If the requirement is to recover from any point in time - tape is certainly not
>the answer and CDP, regardless of storage consumption, is the ONLY efficient
>option.
It's not the requirement though (mostly, some people would find it very useful), that's a characteristic of this backup system.
Generally when you restore you want to restore to a known safe point, not gradually replay the events of a file system. The known safe point varies by application and storing such safe points can be done (and is) by tape.
I'm not dis'ing the concept, obviously it has its uses, but it's not the end of traditional backup (normally tape).
Incidentally you seem to suggest that every disk write could entail sending the old block data for storage off site over a WAN type network. What happens when disc writes outpace network bandwidth?
As for tape, it could still be the storage medium for this type of backup. Provided it had the bandwidth to keep pace with disc writes.
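The bandwidth question is simple arithmetic. A rough sketch with invented figures follows; the function names and the MB/s numbers are assumptions, and it ignores compression, coalescing of repeated writes to the same block, and protocol overhead.

```python
def replication_backlog_mb(write_mb_per_s, link_mb_per_s, burst_seconds):
    """Data queued locally while writes outpace the off-site link."""
    return max(0, (write_mb_per_s - link_mb_per_s) * burst_seconds)


def drain_seconds(backlog_mb, link_mb_per_s, steady_write_mb_per_s):
    """Time to clear the backlog once writes drop to a steady rate.

    Returns None if the link never has spare capacity, i.e. it never catches up.
    """
    spare = link_mb_per_s - steady_write_mb_per_s
    if spare <= 0:
        return None
    return backlog_mb / spare


# Assumed: a 10-minute burst of 80 MB/s of writes over a 10 MB/s WAN link.
backlog = replication_backlog_mb(80, 10, 600)
print(backlog)                        # 42000
print(drain_seconds(backlog, 10, 2))  # 5250.0
```

With those numbers a ten-minute burst leaves 42 GB queued and takes nearly an hour and a half to drain, during which the off-site copy is behind. So either the local journal has to buffer the difference, or writes get throttled, or the remote recovery point lags.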
at a SonicWall demo, that you'd have a CDP on and off-site. Quick snapshot restores back 14 iterations from the on-site, off-site for disaster recovery, which is mirrored across 2 data-centres. If you lose your local CDP in a disaster, they'll courier your off-site CDP to you within hours and have you back up within 20 mins. They also provide servers in the data-centre mirroring your server if you wish, so with a simple flick of the DNS switch, you're back on-line while you organise new premises/hardware.
Tape is fugly.
If it's not on tape,
It's not backed up.
"I can't believe the collective lack of intelligence on this comment board"
Nor the absence of tact.
I guess it's tricky...
@Steve - the EMC product in this space is RecoverPoint.
The difference between CDP and snapshots is that you can recover to any arbitrary point in time, not just to the times when you snapped. Hence the C in CDP.
Yes, it needs pairing with replication to provide offsite cover - RecoverPoint does this (it's superseding MirrorView), and I'm sure FalconStor have a similar thing.
The amount of space it needs is "every write" * "retention time", so you may want to dump to tape for archive purposes.
CDP is excellent at preserving transient files that have high value, but might be missed by snapshots in my little world.
It may or may not have a copy on write penalty, depending on the implementation (modern systems tend to "write elsewhere" instead of copying).
> Earthquake hits, automated failover to your DR site picks up near-instantaneously and you are still in business.
You do *not* want your DR site picking up automatically in that case, for a whole bunch of reasons.
It's very hard to be sure of what's gone wrong. Quake? Brief network outage? Power outage? Remember the news stories after Loma Prieta, 9/11, New Orleans? For hours no-one had a clue about what was really happening. Having a DR site pick up while the real problem is that someone tripped over the network cables (figuratively speaking) can cause much more grief than it cures. You can automate failover *within* a data centre, it is rarely safe to do it across geographical distances.
A disaster impacts much more than the IT gear. You have to have a plan in place to deal with people, buildings, and all infrastructure. IT is only a part of the story, and it is only a part of the solution. You need a business continuity plan, and a business continuity manager to take the decision to put that plan into effect. That plan will help you decide how and when to failover to your DR site, co-ordinated with all the rest.
I'll respond to the others here shortly... but I wanted to post this quickie in response to the Earthquake scenario:
"It's very hard to be sure of what's gone wrong. Quake? Brief network outage? Power outage? Remember the news stories after Loma Prieta, 9/11, New Orleans? For hours no-one had a clue about what was really happening. Having a DR site pick up while the real problem is that someone tripped over the network cables (figuratively speaking) can cause much more grief than it cures. You can automate failover *within* a data centre, it is rarely safe to do it across geographical distances."
You are 100% correct about local failover being your first line of defense. However, I also believe that a properly configured DR architecture using CDP is just fine for even those instances where someone "trips over a cable". You can fail back just as easily as you fail over, so why wouldn't you? If your clients don't see any interruption, isn't that where the value of DR is? Again this is a 'backup vs. CDP' argument and a tape backup system wouldn't be of any use in this scenario.
Just for a real-world example, we have offices in Miami. When evacuations begin for a hurricane, we usually have a 48 hour window to stage the failover (essentially babysit it) and test it before it actually goes live. If we were to have a fire in the building, it would fail over at the first moment of disruption (which is tied into our security).
To the critics commenting here
I'm no expert, but this seems to suggest in this case study FalconStor CDP can be deployed remotely: http://preview.tinyurl.com/4sw4qj. Presumably, then, this data store could be mirrored to a location elsewhere too. So... doesn't that kind of eliminate half of Anonymous Coward's (at least) critical posts? Like I say, I'm no expert... just following all your terminology.
sure you still need backups.. but CDP can help
1) CDP won't protect you 100% against hardware or site failures, I think this was not the purpose of this interview ... Tape/Archiving is not dead, Mirror/replication is not dead.
2) However, CDP can enhance your existing backup solution with continuous journaling and any-point-in-time recovery, to protect data (databases, messaging applications) between regularly scheduled backups.
3) CDP technology can be combined with traditional mirror/replication technology to ensure off-site recovery.
I agree with Gary A, a good CDP solution is application and database aware - clearly you have no idea what you are talking about !