Data storage demands within the enterprise grow every year. Managing this data is a challenge for organisations of all sizes, writes Trevor Pott. The data we move around is now practically measured in terabytes. Depending on your data usage and backup requirements, traditional gigabit Ethernet is simply too slow. Consider for a …
Or, is it time to also extend WAN optimisation to cab-to-cab optimisation?
Simply throwing bandwidth at the problem will only go so far. You need to find a way to make the best use of that bandwidth and cut out the waste.
Price is still too high for many and will need to come down before there is widespread adoption. £5-6K per switch is a huge sum for many organisations, never mind the fact that you then need to buy expensive NICs to go along with it.
I guess if there was any life in ATM, it's surely extinguished now...
I guess all those ADSL implementations must be a bit scared of your prediction.
I should have been clearer... ATM is dead as a consideration for replacing Ethernet. I remember products such as Newbridge Networks' VIVID product range being evaluated...
Good luck with that
After moving into our very shiny new offices, refitted with "all the latest kit", I soon found out after a trip into the server room that our company networking guys had been cutting corners to save a few bob. Not only were we under required capacity by about 50 fecking ports, but one switch was gigabit, and the other, identical-looking switch was actually 10/100Mbps. Cue phone calls and swearing.
If they're going to try and save such a pitiful amount (these were not large switches) because *gigabit* is apparently too bloody expensive, 10G has no chance.
The key is to reduce the amount of data being backed up in the first place rather than simply throwing more bandwidth at it.
A friend of mine
Hosts a backup service over Internet/VPN. The trick... really smart (and expensive) backup software that only does deltas/incrementals.
That said, the flip side of this conversation is restore time. The rule of thumb was always 2x backup time to restore. So you can't completely discount bandwidth.
My $0.02 for whatever that's worth :)
Backup vs Restore
Yes - incs, diffs, deltas, all the way forward. The issue, however, of double the time to restore is a variable one. If you're on tapes it can take a whole lot longer. If you're on hard discs it can be a lot faster.
The thing you need to ask is how often do I backup, and how often do I need to restore. Personally I backup every day (except Sunday), and I think I restored about 15 times in the past year (mostly individual files or folders).
When you look at it like that, it's not such a big deal taking a bit longer to restore.
"Hosts a backup service over Internet/VPN. The trick... really smart (and expensive) backup software that only does deltas/incrementals."
Duplicity is an open source backup doodad that really needs a GUI and a port, because the command-line version I've set on a cron job is fantastic. Encrypted delta/incremental backups SCP'd to a test server, and it works well for me.
Spot on Jase 1
I did a data review a year ago identifying 130GB out of 460GB of company data on shared drives that could be archived. As I am not allowed to touch this info directly, I made the departmental heads aware of the data that could be moved off active drives (2009 Christmas card lists, 2.7 GB of 1997 Excel spreadsheets, several copies of the same folder, etc.). My review this year shows none of the identified material was touched and we are now at 630 GB total data.
Complaints about the speed of the shares and network have increased as well. Self-fulfilling prophecy, I think they call it.
No mention of disk snapshotting, so you can backup offline?
At my previous company we had about an 800TB overnight backup. To achieve this we made heavy use of Fibre Channel attached tape drives and disk snapshots mounted on dedicated mount servers that could back up offline. Just before I left we started to use 10GigE IP clients on mount servers and left the tape drives at the general purpose storage nodes/media servers. This way the storage nodes/media servers could be shared between normal 1Gig backup LAN clients and 10GigE mount servers. It certainly drastically reduced the cost of licensing the mount servers.
So, how does the snapshot happen?
How many snapshots can you take?
That's the point - no matter how the backup itself occurs, you're going to need to copy the data from the 'live' system to the 'backup'.
If backing up the day's data takes longer than a day, you will fail no matter how you're doing it.
Why will it fail?
If your backup system is designed properly, it will allow concurrent backups.
And just because the entire process takes >24 hours, it doesn't necessarily mean that discrete events are going to take the complete cycle.
If I have multiple tablespaces in my backup set, I can have one discrete space backed up and into full production whilst the others continue.
So a bit of a sweepingly inaccurate generalisation there.
Simple: In 1 day you produce an amount of data that your system takes 1.5 days to back up.
Day 1 completes: 1.5 days of data to back up.
Day 2: 3 days of data to back up, 1 day has been done, 2 days left to back up.
Day 3: 4.5 days, 2 days done, 2.5 days left.
Day 4: 6 days, 3 days done, 3 days left.
This can clearly only succeed if your average daily rate of data creation over a given period is less than the daily data backup rate over that same period.
By having spare capacity somewhere in the process you can probably make that averaging period longer, but nothing else.
Concurrent backups are a way of increasing your maximum daily backup rate; however, once you hit the maximum transfer rate of the connectivity to your servers, you can't go any further than that.
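For anyone who wants to play with the numbers, the day-by-day tally above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are made up, work is measured in "days of backup time", and it assumes (as the example does) that the backup starts once day 1's data exists.

```python
def backlog_after(days, produced_per_day=1.5, backed_up_per_day=1.0):
    """Return (total, done, left) in days of backup work after `days` days.

    Assumption taken from the example: the backup only starts once the
    first day's data exists, so after N days it has run for N - 1 days.
    """
    total = produced_per_day * days
    done = min(total, backed_up_per_day * (days - 1))
    return total, done, total - done


for day in range(1, 5):
    total, done, left = backlog_after(day)
    print(f"Day {day}: {total} days of data, {done} done, {left} left")
```

With the default 1.5 the "left" figure grows by half a day every day. Drop `produced_per_day` below `backed_up_per_day` and the backlog stays under one day's worth, which is the averaging condition described above.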
To be fair, in many real-world situations you have 7 days to handle 5 days of data.
The snapshot happens as part of the pre-backup scripts. The enterprise scheduler quiesces the application (some also support hot backup, so no quiesce is required), then the disk array is instructed to take a snapshot before the app resumes. The snapshot operation typically takes less than 30 seconds. We typically had one snapshot in each site (Prod/DR); the backups happen at the DR site and, for extra protection, are duplicated back to Prod (so they've been read, which is a regulatory compliance thing). The filesystems which we mounted on the backup servers would be sized so that we had a typical 4 hours of no operation for maintenance. Corporate policy was that no backup should take longer than 12 hours. The main advantage of using a mount server is that the main app can run all the time and the backup processing is moved to another location. The advantage of using an IP client as opposed to a SAN connected shared tape client is the cost.
Manage your data properly
The main issue we have is with users not managing their data properly. We've got around 750TB to back up. When you have users moving data to areas that get backed up when that data doesn't need backing up, or just moving stuff around so that it triggers a full backup, it's a problem. Well, it's a problem when these datasets are 15TB or more!
This is the key. User education is part of the equation, but back that up with a broad ranging strategy. Start with Quotas for home drives, possibly even shared areas.
Look at a properly spec'd snapshot and replication capable NAS - throw the data on there, as a bonus, switch on de-duplication (if it can). Set up on disk snapshots to retain a week or so, replicating to an offsite copy. Relegate tape to archive backups, once a week - you could have this hanging off the replica.
How about an archival solution like Enterprise Vault or similar?
Of course, this sort of thing would need tweaking where user requirements vary, but this is a pretty easy solution. Sure, throwing bandwidth helps, but it's not the only option, plus no single answer on its own will fix the problem. A strategy is needed, with careful design and planning.
A big EV environment has a far more complex backup than a flat filesystem, no matter how big it is. The last EV that I worked on had in excess of 100TB of index data, spread over multiple servers, all of which had to be synchronised for a point-in-time backup. No accurate point in time, no recovery!
Is 10GbE much faster than 1GbE?
100MbE is ~10x faster than 10MbE, but actual transfer rate is rarely 10x between 100MbE and 1GbE. Most of the time I see 4x, due to the TCP/IP overhead, even using Jumbo Frames.
How much of an increase in transfer rate do we see between 1GbE and 10GbE using TCP/IP?
Is it time for a new protocol?
I have gotten 900 Mbit sustained out of 10GBE links. Is that 10x 1GBE? Nope. Is it still hella much faster than I can get out of 1GBE? Yep. Is it faster than Fibre Channel? Yep.
So I'll take it. Beggars can't be choosers.
Since I downvoted your post I feel the need to explain.
I'm not a backup guy, but I can assure you that a decent network link will provide you with close to spec transfer rates for gigabit links. We have cameras on 1Gbit reliably transferring somewhere around 110 to 120 million bytes per second (~90-95% of spec) of payload (pretty incompressible image data) over a single gigabit link (on "prosumer" Intel PRO PCI-e cards and Windows XP).
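Those figures are about what the framing overhead predicts. A back-of-envelope sketch, assuming plain TCP over IPv4 with no header options and a standard Ethernet frame (the function name is made up, and the cameras may well use a different transport):

```python
def tcp_payload_rate(link_bits_per_sec, mtu=1500):
    """Best-case TCP payload in bytes/sec over Ethernet for a given MTU."""
    # Per-frame bytes on the wire that are not part of the IP packet:
    # preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
    wire_overhead = 8 + 14 + 4 + 12
    # Assumption: IPv4 (20) + TCP (20) headers, no options.
    ip_tcp_headers = 20 + 20
    payload = mtu - ip_tcp_headers
    efficiency = payload / (mtu + wire_overhead)
    return link_bits_per_sec / 8 * efficiency


for mtu in (1500, 9000):
    rate = tcp_payload_rate(10**9, mtu)
    print(f"MTU {mtu}: {rate / 1e6:.1f} million bytes/sec")
```

At a 1500-byte MTU that works out to just under 119 million bytes per second on gigabit, so 110 to 120 is a link running close to its ceiling. Jumbo frames only move the ceiling up by a few percent.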
One thing to be *very* aware of is however collisions. If you have a dedicated point to point link this is fine. But as soon as you have anyone else interrupting, throughput crashes - like divide by factors, not just subtract a bit IME.
True, but that isn't the protocol's fault. If you have big-time collisions, you are using a HUB, and...WTF?!? Indeed, find me a 10GBE hub...
I was downvoting AC, not you.
Probably should have made that clear.
Where's your datacentre? We haven't had the power budget to stuff a rack that full since the early noughties!
Besides, we abandoned top-of-rack switching years ago so we don't suffer from that bottleneck anyway.
10Gb too expensive, need better backup
The first problem is with the setup. Instead of having just one or two ports to the great outside, put the tape on the "inside" of the storage server. Remember, that's what we used to do? There are plenty of non-networked options for that. A SCSI Ultrium 5 drive does 140MBps native, so a full 1.5TB tape takes about 3 hours. Stripe them, and that multi-terabyte array will be backed up awfully quickly.
The other part is good backup software. When a company doesn't care about its backups, then you're lucky if you even use the provided backup software. The other part is using software which works with how you work, for a minimum system load.
Sure, we have more protocols, but the problem is the software, not the stack on the OS. How many times have you seen something supporting SCTP? At all? That's where things go awry. (I've met more than one programmer who had never heard of TCP "out of band." Ouch.)
Never stripe across tapes; it cripples the recovery time, because you have to have all the tapes available at the same time as enough drives are available to play them back. Also, the consequences of a single tape failing are massive.
Personally I'd stream onto multiple drives and on large tapes I like to be able to clone to another site. I recognise that you have to have a lot of resources to justify this sort of cloning, but it does at least verify your initial backup tapes too, which is a regulatory requirement for financials.
Shared tape drives tend to incur a very large licence fee. It's often less expensive to take disk snapshots and have your tape drives located on a single mount server, where the backups of your larger filesystems' disk snapshots are taken.
overhead and 802.3ad
10gig is cheap. It seems crazy to me that just a few short years ago a line rate 48-port 10gig switch would have cost you half a million $ and a good chunk of a rack, and now it can be had for probably 93% less and takes up 1U.
As for 802.3ad, which the article mentions as a stopgap: in many situations it is not one, because the algorithms that link aggregation uses often ensure that even with multiple links only one link is used if the data transfer is between two systems. It works better if it's many:many or at least many:one.
Another issue is stress on the storage. On my last big storage array our budget was limited, so we ran it pretty hot. We had enough to back up what we needed (about 5-6TB/week if I recall right), but the disks didn't have enough spare I/O capacity if we wanted to pull many TBs off per day (the array itself was 100+TB usable). We ingested data at a rate of about 5TB/day; fortunately our backup needs were much less than that.
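A toy model of why that happens, for illustration only. Real switches and NIC drivers pick the member link from a hash of some configured mix of MAC addresses, IP addresses and ports; the CRC-of-MAC-pair policy, the function names and the addresses below are all made up.

```python
import zlib


def pick_link(src_mac, dst_mac, n_links):
    """Choose a member link from a hash of the address pair (illustrative policy)."""
    key = f"{src_mac}->{dst_mac}".encode()
    return zlib.crc32(key) % n_links


def links_used(flows, n_links):
    """Set of member links that carry at least one frame of the given flows."""
    return {pick_link(src, dst, n_links) for src, dst in flows}


# One server talking to one backup target: every frame hashes the same way.
one_to_one = [("aa:aa:aa:00:00:01", "bb:bb:bb:00:00:01")] * 1000

# Many clients talking to one backup target: different pairs, different hashes.
many_to_one = [(f"aa:aa:aa:00:00:{i:02x}", "bb:bb:bb:00:00:01") for i in range(64)]

print("one-to-one uses", len(links_used(one_to_one, 4)), "of 4 links")
print("many-to-one uses", len(links_used(many_to_one, 4)), "of 4 links")
```

The one-to-one case can only ever land on a single link, because the hash input never changes, so a single big transfer between two hosts is capped at one link's speed. The many-to-one case has a chance of spreading across the bundle.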
"Consider for a moment a fairly typical backup requirement of 10 terabytes. At gigabit Ethernet speeds, allowing for 10 per cent network overhead and assuming zero media changes, it would take a little over a day to back that up."
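As a sanity check on that quoted figure, here is a rough sketch of the arithmetic. It assumes decimal terabytes and treats the 10 per cent overhead as a flat cut in usable line rate; the function name is made up for illustration.

```python
TB = 10**12  # decimal terabyte (assumption)


def transfer_hours(data_bytes, link_bits_per_sec, overhead=0.10):
    """Hours needed to move data_bytes when `overhead` of the line rate is lost."""
    usable_bits_per_sec = link_bits_per_sec * (1.0 - overhead)
    return data_bytes * 8 / usable_bits_per_sec / 3600


for name, rate in (("1GbE", 10**9), ("10GbE", 10**10)):
    print(f"{name}: {transfer_hours(10 * TB, rate):.1f} hours")
```

Under those assumptions 10TB takes roughly 24.7 hours on gigabit, so "a little over a day" holds, and about 2.5 hours on 10GbE if the storage at both ends can keep up.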
Of course you could use some modern technology to improve this situation.
For instance, tools like Rsync have been around for a long time. Not only can it be told to back up only the differences, but it can also use various degrees of compression over the network link. For those averse to scripts or the console, there are quite a few front ends with a GUI for Rsync.
Another helpful tool is a file system called ZFS, which spots blocks or files that are identical and just keeps ONE copy on the array. I'm sure there are other file systems with this feature out there, but this one comes to mind.
What's more there is backup software that will use the same principle, skipping duplicate blocks or files and just copying the pointers.
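The "keep one copy, store pointers" idea can be sketched roughly like this. It is a toy with fixed-size blocks and an in-memory dictionary, not how ZFS or any particular backup product implements it, and the function names and block size are made up for illustration.

```python
import hashlib


def dedup_store(data, block_size=4096):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}     # digest -> block contents, stored once
    pointers = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is kept
        pointers.append(digest)
    return store, pointers


def restore(store, pointers):
    """Rebuild the original bytes by following the pointers in order."""
    return b"".join(store[digest] for digest in pointers)


# Highly repetitive sample data: 100 identical blocks plus 3 more identical blocks.
data = (b"A" * 4096) * 100 + (b"B" * 4096) * 3
store, pointers = dedup_store(data)
print(len(pointers), "blocks referenced,", len(store), "blocks stored")
assert restore(store, pointers) == data
```

How much this saves depends entirely on how repetitive the data is. Already-compressed or encrypted data will barely deduplicate at all.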
I'd surmise that using technology like this a 10TB nightly backup could happen in just a couple of hours, maybe even less.
You'd be wrong. The nightly backup requirement is 10TB. That is using compressed deltas and OTF deduping. The cumulative array size is ~0.5PB. Fairly average nowadays.
Seriously here...my *home* network has something on the order of 50TB. Enterprise backup requirements for a medium sized business being 10TB/day is not remotely unreasonable. It is all a question of what business you are in.
All the fancy software, rsync, snapshotting or hullabaloo in the world won't help you if you can't get the necessary bits from A to B. Yes, I have 3 shops that actually modify more than 10TB of data a day. Last year, not one of them modified more than 3TB of data per day.
rsync can fall apart
rsync can fall apart for larger data sets (large numbers of files, anyway). Most of the time you want something that does differentials at the block level instead of the file level.
Medium size company here. Our weekly total backup is 2.5TB; the nightly differential is around 800GB.
The main shortcut we've found is moving as much storage as possible into the backup server, thus avoiding the network bandwidth problem.
It turns out that a Dell PERC6 will quite happily have one array of pricey SAS drives, and another of cheap (consumer grade) 2TB SATA drives.
Sure, the SATA drives aren't as fast (but still much faster than 1Gb Ethernet, so Fast Enough), nor as reliable (well, that's what RAID is for), but they cost us less than £250 for a 5TB array.
I'm currently moving our network software install share onto it (ie low importance data), which should save us about an hour or so on the weekly full backup.
10Gb ethernet? Maybe in five years.