Still going strong after 60 years.
What do you do with 110,000 IBM 3590 cartridges? That’s the problem an Indian resources company faced last year. The cartridges, which IBM introduced way back in 1995 under the Magstar name, offered a 60 gigabyte capacity that seemed capacious at the time, and collectively contained around 11 petabytes of geospatial data. But with 3590 …
After they migrate the data from the tapes, they should recycle the tapes themselves to launch a new line of designer clothes and wearable accessories.
It's way more profitable than dumping the tapes in a landfill and it gives a new life to the stuff.
Surely, if you have an 11PB dataset spread over 110,000 tapes, wouldn't you - at some point - have started thinking about what to do when the drives die?
And wouldn't you have the "new" tape format phased in as part of that migration (hell, you had to buy it anyway, if you wanted to read those new "converted" tapes)? And as part of that migration, wouldn't you slowly shift over all your old tapes onto the newer ones piecemeal with every backup you do?
I don't see that the company did anything fabulous there - they just did what the data's custodians should have been doing all along. It's not like they were reconstructing the tapes at the bit level because of damage (and even if they were, why weren't you keeping multiple backups and moving them onto newer media all the time to prevent that?)
When you have to hand over that amount of data to a third party for "conversion", it just smacks to me of inadequate backup procedures in the first place. And I guarantee that they weren't cheaper to hire than it would have been to just have a slow in-house migration anyway.
Seriously. 110,000 tapes. That's a stupidly huge archive that you should have been migrating all along if you knew you were going to keep it.
I don't think these are backup tapes. It sounds to me like a huge and complex data archive of information collected over many years. Tape is the chosen medium because of its reliability, capacity and cost. The complexity in the conversion is explained in the article, where they speak of reconstructing from many tapes before writing to new ones, and this is a bigger task than you probably realise. It wouldn't surprise me at all to find that the new IBM tape drive won't connect to the old machine either, so they probably also had to migrate to new systems at the same time.
And if you'd ever done a large project you would know that an in-house team almost never actually finishes a job like this, so outsourcing is ideal in this scenario. Especially since the in-house guys still had their own jobs to do.
Clearly, the fact that the data was being stored on tape whose drives "are hard to come by" says that the data is never accessed. If it is never accessed, then it is never used. If it is never used, then throw it away. This sounds like needless data retention due to a lack of intelligent policies, which wastes money.
That's nothing. You must see the amount of red tape this so called government of my country has created.
So let me see now... 110,000 tapes. If a cartridge is roughly 10cm tall x 3cm wide and I build a cabinet that is 2m high, I could stack them 20 rows by 5,500 columns(??). That gives us about 165 meters of rack space. I get tired just thinking about having to walk 165 meters to go and draw a tape.
<<< I'll need to go and top up my liquid levels after this.
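For anyone checking the back-of-envelope maths above, a quick sketch (the 10cm x 3cm cartridge footprint and 2m cabinet are the poster's rough assumptions, not 3590 spec dimensions):

```python
# Back-of-envelope shelf-length estimate for 110,000 cartridges.
# Dimensions are the poster's rough guesses, not spec values.
num_tapes = 110_000
cartridge_height_cm = 10
cartridge_width_cm = 3
cabinet_height_cm = 200

rows = cabinet_height_cm // cartridge_height_cm        # 20 rows per cabinet
columns = num_tapes / rows                             # 5,500 columns
shelf_length_m = columns * cartridge_width_cm / 100    # 165 metres of shelf

print(rows, columns, shelf_length_m)  # 20 5500.0 165.0
```

So the 165m figure checks out, give or take the cartridge dimensions.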
So, the Myanmar government has a similar project to tackle? I bet there's no ethical concern with any of that data...
Regarding the cost of disk storage, 11PB would be about $0.5M today (about the same 18 months ago too). What hourly rate would the lackeys doing the photographing and db reconstruction have to be on to make tape actually cost effective?
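Spelling that sum out (the $0.5M figure is the poster's own estimate, and this is raw drive capacity only - no enclosures, redundancy, or power):

```python
# Rough $/TB implied by the poster's figure - raw capacity only,
# ignoring RAID overhead, controllers, and running costs.
total_tb = 11_000        # 11PB expressed in TB
budget_usd = 500_000     # the poster's ~$0.5M estimate
usd_per_tb = budget_usd / total_tb

print(round(usd_per_tb, 2))  # 45.45 USD per raw TB
```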
I can assure you that any form of remotely reliable enterprise quality disk to store this amount of data will cost so much more than half a million quid that it doesn't bear thinking about.
The cost of the electricity alone would be vast, and you can't really leave disks powered down the way you can leave tape on a shelf, because the bearings tend to lock up over time.
Tape is still really the only way forward for these kinds of data volumes, particularly where the data aren't required online or even nearline all the time.
"before long, Holmes expects the world will come knocking with piles of LTO-1 and LTO-2 cartridges for re-platforming."
Doesn't sound like much of a problem to me. An LTO (Ultrium) drive can read (at least) tapes from the previous two generations.
LTO is now on Generation 5, soon to be 6.
So, even with your comment, those LTO-1 and LTO-2 are not gonna be readable...
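For reference, the compatibility rule everyone here is working from can be jotted down in a couple of lines (a sketch of the published LTO convention - drives read two generations back and write one generation back - not any vendor's actual API):

```python
def can_read(drive_gen: int, tape_gen: int) -> bool:
    """An LTO drive reads its own generation and the two before it."""
    return drive_gen - 2 <= tape_gen <= drive_gen

def can_write(drive_gen: int, tape_gen: int) -> bool:
    """...and writes its own generation and the one before it."""
    return drive_gen - 1 <= tape_gen <= drive_gen

# An LTO-5 drive can't touch LTO-1 or LTO-2 media:
print(can_read(5, 2))   # False
print(can_read(5, 3))   # True
print(can_write(5, 4))  # True
```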
That would be true if you do a rip and replace on your backup/archive infrastructure with LTO5 or 6... but do people really do that?
I've moved to LTO4 but still have a few LTO2 tape drives handy, just in case. If and when I move to LTO5 or 6 I will keep LTO4 drives handy (we don't have any LTO1 tapes, so no need to keep the LTO2 drives) and will still be able to read all of the LTO generations of tape we use, until such time as I am no longer required to access the older formats.
LTO may be about to move to gen6 but gen2 drives are still readily available to buy.
Unfortunately, some do. I have a stack of LTO-1 tapes on my desk waiting for one of the remaining two LTO3 drives to be available for me to 'borrow' so I can read the data on them and write it back to something that our shiny new LTO-5 vault will read. There's also the issue of personnel changes - said tapes were sent off site before I started working there, so bog only knows what I'll find on them.
Some of the off-site media storage companies also offer media migration as a service - I toured the facilities of one of the better known ones in the US, and they had a nice 'working museum' of older drives for such a purpose.
They're going to spool some old tapes onto new tapes, which will to all intents and purposes be equally out of date in 10 years or so?
Who's the asshat that thought of this then?
Ever tried recovering the data off a hard drive whose head has *physically* crashed?
Now think about the hardware that *every* drive has and how many parts have to fail before that data becomes unreadable, without a *lot* of repair work.
And let's not forget the Indian power grid is prone to *wholesale* failure. An 11PB disk farm will take some UPS to shut down gracefully.
And if it's an *archive* then sub second access times are not in fact a requirement.
Well done on a tough data migration project
"Ever tried recovering the data off a hard drive whose head has *physically* crashed?"
Does that even happen any more?
I have not-very-fond memories of the smell of nail varnish triggering an inevitable outing for the oscilloscope.
@Peter: What's your solution then?
"Who's the asshat that thought of this then?"
Got a better idea? If not, the choice is to punt the problem into the future without destroying any data, or to give up and hire some skips.
Yes. I have a better idea.
So does the US Library of Congress. Check it out. There's even a whitepaper! You'll be amazed. They have discovered something called an 'optical disc'. But keep it quiet, or Apple will call it the iArchive and charge us for using it.
Point of my remark being, though: WHY would you replace a technology KNOWN to give you headaches every 15 years with the SAME technology?
If the bloody tapes are so good, do a deal with IBM allowing you to fabricate the drives yourself if and when you need them. I'm sure India has sufficient know-how to produce a tape drive if they put their minds to it.
It's fascinating that the world generates so much information!
I couldn't help thinking why they need to archive quite so much data going back so long. The following story at Spectrum Data might help to explain: http://www.spectrumdata.com.au/total-data-management-solutions/oil-gas-data-management/geophysical-data-services/client-success-story---an-indian-odyssey
Another big data-producer is CERN, which generates up to 1PB per second (http://www.theregister.co.uk/2012/06/14/cern_cloud_helix_nebula/). They delete most of it, but still have to store hundreds of PB. See also http://en.wikipedia.org/wiki/Petabyte for other examples of petabyte usage.
I did wonder what it all was. Geophysical survey data explains it. You could compress it heavily, but you'd probably end up throwing away the information that future interpretation exercises might need!
Nevertheless, I wonder if anyone investigated the possibility of lossless compression of geophysical datasets, on the fly between the old and new archives? (I don't mean of a single data stream, I mean by removing redundancy from outputs of multiple sensors and shots in a single seismic run).
I've done some work with IBM RTC (Real-time Compression) systems recently and they manage to get about 50% compression of oil industry survey data, which is basically the same thing. This is using the standard zip algorithm, which is also hardware-implemented in most tape drives. I doubt you'd get anything much better, certainly nothing both much better and sufficiently well understood that you'll be able to get it back in the next 10-20 year refresh cycle.
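A quick way to see why zip-style compression tops out like this (a toy illustration using Python's zlib, which implements the same DEFLATE algorithm as zip - real survey data sits somewhere between these two extremes):

```python
import os
import zlib

# DEFLATE squeezes out repetition but does almost nothing for noise,
# which is why noisy sensor data rarely compresses much beyond ~2:1.
redundant = bytes(range(256)) * 4096   # 1 MiB of a repeating pattern
noisy = os.urandom(1024 * 1024)        # 1 MiB of incompressible noise

redundant_ratio = len(zlib.compress(redundant, 9)) / len(redundant)
noisy_ratio = len(zlib.compress(noisy, 9)) / len(noisy)

print(f"redundant: {redundant_ratio:.3f}, noisy: {noisy_ratio:.3f}")
```

The repeating pattern shrinks to a tiny fraction of its size; the random block barely shrinks at all.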
Haven't they heard of Dropbox?
Surely they could have just dumped the data into a BitCasa all-you-can-eat storage account. :-)
Or just sign up for a lot of Gmail accounts
Given the vast costs and general pain in the arse of all these solutions, is it not viable to build a device capable of reading your problem media but which can interface with current computing devices?
This story put me in mind of that time when the BBC were moaning about the laserdisc archive they created that was supposedly going to be lost because no one makes Laserdisc players any more. Er, they don't, but they could do again. The choice is between finding survivor devices and hoping they work, producing a new version of the device or losing the info. If losing the info is not on, then why should an ancient, dusty player/reader be the only recourse? Once built, you can rent/sell it to other projects, etc..
The problem is not that the drives don't work, although they are getting harder to get hold of, but that the tape media are nearing the end of their working lifespan. If you could get the money together to make a magstar drive - and it would be seriously eye-wateringly expensive, what with the machines to make them no longer existing - you'd still only solve a small part of the problem.
"you'd still only solve a small part of the problem"
That's how it is now, but it is a bit of a short-sighted statement.
Tapes are what they are - if they don't survive one last hurrah, then you were bollocksed anyway. Of the problems you can actually have a bash at solving, the drive is a big one. More reliable hardware means a greater chance of success. The less hardware there is around and the older it becomes, the better the option to build new becomes. Eventually it becomes almost all of the solvable problem.
I was merely trying to think of a way to spread the cost differently than it is now, and to see if I got shot down in flames for daring to suggest it!
Back some years ago, I encountered a reel of acetate based magnetic recording tape that probably was made in the 1950's.
That stuff did not even survive being played at a slow speed of 3-3/4 inches per second.
The binder was so badly dried out that the oxide coating literally flaked off on the playback head. The transport deck was just coated with oxide. Fortunately, this particular reel did not contain anything of importance; but it made it clear that any attempt to recover material stored on other tapes must be done by people skilled in the art.
The motion picture industry has the same problems with old film masters.
Seems like a decent use case for the SGI/COPAN MAID with ~2PB/rack - that's just about 6 racks for all their data. Looks like peak power is around 6.5kW/rack, which is not much.
Just make sure you install it on a solid floor - topping out at almost 3,200 pounds per rack, it's the heaviest loaded rack I've ever seen myself.
I bet the cost of the system would be more than SGI acquired the assets of COPAN for..
110000 times 60GB is 6.6PB, not 11PB. Oh, wait: my guess is that this is the infamous assumption by tape vendors that you can compress all data by 2:1.
They're making 4TB drives now, so 6.6PB is only 1650 drives. This doesn't seem like a ridiculously large array to me: you could fit it in five racks easily. Surely in backup/archive applications the drives could be powered down almost all the time, with individual drives powered up as needed. The biggest win is that these drives can be accessed with just a standard SATA cable on just about any computer, compared to tape, which needs an expensive drive, even more expensive software, and probably SAS interconnect and an HBA.
The only concrete win I can see for tape over disk is that it tends to be designed for archival purposes with consideration given to media life out to thirty years.
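The native-capacity sums in these comments, spelled out (60GB per cartridge and 4TB drives are figures from the thread; rack packing and redundancy are left aside):

```python
import math

num_tapes = 110_000
tape_capacity_gb = 60    # native 3590 capacity - no 2:1 compression assumed
drive_capacity_tb = 4    # current SATA drives, per the comment above

total_gb = num_tapes * tape_capacity_gb
total_pb = total_gb / 1_000_000
drives_needed = math.ceil(total_gb / (drive_capacity_tb * 1_000))

print(total_pb, drives_needed)  # 6.6 1650
```

So the headline 11PB figure only works if you assume the vendors' 2:1 compression on top of the 6.6PB of native capacity.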
The missing tidbit (and perhaps the only interesting one) is what software is handling these 110k cartridges?