I have a few gigs spare
Can't they do a crowd source thing like they did with distributed computing?
I guess some clever data duplication methods might help to make sure all the data is available all the time...
The world's hard drive shortage, caused by deadly flooding in Thailand, is holding back CERN's antimatter research, a top scientist at the boffinry nerve center said last night. Analysis of figures spewing out of the Large Hadron Collider was compromised by a lack of storage space, said Peter Clarke, who works on the CERN LHCb …
Nice idea, but my quick and dirty estimates say that if they wanted to crowd-source 1 petabyte and the associated calculations using distributed PC-type equipment, they'd need about 10 million volunteers, and the public networks would take a big hit on bandwidth (rough sums sketched below).
It's still a nice idea :)
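For what it's worth, a minimal back-of-envelope sketch of that kind of estimate, assuming roughly 100 MB of usable spare disk and about 1 Mbit/s of sustained upload per volunteer (both figures are my own assumptions, not from the article or the comment above):

```python
# Rough crowd-sourcing estimate for 1 PB of LHC data.
# All per-volunteer figures below are assumptions for illustration only.

DATA_BYTES = 1 * 10**15                    # 1 petabyte

SPARE_BYTES_PER_VOLUNTEER = 100 * 10**6    # ~100 MB of spare disk each (assumed)
UPLOAD_BPS_PER_VOLUNTEER = 1 * 10**6       # ~1 Mbit/s sustained link each (assumed)

volunteers = DATA_BYTES / SPARE_BYTES_PER_VOLUNTEER
print(f"Volunteers needed: {volunteers:,.0f}")          # ~10,000,000

# Time to push one volunteer's share over their link:
seconds = SPARE_BYTES_PER_VOLUNTEER * 8 / UPLOAD_BPS_PER_VOLUNTEER
print(f"Transfer time per volunteer: {seconds / 60:.0f} minutes")
```

With those assumed numbers you land in the same ballpark: tens of millions of participants, each tying up their link for a noticeable chunk of time just to receive their share, before any analysis happens.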
CERN & NSA -> All the data never gets seen again.
CERN & Google -> Google gets to index the raw structure of the universe - what could go wrong?
CERN & Amazon -> You now get recommendations from Amazon on what elementary particle other people have been using.
CERN & NASA -> the data gets translated into imperial format and then gets lost.
CERN & Mirosoft -> ooh where to start.. Microsoft offer to reformat the data, and from then on you need a succession of patches to read the data.
CERN & Apple -> The data gets formatted to make it look really pretty, but no-one else can read it.
hmm - who am I missing?
ttfn
CERN & The UK Civil Service -> The data is on several million USB sticks left behind on trains.
CERN & MPAA -> There will be a delay while DRM is applied to all existing data.
CERN & PC World -> For a small fee, the data comes with an extended warranty.
They flat out state that they have more than enough processing and network capacity, so why not shift some of that to *compressing* that huge gob of data they produce? Or maybe spend a few bucks on de-duplication technologies; I'm pretty sure that would remove a huge amount of the storage needed.
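To make the de-duplication suggestion concrete, here is a minimal content-addressed chunking sketch; the chunk size and hash choice are my own assumptions, and real dedup systems are considerably more sophisticated:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB fixed-size chunks (assumed; real systems often chunk variably)

def dedup_stats(path):
    """Count how many chunks of a file duplicate earlier chunks."""
    seen = set()
    total = dupes = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            total += 1
            if digest in seen:
                dupes += 1
            else:
                seen.add(digest)
    return total, dupes

# Note: already-compressed physics event data looks close to random noise,
# so chunk-level dedup would likely find very few duplicates.
```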
They only record 'interesting' events, and they store the events gzipped, IIRC. I may be wrong because I left the field a few years ago, but I think the issue is getting enough storage at all the replication sites to allow efficient analysis by all the physicists on the collaboration, not that they can't afford the storage for a single copy of the events. That would be nuts, clearly.
Throwing a few megabucks at HP to get their memristor chip stacks working?
That would solve the storage problem: just dump the raw data directly to MR-Flash, then do the processing as needed.
Alternative idea: get people with PS3s etc. to do the preprocessing, like SETI@home, and locally cache the data for them.
If it gets lost, no biggie, as it's duplicated across multiple machines.
Each PS3 does a different subset of the task on the same data block, and the results are all reconstructed at the end (rough sketch below).
Simplez.
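A rough sketch of the kind of redundant work-unit assignment that scheme would need; the replication factor, the majority-vote reconstruction and all the names are my own assumptions, not how SETI@home or CERN actually do it:

```python
import random

REPLICATION = 3  # each sub-task runs on 3 different machines (assumed)

def assign_subtasks(block_id, n_subtasks, volunteers):
    """Split the work on one data block into sub-tasks, each sent to several volunteers."""
    assignments = {}
    for task in range(n_subtasks):
        # Pick REPLICATION distinct volunteers for this sub-task.
        assignments[(block_id, task)] = random.sample(volunteers, REPLICATION)
    return assignments

def reconstruct(results):
    """Accept a sub-task result only where the replicated copies agree (majority vote)."""
    merged = {}
    for key, outputs in results.items():
        merged[key] = max(set(outputs), key=outputs.count)
    return merged

# Example: 8 sub-tasks on one block, spread over a small volunteer pool.
volunteers = [f"ps3-{i}" for i in range(20)]
print(assign_subtasks("block-42", 8, volunteers))
```

The majority vote is what lets a lost or flaky machine be "no biggie": as long as most replicas of a sub-task come back and agree, the block can still be reconstructed.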