Inside Internet Archive: 10PB+ of storage in a church... oh, and a little fight to preserve truth

At the Internet Archive's headquarters in San Francisco, California, on Wednesday, technologists, educators, archivists, and other fact-oriented folks gathered to discuss how they and the like-minded can save news from the memory hole – a conceit conjured by George Orwell to describe a political mechanism for altering the truth …

Silver badge

Truth vs Sheep

Securing the truth is one thing. Getting the sheeple to believe and act on it is another. The U.S. political ecosystem alone is living proof of that. 1984 was supposed to be a work of fiction, not an instruction manual.

30
2
Silver badge

Re: Truth vs Sheep

If you consider a massively contaminated cesspool an ecosystem, you'd be correct. Our current political climate is a lot like Lake Karachay.

My grandad always said that they're all a bunch of crooks. I used to love listening to him rail at dumbass local politicians who'd call him and he'd always finish with "Y'know what? You're all a bunch of crooks". He was from Chicago and it was indeed true, they were literally all a bunch of crooks, especially there and especially in the era he grew up in.

However after the past two years, I have to amend Grandpa's wisdom. They're too inept anymore to be crooks. They're all a bunch of fucking yes-man clowns led by psychopaths on both sides with no original ideas in their heads besides threatening to bomb people.

22
0
Anonymous Coward

Re: Truth vs Sheep

Looking at Aldous Huxley's and George Orwell's backgrounds, you might well conclude that neither of their works was "speculative fiction", but actually an "announcement".

- A lot of sheeple have an inkling of what's wrong, but are far too afraid to act on it.

- Other sheeple have an inkling, but refuse to entertain it, because they know it would turn their lives upside down.

In the end, we are all responsible, because our own small scale corruption enables the huge scale corruption of stealing the entire world from us, to which they are getting pretty damned near. 20 banks, 140 corporations and a smallish number of shadowy figures controlling them globally.

If there weren't so many people whose compliance you could *buy* with small change, they couldn't recruit a sufficient number of helpers-helpers to pull off this worldwide plan of fraud, mass murder and thinly veiled slavery.

0
0
Silver badge

Loophole

Does it still take notice of present-day robots.txt files, which can be used to block URLs from the past?

That's not a particularly good decision.

6
0
Bronze badge

Re: Loophole

You have to listen to robots.txt. Some links are like /delete, and there are other links that website owners don't want followed by spiders, not necessarily because they are hidden pages. If you want to hide data from spiders there are easy ways to do it that don't use robots.txt. Breaking robots.txt would be a bad idea.

1
0
Silver badge

Re: Loophole

Spiders aren't really relevant.

The archive is to preserve what was visible, and to prevent the government changing whether we were at war with Eurasia or Eastasia. It doesn't matter if they remove the old page or block it from spiders - as long as the archive copied what was visible.

4
0

Re: Loophole

The issue is applying a current robots.txt to an archived version of a site.

For example: Try viewing the official Chevrolet Monte Carlo website from 2006 on archive.org. It reads the current Chevrolet.com robots.txt and disallows access to archived content.

It has nothing to do with current web standards and is solely about how the Wayback Machine handles robots.txt in relation to previously scraped and archived content.
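To make the distinction concrete, here is a minimal Python sketch using the standard library's robots.txt parser. It is not how the Wayback Machine is actually implemented; the two robots.txt bodies, the example URL and the choice of the "ia_archiver" user agent are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt contents, invented for this example.
ROBOTS_AT_CAPTURE = "User-agent: *\nDisallow: /admin/\n"   # rules when the page was crawled
ROBOTS_TODAY = "User-agent: *\nDisallow: /\n"              # rules published by the site now

# Hypothetical archived URL.
URL = "http://example.com/cars/2006/index.html"

# Policy A: judge the snapshot by the rules in force when it was captured.
visible_by_capture_rules = allowed(ROBOTS_AT_CAPTURE, URL)

# Policy B: judge the snapshot by today's rules (the behaviour complained about above).
visible_by_current_rules = allowed(ROBOTS_TODAY, URL)

print(visible_by_capture_rules, visible_by_current_rules)
```

Under these made-up rules the same snapshot is viewable with policy A and hidden with policy B, which is the whole complaint: nothing about the archived page changed, only the file it is being judged against.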

3
0
Mushroom

More of this sort of thing

I love it when formerly religious buildings are purposed (I was going to say repurposed, but the prefix seems redundant to me) into something useful, like this. Or, say, pubs.

31
13
Go

Re: More of this sort of thing

Yes: cafes, concert halls, art venues... And of course data centres in those nice, cool, secure crypts.

5
0
Silver badge
Angel

Re: More of this sort of thing

ISTR that Christ himself was well known for his habit of consorting with the "publicans and sinners". I am sure He would approve of your sentiments.

10
0
Silver badge

Re: More of this sort of thing

Christian Science Reading Rooms, to be accurate, were mostly built in an era of cool architectural design. Cadogan Hall in London UK, now turned into a performance hall with its own resident orchestra, has a beautiful auditorium and brilliant acoustics.

7
0
Silver badge
Happy

Re: More of this sort of thing

"ISTR that Christ himself was well known for..."

Jeez, you're old!!

7
1
Silver badge

Re: More of this sort of thing

To be more accurate.. it's a former Christian Science Church they use. The Reading Rooms are a different critter and usually pretty small compared to the churches themselves.

2
0
Anonymous Coward

Re: More of this sort of thing

Vilnius has a museum of atheism in a former church. Seems sensible to me.

4
0
Silver badge
Thumb Up

Interesting that they keep a mirror at the modern reincarnation of the place where centralizing most of the Greco-Roman world's knowledge at one location paid such great dividends.

Good on them though, someone has to do what they're doing.

I especially like the idea of the PDF readers automatically linking the cited papers in the footnotes. I have to read a lot of academic writing since emergency preparedness and response is in a constant state of evolution, and that would save a lot of time and there would likely be less time wasted getting pissed off at Elsevier and the rest of the academic publishing cartel if I could see which journal whatever paper is in (and if we pay for it or not) based on the linked URL alone.

12
0

ITYM "and if it's accessible via SciHub".

Easy to automatically find PMIDs and DOIs and link them straight there as well...
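A rough Python sketch of that kind of automatic linking, for what it's worth. The regular expressions are heuristics rather than a full grammar for either identifier, the footnote text and its DOI are made up, and the links go to the public doi.org resolver and PubMed; where a reader chooses to point them afterwards is a separate matter.

```python
import re

# Heuristic patterns: a DOI starts with "10.", a 4-9 digit prefix and a slash;
# a PMID is matched only when it is labelled as such in the text.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b")


def links(text: str) -> list[str]:
    """Return resolver URLs for every DOI and labelled PMID found in `text`."""
    found = []
    for match in DOI_RE.finditer(text):
        # Trim punctuation that usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)")
        found.append("https://doi.org/" + doi)
    for match in PMID_RE.finditer(text):
        found.append("https://pubmed.ncbi.nlm.nih.gov/" + match.group(1) + "/")
    return found


# Hypothetical footnote; the DOI and PMID are placeholders, not real references.
FOOTNOTE = "See Smith et al., doi:10.1234/example.5678. Also PMID: 12345678."

for url in links(FOOTNOTE):
    print(url)
```

The trailing-punctuation trim is a trade-off: it will occasionally cut a DOI that really does end in a bracket, so a real reader plugin would want to check the result against the resolver.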

2
0
Silver badge

distributed knowledge?

A few months back we read about a whole bunch of early HP documents that were lost to a natural disaster (a fire, from memory). It strikes me as quite all-eggs-in-one-basket to have such important historical data in one location. How do they back up their data? I know many folk here have a few tens of GB of HDD space. It would be a really interesting project to ask people to donate a few GB of storage and a small amount of download/upload bandwidth to truly secure that data. If sharded the right way, you could reasonably have confidence that all information is held in multiple regions, detect where backup nodes are MIA and replicate the at-risk data to new nodes.
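As a back-of-envelope illustration of "sharded the right way", here is a small Python sketch. Everything in it is hypothetical: the node names, the regions, the three-copies target, and the use of hash ranking (rendezvous hashing) to pick holders are my own choices for the example, not anything the Archive is known to use.

```python
import hashlib


def rank(shard: str, node: str) -> str:
    """Stable pseudo-random score for a (shard, node) pair."""
    return hashlib.sha256(f"{shard}:{node}".encode()).hexdigest()


def place(shard: str, nodes: dict[str, str], copies: int = 3) -> list[str]:
    """Choose up to `copies` holders for a shard, spreading across regions first.

    `nodes` maps a volunteer's node name to its region.
    """
    ranked = sorted(nodes, key=lambda n: rank(shard, n))
    chosen, seen_regions = [], set()
    # First pass: take at most one node per region.
    for n in ranked:
        if len(chosen) < copies and nodes[n] not in seen_regions:
            chosen.append(n)
            seen_regions.add(nodes[n])
    # Second pass: top up if there are fewer regions than copies.
    for n in ranked:
        if len(chosen) < copies and n not in chosen:
            chosen.append(n)
    return chosen


def under_replicated(placement: dict[str, list[str]], live: dict[str, str],
                     copies: int = 3) -> list[str]:
    """List the shards that have fewer than `copies` holders still answering."""
    return [s for s, holders in placement.items()
            if sum(h in live for h in holders) < copies]


# Hypothetical volunteers and regions.
NODES = {"alice": "eu", "bob": "eu", "carol": "us", "dave": "asia", "erin": "us"}
placement = {s: place(s, NODES) for s in ("shard-0", "shard-1", "shard-2")}

# "dave" goes missing: find the shards at risk and pick holders among the survivors.
live = {n: r for n, r in NODES.items() if n != "dave"}
needs_repair = under_replicated(placement, live)
repaired = {s: place(s, live) for s in needs_repair}
```

Because dave is the only node in his region, every shard loses a copy when he disappears, so all three get re-placed. The sketch says nothing about how you notice a node is MIA or how the bytes actually move; those are the hard parts.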

4
2
Silver badge

Re: distributed knowledge?

About a year ago they were talking about setting up a full backup in Canada.

1
0
Silver badge

Re: distributed knowledge?

I would be happy to donate spare capacity. I am honoured that my very first website, 1998, has several iterations on the Wayback Machine.

10
0
Bronze badge

Re: distributed knowledge?

I was thinking that also. This is very much how Freenet works.

1
0
Silver badge

Re: distributed knowledge?

I've got you beat by two years... my first iteration hit the WBM on 20 December 1996... :-D

2
0
Silver badge
Thumb Up

Re: distributed knowledge?

If you'd like to help keep an additional backup of the Internet Archive (there are several already, they're not daft) there's a project called ia.bak which uses git annex to store a copy of part of the data.

All you do is decide how much disk space and bandwidth you can spare, and then you can just walk away and leave it.

5
0
Silver badge
Pint

Re: distributed knowledge?

My first idea is that your valuable data can be watermarked into pr0nography files (e.g. naughty videos), and then uploaded to the 'net. Within seconds, dozens of thieving/freeloading pr0n servers will steal copies of these files, and host them on their own pr0n servers for fun and ill-gotten profit. So your valuable data, secretly watermarked into the files, will be widely distributed and publicly available. It's really the ultimate free, distributed, crowd-sourced backup system. It'll almost certainly survive nuclear war and asteroid impacts. And it justifies smurfing pr0n during working hours, 'cause ya know, "...just checking the backups."

My second idea is that precisely all this has already happened. Which would explain a great deal.

9
0
Silver badge

Re: distributed knowledge?

I'm happy to be downvoted but at least make a point about why my post is wrong or stupid or RTFA or something.

@phuzz, thanks for the link. It's good to see they are at least making the right noises. I think it's a bit generous to call it an "all you do" set of instructions. Most commentards here could do it but it is hardly Folding@home or SETI@home level accessible. There is a lot of focus on the great backup but potential distributed restore plans don't seem as developed. Bad actors are mentioned in passing but not strategies to figure out which is truth when, for example, a TLA pretends to be multiple actors and restores a different truth.

This would be an interesting application of blockchains, or even of a cryptocurrency. Imagine mining by proving that you have the hash of hundreds of random files from random places in the archive.
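A toy version of the "prove you hold the files" part, in Python. This is only a challenge-response sketch under my own assumptions (made-up file names and contents, a verifier that already holds a trusted copy); it is not a blockchain, not a mining scheme, and not any existing proof-of-storage protocol.

```python
import hashlib
import secrets


def respond(nonce: bytes, data: bytes) -> str:
    """Answer a challenge: hash a fresh nonce together with the file contents.

    Mixing in the nonce means a node cannot get away with storing only a
    precomputed hash; it needs the actual bytes to answer each new challenge.
    """
    return hashlib.sha256(nonce + data).hexdigest()


def make_node(files: dict[str, bytes]):
    """Model a backup node as a function answering (file name, nonce) challenges."""
    def answer(name: str, nonce: bytes):
        data = files.get(name)
        return None if data is None else respond(nonce, data)
    return answer


def audit(reference: dict[str, bytes], answer, sample_size: int) -> bool:
    """Challenge a node on `sample_size` randomly chosen files from the reference copy."""
    rng = secrets.SystemRandom()
    for name in rng.sample(sorted(reference), sample_size):
        nonce = secrets.token_bytes(16)
        if answer(name, nonce) != respond(nonce, reference[name]):
            return False
    return True


# Hypothetical archive contents.
ARCHIVE = {"a.warc": b"first", "b.warc": b"second", "c.warc": b"third"}
honest = make_node(dict(ARCHIVE))
tampered = make_node({**ARCHIVE, "b.warc": b"rewritten"})

print(audit(ARCHIVE, honest, sample_size=3))    # challenges every file
print(audit(ARCHIVE, tampered, sample_size=3))  # the altered file fails its challenge
```

Note the catch: the auditor needs a copy it already trusts in order to check the answers, so this detects a node that has dropped or altered data but does not by itself settle which copy is the truth when the reference is in dispute.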

1
0
Silver badge

Re: distributed knowledge?

Doesn't ipfs.io do this?

Of course the danger with that is that it's a startup so it's transient.

0
0
Silver badge

Archive vs right to be forgotten

How do we square the two?

5
2
Silver badge

Re: Archive vs right to be forgotten

There is no right to be permanently forgotten - ask any archaeologist. A data protection filter during a person's lifetime would surely be a good thing, but it would require massive administrative overhead with affected people arguing over what items should be blocked/unblocked. I don't know if it would be feasible.

11
1

Re: Archive vs right to be forgotten

I broadly agree with you that, as I conceive of natural rights, there is no absolute right to be forgotten. However, this does not mean that there are no legal problems to address: with the EU General Data Protection Regulation coming into force this spring, you will in fact have a general right to remove records from any organization storing your personal information (with some obvious exceptions for e.g. active business relationships and security). The usual tricky question of jurisdictions then raises its ugly head.

In the very longest term, we will of course all be forgotten. Isn't that comforting?

9
0
Silver badge

Re: Archive vs right to be forgotten

"A data protection filter during a person's lifetime would surely be a good thing, but it would require massive administrative overhead"

The best filter is the one that lies on the proximal side of the user's fingers and requires no overhead, just a head.

2
0
Silver badge

Re: Archive vs right to be forgotten

"There is no right to be permanently forgotten - ask any archaeologist. A data protection filter during a person's lifetime would surely be a good thing, but it would require massive administrative overhead with affected people arguing over what items should be blocked/unblocked. I don't know if it would be feasible."

I wondered what the difference between grave robbing and archaeology was. Someone did give me a definition, which was basically: archaeologists don't hang onto their finds for profit; grave robbers do.

I think the Time Team folks are safe under that definition.

I've managed to find a friend who I had lost contact with using archive.org. His business contact details were on his website which he deleted a few years back. Wouldn't have found him so easily otherwise.

Whilst they do try to cut out the pr0n sites there are a few on there and hence archive.org is blocked in a few places where they're paranoid about pr0n. I once visited a company where they had blocked access via their internet connection to most adult sites but also Flickr, Instagram, Twitter, Dailymotion, Youtube etc. because they were trying to ban pr0n. A staff member told me it was a massive overkill but they were acting on the advice of lawyers.

1
0
Silver badge

"copies of its data out of the US, because it's good to have an offsite backup."

I would say that it is absolutely vital to have copies of the data outside the USA. And not just with the current regime. The Snowden revelations show clearly that for instance the NSA would have no qualms whatever in hacking in and changing the data, or simply ordering them to change it and forbidding them to say anything about it.

11
0
Silver badge

NSA & Internet Archive

"the NSA would have no qualms whatever in hacking in and changing the data, or simply ordering them to change it and forbidding them to say anything about it."

Correct: so any second copy must be more than a backup copy of the "master" in the USA. It must have a certain amount of USA-hands-off autonomy so that it would verify updates from the USA and also scan web sites independently, so that it is not blind to the sites that the USA government/judiciary says that the USA archive must not see.

The big question is where to place the second copy? The UK is likely too close (politically) to the USA, and much of Europe is not a huge amount better. I have China and Russia popping into my head; sure, they will censor things, but likely in a different way than the USA/Europe.

Why stop at two backups? If funds allow, the more the better.

7
0
Silver badge

Re: NSA & Internet Archive

Perhaps all the more reason to have the data distributed across a few million people's spare capacity?

3
2
Silver badge

*cough* Yes, sure... *cough*

"The Internet Archive isn't so much concerned with preventing the spread of misinformation as with making sure information of all sorts remains accessible."

Sure, because in 500 years from now when the domestic cats are evolved into our sentient upright overlords, and when we are slavishly subservient to them (even more so than we are already), it's good to know they'll have somewhere to go to revisit their own historical records.

7
1
Silver badge

Re: *cough* Yes, sure... *cough*

And how does this differ from the situation between cats and humans now? They don't even have to be upright. Why bother, when they can be served when at their ease?

14
0
Silver badge

Re: *cough* Yes, sure... *cough*

"Sure, because in 500 years from now when the domestic cats are evolved into our sentient upright overlords, and when we are slavishly subservient to them (even more so than we are already), it's good to know they'll have somewhere to go to revisit their own historical records."

Obligatory Sir Pterry Pratchett quote:

"In ancient times cats were worshipped as gods; they have not forgotten this."

2
0
Silver badge
Angel

Essential service

I have to say I already find the Wayback Machine an essential service. I blush to recall that I have even retrieved my own stuff from it on occasion, when my backup system failed.

Top marks to these geezers, I wonder if they are archiving Wikileaks?

Somewhere I came across a religious cult which regards information as Divine (taking "God is Truth" quite literally) and its destruction as a sin. Not so much a defunct Church as merely a change of religion, then. I can live with a God like that.

9
1
Silver badge

Re: Essential service

Isn't God the Word (Logos)?

3
0

Re: Essential service

No, The Bird Is The Word,

https://www.youtube.com/watch?v=2WNrx2jq184

8
0
(Written by Reg staff)

Re: Essential service

If it's "essential" do you mind that it's partial, and succumbs to corporate pressure? Or is "full of holes" good enough?

https://forums.theregister.co.uk/forum/1/2017/11/16/head_like_a_memory_hole/#c_3349090

"a religious cult which regards information as Divine "

The cult is contemporary and Swedish, but information worship goes back to Gnosticism. After Comte ("Religion of Humanity"), there were a number of religions of positivism. One disciple was Teixeira Mendes, who put "Order and Progress" on the Brazilian flag.

5
0

Re: Essential service

@Stuart Castle - thank you. Brilliant. I needed cheering up and that hit the button. What a classic.

0
0
Anonymous Coward

Agreed, and ElReg also should stop modifying posted articles without warning.

That is all.

4
0
404
Silver badge

This is why I collect encyclopedias and old books - they tend not to change when the wind blows from different directions*.

*Russians obviously lol /s

4
2
(Written by Reg staff)

Brewster and Memory Holes

"We don't see people trying to modify the records that we've stored," Kahle told The Register.

Archive.org seems very happy to modify the record itself. How do I know?

Back in 2003, when Carly Fiorina was CEO, HP requested the deletion of material it found embarrassing, and Archive.org happily complied. I recall this made things difficult for us journalists to corroborate previous statements, and so hold the executives to account.

So I find the Memory Hole competition richly ironic. Archive.org *is* the memory hole.

Real archives have exceptions for copying and preservation, and the kind of threats HP made could be ignored. Don't mistake Brewster's collection for a real archive.

10
0
Silver badge

Re: Brewster and Memory Holes

Thank you for this. It's like when I found out that the British Library had thrown away a lot of books and old periodicals. Gutted my trust in them forever.

2
0
Silver badge

Re: Brewster and Memory Holes

"Back in 2003, when Carly Fiorina was CEO, HP requested the deletion of material it found embarrassing, and Archive.org happily complied. I recall this made things difficult for us journalists to corroborate previous statements, and so hold the executives to account.

"So I find the Memory Hole competition richly ironic. Archive.org *is* the memory hole."

I wonder if, in the 14 or so years since, they've had a change of heart?

1
0

Whose fault is it?

Does anyone else think it odd that this "forever archive" is located in a building on a major fault line?

4
0
Silver badge
Happy

Sounds fine to me

"... we don't archive Facebook very well."

3
0
Coat

Re: Sounds fine to me

"... we don't archive Facebook very well."

Yes, but if FriendFace went away, nothing of value would be lost.

(Except the Cuke game, that was awesome.)

3
0
Bronze badge

35PB? What about the Internet?

I know one company whose data is measured in a larger scale than that. There is no way that these guys are archiving anything beyond a thin slice of the net. This explains why I wasn't able to pull comments from an old website--they just never archived it.

1
0

Biting the hand that feeds IT © 1998–2018