The Internet Archive's "Wayback Machine" has announced that it has indexed four hundred billion web pages. The trove dates back to late 1996 and comprises at least fourteen petabytes, a figure we base on a 2012 declaration the archive hit 10 petabytes and a later post explaining that a fund-raising drive for another four …
4 Billion pages...
Let's hope that they didn't use 32-bit unsigned index values or they may find themselves more way-back than they planned!
Re: 4 Billion pages...
Four hundred billion...
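For anyone curious about the arithmetic behind the joke, here is a minimal sketch in Python. The page count is the round figure from the announcement. The `wrap_uint32` and `bits_needed` helpers are hypothetical stand-ins for a 32-bit unsigned counter, and nothing here reflects how the Internet Archive actually indexes pages.

```python
# Illustration only: how a 32-bit unsigned counter would cope with the
# announced page count. Not a description of the Archive's real index.

UINT32_MAX = 2**32 - 1       # largest 32-bit unsigned value: 4,294,967,295
PAGES = 400_000_000_000      # round figure from the announcement


def wrap_uint32(n: int) -> int:
    """Hypothetical helper: keep only the low 32 bits, as a uint32 would."""
    return n & 0xFFFFFFFF


def bits_needed(n: int) -> int:
    """Smallest number of bits that can hold n as an unsigned integer."""
    return max(1, n.bit_length())


if __name__ == "__main__":
    print(f"uint32 tops out at {UINT32_MAX:,}")
    print(f"{PAGES:,} pages would wrap to {wrap_uint32(PAGES):,}")
    print(f"bits needed for {PAGES:,}: {bits_needed(PAGES)}")
```

So four billion pages would just about squeak into 32 bits, while four hundred billion needs 39.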
Can't remember visiting El Reg back then, but like most people who had anything online in the early days, it's a blast visiting stuff that no longer exists (some of mine from '94 was still around for the earliest snapshot). I like the roll-back feature, showing how things have changed over the years.
Understatement of the year?
Who had more beer last night: me, The Reg, or the Wayback Machine? Their announcement says FOUR HUNDRED BEEELLION pages. Or at least that's what I saw. Twice.
I think I first read El Reg in either 1999 or very early 2000.
(Think it looked fairly similar to how it is now).
I miss low res animated gifs
Ahh, those pre-Flash days, twirly gif logos all over everyone's page because they were 'dynamic' and 'cool'...
*goes all misty eyed*
Re: I miss low res animated gifs
Not to mention the yellow and black striped sign with "UNDER CONSTRUCTION" emblazoned all over it.
... don't forget little graphics of envelopes with "No Junk Mail" on them, right under a plaintext email address ... because that stopped spam in its tracks ;-)
even shows how El Reg looked as a young vulture in the summer of '97
I prefer that version!
HTML of yesteryear
How we've missed you.
Re: HTML of yesteryear
"How we've missed you."
You only miss it once it's gone. No really, no need to reimplement... please...
Tip-top technical guidance from Ye Olde El Reg
"Use the back key on your browser to return to the previous page..."
You have to wonder who the target audience were!
Remash of old material
Ah ha, now we have proof that the El Reg hacks simply remash existing material.
El Reg hack swaps Win 98 for W8, changes one or two names, drops in the obligatory threat from company X who is "prepared" to move to an alternate OS, et voilà, Bob's your auntie.
Bill Gates was already the world's richest man, for the 4th year running, with an estimated 51 billion - so we can easily remash that as well....
Interesting to see how nothing much changes.
Ahh the wayback machine
How many times have I used ye to retrieve old content long since deleted from our servers?
Praise be to the wayback machine!
It's a sad testament to the state of the world...
...that "Gates Owns Even More of Everything -- Official" is now the Good Old Days.
I had a friend show me an odd 'bug' in the Wayback Machine once - he bought a domain and set up his own personal website on it, which, as it turned out, had been owned several years before his purchase by a small foreign telecoms company.
The TelCo had a blanket-denial robots.txt file which told all spiders to F- off, and because of this the Wayback Machine would refuse to let him browse the historical snapshots of the domain from the time he owned it, even though his web traffic logs showed it had been indexing his site.
I just shudder at what Wayback Machine holds on me - I can see my very first websites thanks to the history, back when I did terrible things like build websites in Lotus Word Pro (which was marginally less of a sin than building them in Word).
I just hit the same 'feature'. My domain goes back on there to 2000, yet I can't see it due to the robots.txt thing. I just changed my robots.txt to allow all, and it still refuses to show me the archived pages.
So I don't understand how/why this works like this?
Seems there's some discussion of it on the Archive.org forum:
( http://archive.org/post/406632/why-does-the-wayback-machine-pay-attention-to-robotstxt )
Doesn't appear to be any sensible consensus on what they should do to fix this... but this is totally off-topic for this article. :)
Pirate flag, as I've partially hijacked the topic! (we need a tangent icon).
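For reference, here is a rough sketch of what a blanket-deny robots.txt means to a crawler that honours it, using Python's standard `urllib.robotparser`. The two robots.txt bodies, the `may_fetch` helper and the example.com URL are made up for illustration, and `ia_archiver` is assumed to be the user-agent name the crawler announces. This only shows how the rules are read, not how or when the Wayback Machine applies them to snapshots it already holds.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, not taken from any real site.
BLANKET_DENY = """\
User-agent: *
Disallow: /
"""

ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.com/index.htm"   # placeholder URL
    agent = "ia_archiver"                  # assumed crawler name
    print("blanket deny:", may_fetch(BLANKET_DENY, agent, url))
    print("allow all:   ", may_fetch(ALLOW_ALL, agent, url))
```

An empty `Disallow:` line is the conventional way to say "everything is allowed", which is presumably what "changed my robots.txt to allow all" amounts to.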
Love the Wayback Machine!
After several PC upgrades, I eventually lost the files that were my first websites, built back in the gay 90s. Some of them quite good!
Gone for ever I thought. Oh well.
Spent a few years trolling the Internet Archive thinking they might show up one day. After many, many years, lo and behold! They did!
When I tell people I used to build websites back in the early days of dinosaurs, fire and the stone wheel, I now have proof, and you know what, they still look pretty good.
Here's one sample. Check the date. https://web.archive.org/web/20010406065812/http://www.worldtv.org/index.htm
Flame on. :)
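For anyone who wants to "check the date" without squinting: a small sketch that pulls the capture time out of a snapshot link like the one above, assuming the 14-digit run in the path reads as YYYYMMDDhhmmss. The `parse_snapshot_url` helper and its regular expression are made up for this example and are not an official Archive API.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumed layout: https://web.archive.org/web/<14-digit timestamp>/<original URL>
SNAPSHOT_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(url: str) -> Tuple[datetime, str]:
    """Split a snapshot link into (capture time, original URL)."""
    match = SNAPSHOT_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised snapshot URL: {url}")
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


if __name__ == "__main__":
    captured, original = parse_snapshot_url(
        "https://web.archive.org/web/20010406065812/http://www.worldtv.org/index.htm"
    )
    print(captured.isoformat(), original)
```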
Re: Love the Wayback Machine!
Check out http://www.fabricland.co.uk/ - it's like playing bad web design bingo!
"New Page 1", Framesets, pointless gifs, horrific colours, marquees, table layouts, center aligned text, broken links, personal drawings/quotes unrelated to the site... the list is almost endless!
... I hope that site never changes... it's a fantastic example of everything bad *and* it's an actual live site! :o
I wasn't a reader then, but I have to say that the early logo still looks quite stylish, incorporating as it did both the 'R' and the vulture.
Yes, a definite 'like'
Old Copies of "As The Apple Turns"
appleturns.com was responsible for many a noser back in the day. Where is Jack Miller?