4 Billion pages...
Let's hope that they didn't use 32-bit unsigned index values or they may find themselves more way-back than they planned!
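For anyone who wants the arithmetic behind the joke, here is a rough sketch. It is purely illustrative: nothing in the article says how the archive stores its index, and `PAGE_COUNT`, `wrap_u32` and `fits_in_u32` are names made up for this example.

```python
# Hypothetical illustration: what happens if a page counter is kept in an
# unsigned 32-bit integer. Nothing here reflects the archive's real storage.

U32_MAX = 2**32 - 1           # largest value a 32-bit unsigned integer can hold
PAGE_COUNT = 400_000_000_000  # the four hundred billion pages from the article


def wrap_u32(value: int) -> int:
    """Return value as it would read after wrapping into 32 unsigned bits."""
    return value & U32_MAX


def fits_in_u32(value: int) -> bool:
    """True if value can be stored in 32 unsigned bits without wrapping."""
    return 0 <= value <= U32_MAX


if __name__ == "__main__":
    print("max 32-bit unsigned value:", U32_MAX)
    print("fits without wrapping:", fits_in_u32(PAGE_COUNT))
    print("value after wrapping:", wrap_u32(PAGE_COUNT))
```

A 32-bit unsigned counter tops out just under 4.3 billion, so a count of four hundred billion would have wrapped around many times over.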
The Internet Archive's "Wayback Machine" has announced that it has indexed four hundred billion web pages. The trove dates back to late 1996 and comprises at least fourteen petabytes, a figure we base on a 2012 declaration the archive hit 10 petabytes and a later post explaining that a fund-raising drive for another four …
Can't remember visiting El Reg back then, but like most people who had anything online in the early days, it's a blast visiting stuff that no longer exists (some of mine from '94 was still around for the earliest snapshot). I like the roll-back feature, showing how things have changed over the years.
Ah ha, now we have proof that the El Reg hacks simply remash existing material.
El Reg hack swaps Win 98 for W8, changes one or two names, drops in the obligatory threat from company X who is "prepared" to move to an alternate OS, et voila, Bob's your auntie.
Bill Gates was already the world's richest man, for the 4th year running, with an estimated 51 billion - so we can easily remash that as well....
Interesting to see how nothing much changes.
I had a friend show me an odd 'bug' in the Wayback Machine once - he bought a domain and set up his own personal website on it, which, as it turned out, had been owned several years before his purchase by a small foreign telecoms company.
The TelCo had a blanket denial robots.txt file which told all spiders to F- off, and because of this, the Wayback Machine would refuse to allow him to browse the historical snapshots of the domain during the time he owned it, despite indexing his site according to his web traffic logs.
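To make the "blanket denial" concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents are an assumption about what the telco's robots.txt looked like, and the crawler name and URL are placeholders. It only shows how a standard parser reads such a file, not how the Wayback Machine itself applies it.

```python
from urllib.robotparser import RobotFileParser

# Assumed content of the telco's old robots.txt: block every crawler everywhere.
BLANKET_DENIAL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-crawler" and example.com are placeholders, not real values.
    url = "http://example.com/index.html"
    print("blanket denial:", is_allowed(BLANKET_DENIAL, "example-crawler", url))
    print("empty robots.txt:", is_allowed("", "example-crawler", url))
```

With the two-line file every path is off limits to every crawler, while an empty robots.txt blocks nothing.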
I just shudder at what Wayback Machine holds on me - I can see my very first websites thanks to the history, back when I did terrible things like build websites in Lotus Word Pro (which was marginally less of a sin than building them in Word).
Seems there's some discussion of it on the Archive.org forum:
( http://archive.org/post/406632/why-does-the-wayback-machine-pay-attention-to-robotstxt )
Doesn't appear to be any sensible consensus on what they should do to fix this... but this is totally off-topic for this article. :)
Pirate flag, as I've partially hijacked the topic! (we need a tangent icon).
After several PC upgrades, I eventually lost the files for the first websites I had built back in the gay 90s. Some of them quite good!
Gone for ever I thought. Oh well.
Spent a few years trawling the Internet Archive thinking they might show up one day. After many, many years, lo and behold! They did!
When I tell people I used to build websites back in the early days of dinosaurs, fire and stone wheel, I now have proof and, you know what, they still look pretty good.
Here's one sample. Check the date. https://web.archive.org/web/20010406065812/http://www.worldtv.org/index.htm
Flame on. :)
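If "check the date" isn't obvious: the snapshot link above carries a 14-digit timestamp in its path, in year-month-day-hour-minute-second order. A small sketch that pulls it out, assuming the `/web/<timestamp>/<original URL>` layout seen in that link (`snapshot_time` is a made-up helper name):

```python
from datetime import datetime
from urllib.parse import urlparse


def snapshot_time(wayback_url: str) -> datetime:
    """Extract the capture time from a /web/<timestamp>/<original URL> link."""
    path = urlparse(wayback_url).path   # "/web/20010406065812/http://..."
    stamp = path.split("/")[2]          # the 14-digit YYYYMMDDhhmmss part
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    link = "https://web.archive.org/web/20010406065812/http://www.worldtv.org/index.htm"
    print(snapshot_time(link).isoformat())
```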
Check out http://www.fabricland.co.uk/ - it's like playing bad web design bingo!
"New Page 1", Framesets, pointless gifs, horrific colours, marquees, table layouts, center aligned text, broken links, personal drawings/quotes unrelated to the site... the list is almost endless!
... I hope that site never changes... it's a fantastic example of everything bad *and* it's an actual live site! :o
Biting the hand that feeds IT © 1998–2019