Re: Rince and Repeat.
Rincewind? I'm sure Terry Pratchett's books will still be around in 100 years time.
Cyber-pioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history. Cerf, who was speaking at the American Association for the Advancement of Science annual meeting, reiterated calls for a "digital vellum" – referring to the ancient parchment made from calf skin and known …
I can see the problems that arise from the volatile nature of digital records. It's much easier to lose or destroy than paper. But on the other hand it's much easier to duplicate, so the net attrition may be the same or less.
What surprises me is the suggestion that the ability to read old digital data may be lost in the future. Is there any evidence that this happens? If I dig out a 40-year-old CP/M floppy, will it be impossible to read (in the unlikely event that there's anything on it worth reading)? The effort and ingenuity that go into reading historical material like the Dead Sea scrolls and carbonised documents from Herculaneum suggest not.
It's a bit like the distinction between digital and analogue computers. Reading ancient digital information is presumably a matter of emulating the obsolete technology; a tricky but quantifiable problem. Ancient analogue information presents much greater challenges.
Old CP/M disks may be difficult to read, but now that everything is networked it is easy to transfer the bits to newer storage systems (replicated as necessary) over and over again.
We don't have to worry about preserving the data, but we do have to worry about preserving the means of interpreting it, i.e. if we store JPEGs we need the method of displaying them saved too, and similarly with H.264, HTML5 and so on. That's probably not a difficult problem for common formats like those; for rare formats, like saved mail archives from Domino, it might be more difficult.
The overarching problem is preserving some way to determine what is what. We need metadata about each object or collection of objects to tell what it is, where it's from, what its purpose was, who originated it, what its significance is, etc. A giant dumping ground of cat videos from 2006-2030 isn't very useful. A way to search it to fit memes like grumpy cat, I can haz cheeseburger cat and whatever at least preserves its cultural context/significance.
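To make the metadata idea concrete, here is a minimal sketch of a "sidecar" record stored next to each archived object. The field names follow the questions listed above; the `ArchiveRecord` class, the function name and the sample values are all made up for illustration and are not taken from any real archival standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ArchiveRecord:
    """Hypothetical descriptive metadata for one archived object."""
    identifier: str    # name or path of the stored object
    what_it_is: str    # kind of object and its file format
    source: str        # where it came from
    purpose: str       # what it was for
    originator: str    # who created it
    significance: str  # why anyone might care later


def to_sidecar_json(record: ArchiveRecord) -> str:
    """Serialise the record as plain-text JSON so it stays easy to read."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Made-up example values, purely illustrative.
example = ArchiveRecord(
    identifier="video-000123.mp4",
    what_it_is="short video, H.264 in an MP4 container",
    source="public video-sharing site",
    purpose="entertainment",
    originator="unknown uploader",
    significance="early example of a cat meme",
)

print(to_sidecar_json(example))
```

Keeping the record as plain text means it can be searched and read without the software that wrote it.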
Maybe AI will eventually be able to help there, and be able to 'watch' all the videos, view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner. Google's current search ability is nowhere near what we'd need for a future historian to look through this stuff and make some sense out of it.
"view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."
Surely that's what Autonomy was sold for?
And other less well known products used by three and four letter agencies, "for our security".
"view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."
Surely that's what Autonomy was sold for?
There are quite a few companies operating in this area, many of them for specific markets or industries (legal and medical are the big ones). And there's free software you can use to build your own system, from simple stem-and-index systems like the old MIT Savant software to general frameworks for processing unstructured data like UIMA.
Even Windows, in its default configuration, will scan and index all the file formats it recognizes. Similar systems exist for Linux and UNIX.
I searched www.archive.org and found 30 posts, videos and interviews by Vint Cerf, including two previous ones about this topic: https://archive.org/search.php?query=Vint+Cerf
But if anyone can't operate an emulator, try the free browser version ...
https://archive.org/details/internetarcade
In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies.
There's also the point that we may not know what will be interesting or relevant in the future.
So when people go to the British Library to see the four remaining copies of the Magna Carta, now 800 years old, they might like to wonder whether current "digital only" records, personal or corporate, might be readable by someone in the year 2815!!!! In my experience, it's only taken twenty-five years to render some digital files close to unrecoverable.
"In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies."
Are you SURE none of that is useable today? Are you sure you can't fire up a DOS emulator like DOSBox, locate disk images of the software you used (OK, maybe some of it was custom work) or a utility from the time capable of interpreting it? Sure, formats come and go, but there are even now digital preservationists striving to at least keep records of the past available: diskettes imaged and formats described. The hard part is gathering the resources needed to read your old format. After that, you can usually migrate it to a newer format. Plus there are certain formats (like simple text files) that lend themselves better to preservation (as long as the character set is still known, you're OK).
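The point about character sets is easy to show. A minimal sketch, using made-up bytes: the same stored byte decodes to different text depending on which character set you assume, so a "simple text file" is only simple if the encoding is recorded somewhere.

```python
# Made-up bytes, as they might sit in an old DOS-era text file.
raw = b"Total: \x9c100"

# Code page 437 (the original IBM PC character set) maps 0x9C to the pound sign.
as_cp437 = raw.decode("cp437")

# Windows-1252 maps the same byte to a different character entirely.
as_cp1252 = raw.decode("cp1252")

print(as_cp437)   # Total: £100
print(as_cp1252)  # Total: œ100
```

Both decodes succeed without error, which is the trap: nothing tells you which reading is the right one unless the character set was written down.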
Obviously that stuff CAN be recovered, it just isn't worth the cost. If they contained proof of invention that would mean they win $100 million in a patent case, they'd spare no expense to recover them - even taking "unreadable" diskettes to a data recovery specialist to read off the bits using an STM or whatever.
This reminds me of the BBC Domesday Disc project. A digital record of life in Britain in the mid-1980s, stored on laser disc for posterity and, like the Domesday Book, useful to social scientists and historians for hundreds of years. Or as it turned out, about 5 years.
I know there is a site on the intertubes that allegedly re-creates bits of the disc but I remember playing with the disc in the school library and looking at my home village - not present on the interweb emulator thingy...
Part of the problem is that copyright was not specifically waived, so while conversion could be done, from a legal POV copies are not allowed without contacting the original sources for the information:
https://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation
He said that personal privacy was essentially over because of social networking.
He also said that he would have liked to make the Internet more secure against interception, but wasn't allowed to as he was, er, working on a secret program on behalf of the NSA.
And now he wants to record for ever the information he thought should remain private and secure?
The only thing standing between us and the perpetual violation of our privacy for generations to come is bit rot, so long may it continue.
There's only any point in archiving stuff that the small number of people who might have the interest and resources to trawl through it in the distant future will have the time to read. So someone might as well sit down and copy the best bits out onto physical vellum - we already have far more historical archive material than will ever be usefully consulted, we don't need to expand it by orders of magnitude every few years.
Thank heavens there's still a few influential people around in IT like Cerf and Berners-Lee who have a real grip on the big picture and say what's needed to be said.
What Cerf says about digital vellum ought to be self-evident, but to many it's not. Why so few see it is a matter of conjecture.
Unfortunately, whilst Cerf has the power to command attention, I doubt if he will be listened to, as such matters require more than a five-second consideration.
I don't think it's particularly self evident - he sounds like chicken little to me.
The proportion of the Internet worth keeping is tiny. The formats involved number in the low thousands, and the technologies are of the same order of magnitude. All that's really needed is some auto tagging so that photos and personal data can be retained and Facebook posts flushed down the digital crapper.
Let's say some document of equivalent 'importance/influence' as the King James Bible of 1611 had first entered the world not as printed material but on-line as ephemeral 0s and 1s in 2011--exactly 400 years later.
1. Would that document have the same influence over the forthcoming 400 years, everything else being equal?
2. Would it matter if its influence was greater or lesser than that of its printed-on-atoms version of some 400 years earlier (leaving aside one's religious/political position on its contents)?
3. What does it mean to society/posterity/history etc. that in the digital age the 'modern ephemeral version' might never make it past five years of age let alone 400 years?
Permit me to suggest that these are quite profound questions that will take substantially more time for society to consider than it takes to cross the road whilst reading one's iPhone--the average amount of time users seem to devote to an issue today.
There were some great discussions on the old Aces Hardware forums. Participants from Intel, AMD, and other companies posting about the design of CPUs, Memory, and other key system architecture components. Some insights were posted there that may not have existed anywhere else.
All lost to a hard drive crash.
Can't help feeling he's missed the point about vellum, which is that it was a reusable medium! You simply scraped off the top layer with a sharp knife and reused it. Examinations of surviving vellum documents today indicate that in some cases much more valuable (in today's terms) items have been overwritten with (e.g.) yet another beautifully illuminated Gospel of St John. Aesthetically more pleasing but actually not as interesting as the local history that was written on there until some abbot decided it wasn't worth keeping...
Parallels, anyone?
Let's begin with something that ought to be dear to your heart. Vinyl recordings will most likely outlast their digital counterparts.
So far, each successive generation of recordings (and recording formats) has a shorter life than its predecessor. I've commercially-cut DVDs less than a decade old that are now unreadable, whilst my vinyl records, some of which are over 50 years old, are still very much intact. I've also a few 78s that are 90+ years old and are still in reasonable condition.
The issue here is not that vellum can be reused--so can paper with a pencil* and eraser--but whether the data will still be around after a reasonably long time if one wants it to be. Most written stuff from history isn't around today because people either didn't want it to be preserved or they didn't care to look after it.
The point Cerf is making is that vellum, if given a chance, will store information for thousands of years. That's much more than can be said for present digital information storage--the reasons for which are either technical limitations or society's lack of concern for the fate of recently-old information, or both.
Whilst in theory digital information can be stored indefinitely, practice is another matter altogether. I've considerable difficulty recovering my own digital documents from the early 1980s even though I've taken steps to look after them. Technology has made it difficult for me to store them either efficiently or transparently; whether they warrant storage is another matter altogether--but it's an issue which Cerf and many others including myself are very concerned about.
(* Wherever possible, I've always used a propelling pencil and eraser in preference to a ballpoint for this specific reason--even today, I don't feel properly dressed unless my 'Cross' propelling pencil and a notebook are in my top pocket. Much experience has shown me that (a) it's still quicker to jot in a notebook than in an iPhone, and (b) 10-20 years on, data stored in human-readable format is more likely to still be about and accessible than its machine-readable cousins.)
'... thermal printer'
I hope you mean the kind of thermal printer where the printing dyes/pigments sublimate onto the surface; these are excellent for longevity. Commonplace thermal paper printers are a disaster. That thermal paper has stuff-all retention time before it fades. Moreover, it's sensitive to all sorts of chemicals and turns black in the presence of cleaners, alcohol etc. A mere whiff of certain household chemicals is all that's needed to do damage--just being stored near bottles of cleaner will do.
A few years back, I needed access to some of my records for accounting purposes and I had great trouble reading some documents that were only a couple of years old. They'd faded to the point of illegibility and I had to use the computer entries instead. However, the computer records aren't proof of a transaction whereas the original paper receipt/document is.
Thermal paper and to a lesser extent badly done dye-line printing are the only forms of data retention that I rate less reliable than current-day digital backups.
BTW, I've a stack of QICs of the older variety (recorded on Mountain tape drives). Fortunately, I've not needed the data from these in years, for if I did then I'd be hard pressed to retrieve it other than by resurrecting some old DOS machine. Does anyone have a simpler, perhaps more elegant solution to this problem? (I'm sure I'm not alone in having stacks of old QICs in storage.)
About 6 months ago, I was given the loan of a 3D printer to evaluate for somebody. One of the "tests" I did was to take a scan (after GIMPing) of my friend's wedding certificate and PRINT it (I was attending his anniversary). It came out as a flat slab with raised letters/graphics. It might be possible to "Gestetner" off a few copies, but it took "forever" to print, and turning a cabinet of files into these would be an institutional effort (madhouse), not to mention the space they would take up ...
PS: Ubuntu has some links on hooking up QICs to a new machine ...
http://ubuntuforums.org/showthread.php?t=2130423
Historically the survival of any particular document has been a matter of chance. Some Anglo Saxon charters survive as originals. Some older documents have survived in particularly favourable environments. For the most part, however, texts from antiquity have survived as copies several generations removed from the original and the more copies were made the greater the chance that one or more has survived.
I don't see that changing in terms of digital texts. Anything posted to Geocities, for example, is long gone unless someone copied it - archive.org doesn't seem to have got it all. If Google decided that Groups should go the same way as Wave how much of Usenet would survive?
If we are to have digital records available far into the future we need to do three things:
Have multiple archives of what is to be preserved*
Each archive needs to copy its material onto new media as old ones become obsolete
In addition to copying material archives need to translate obsolete file formats into current ones**
* What is chosen for preservation is a thorny problem. Every time an archivist decides to weed the archive their decisions will be incomprehensible to someone. I remember some years ago wandering into a 2nd-hand bookshop in Cromer and finding bound volumes of Nature that the county library had disposed of crammed into all sorts of corners.
** Ideally one format that can be kept current for a long time - long in an archival sense. PDF/A?
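The first two points (multiple archives, recopying onto new media) only help if you can tell when a copy has gone bad. A minimal sketch of one way to check, assuming each archive holds a copy of the same file: hash every copy and flag any that disagree with the majority. The function names and the majority-vote rule are my own assumptions, not an established archival procedure.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def odd_ones_out(copies: list[Path]) -> list[Path]:
    """Return the copies whose digest differs from the most common digest."""
    if not copies:
        return []
    digests = {path: sha256_of(path) for path in copies}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return [path for path, digest in digests.items() if digest != majority]
```

With three or more copies a single damaged one stands out; with only two you learn that they differ but not which one to trust, which is an argument for more archives rather than fewer.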
Setting aside the problem of choosing what matters from what exists...
"If Google decided that Groups should go the same way as Wave how much of Usenet would survive?"
Groups is *already* useless as a retrieval engine. DejaNews at least used to be a usable way of finding things.
Groups is also useless as a means of providing links to posts; if you provide a link to Groups, it's probably going to be a dead link in a little while (certainly next time Groups is "improved").
If a newsgroup of interest has an archival site somewhere, you're far better off using webGoogle to search the archival site than you would be using Google Groups. And you're better off providing long term reference links to the archival site rather than to Groups.
If a newsgroup doesn't have an archival site, maybe someone should think about it.
Case in point: derkeiler's computer-related archives (comp.* stuff).
Well, any long-term approach will be diverse. For public data like movies, a good approach would be to store them as DRM-free data that stays readable in the long term. Essentially your Blu-ray rip with the DRM removed will be playable ad infinitum since all the codecs are well defined.
For data you cannot easily distribute that way because it is private, or if you want to consider a "collapse of the civilisation" scenario, you would probably do fairly well with microfilm in multiple copies. For documents you'd ideally store both an image of the page and the text it contains in an easy-to-OCR font.
In any case remember that the simpler the better. Someone might have to build a device or program some software for your files, make it as easy as possible.
Last year we were asked to import some data to a SQL database for one of our local government clients. They sent us the data on a CD-ROM - the sort that came in a caddy or cartridge, rather than a bare CD.
We had no hardware that was able to accept this media, and the client didn't either.
So we managed to find an old caddy drive on ebay - it was a SCSI interface, so we had to find a compatible SCSI interface card.
The only one we could find was a full length ISA card, and we hadn't got a machine anywhere with a full length ISA slot, so we had to buy an old server (I think it was a Dell PE400 or something) for it to fit into.
We could only find drivers for the SCSI card for Windows NT 3.51, so we had to dig out an old set of floppies (two sets, as it turned out, as some of the floppies were corrupt), and install the O/S.
We needed to be able to transfer the data off the machine, but we couldn't find a network card which would work with NT 3.51, until digging about in the scrap box we found an old 3Com 10Base-T card with both BNC and RJ45 connections.
Getting that to work on our gigabit LAN was umm... interesting... but we finally got everything talking - very slowly...
Then, we found the data on the CDROM was a backup from a Microsoft SQL 7 installation, which wasn't readable by any current version we owned...
We managed to find the install disks, and service packs, which would allow SQL Server to be installed on NT 3.51 (I think it was upped to SP 6 before we could do it), and finally, we were able to open the backup, export the data, and re-import it into our current database.
The whole thing took us two weeks of faffing about, just to read some data from roughly ten years ago.
Sounds like a 3Com 3C509. Unable to do auto neg properly in many cases.
You have to hard strap BOTH ends of the connection, otherwise you will have 10 Mb/s half duplex at one end and 10 Mb/s full duplex at the other end. That will run really slowly and is probably the cause of your speed problem (apart from being 10 Mb/s!). Now you might have been able to use a 905 or a 595 - they are PCI though, but I'm pretty sure NT 3.51 had drivers and they do 100 Mb/s.
The drivers for all of the hardware you mention are in this week's Linux kernel. I don't know what would read the actual data - FreeTDS will access a running MSSQL7 but probably not the backups. At least the box could present the files to a VM.
Today proprietary binary-only file formats are rather rare. We can now store data with millions of points as text files and the overhead of size and processing time is acceptable. Today databases are backed up to text files which, with some limitations, could be restored into any other SQL database system.
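As a small illustration of the text-backup idea, here is a sketch using SQLite from the Python standard library: dump a database to plain SQL text, then rebuild it from that text alone. The table and values are made up. Note that the dump is in SQLite's own dialect, so loading it into a different database system may need some editing, which is the "with some limitations" part.

```python
import sqlite3

# Build a small made-up database in memory.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, label TEXT, value REAL)"
)
source.executemany(
    "INSERT INTO readings (label, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5)],
)
source.commit()

# iterdump() yields the SQL statements needed to recreate the database.
dump_text = "\n".join(source.iterdump())

# Restore into a fresh database using nothing but the text dump.
restored = sqlite3.connect(":memory:")
restored.executescript(dump_text)
rows = restored.execute(
    "SELECT label, value FROM readings ORDER BY id"
).fetchall()

print(dump_text)
print(rows)
```

The dump is human-readable, so even without SQLite someone could work out the table layout and the values by eye.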
Also we no longer store data on disks or optical drives since we have learned that reading those is a very slow and therefore expensive process.
Just compare the process of copying over a 1 hour video file from one harddisk to another to the elaborate task of playing a Quadruplex video tape (most common type of video tape till the 1980s).
https://www.youtube.com/watch?v=zHDU1wXw1sU
Mr Cerf is an Internet pioneer. Robert E. Kahn and he were inventing the Internet when Mr Berners-Lee had just decided on a college.
Now get off my bandwidth!
I recently threw away my university memories.
I bought a server with 1TB for the express purpose of safekeeping all the stuff I had when I left university eight years ago.
The server has sat unused in a corner of my flat for the last three years; I'm not even sure it could still boot. When I moved to a new place last month, I just took it to the garbage dump because I couldn't be bothered.
I currently have a Synology for my backups. Hope it works better.
"Webpioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history."
Good god, I hope so. Has he _seen_ what comprises "our digital lives"? It's pretty much all cat videos, selfies, and tweets that should have been wiped from history before they were even posted.
History is no longer written by the victor, but by the evil prankster with the glass writer...
Facebook was an actual book compiled by international police containing millions of peoples' mugshots.
The most famous painting, the moaning lisa, was stolen by a gang of cockney criminals driving 3 different coloured Volkswagen Beatles, and later recovered when they broke down.
Etc.
Google, Facebook and more keep data forever, even your family pictures, embarrassing pictures and your e-mails.
If those fail, we still have the NSA, GCHQ and the rest of the "eyes". Getting a copy back from them, as easy as getting Google or Facebook to give a copy back...
The NSA is supposed to be capturing and storing the Internet for their surveillance purposes. We already have a mechanism for capturing and storing the bits - courtesy of the U.S. taxpayers. I welcome future electronic deletion - nature has its change agents - we are all just dust in the wind.
I was thinking about this very topic a couple of years back.
I’ve been writing computer programs on and off (for fun, and at work too) since my schooldays in the eighties. I must have written tens of thousands of lines of code over the years. I totted up the number of different languages that I’ve coded in (mostly BASIC and variants thereof), and the total came to nine.
Then it dawned on me that, years from now, no one would likely know of my lovingly crafted programs, much less be running them for any reason. All my digital creativity would be lost. I’d have no digital legacy.
Then I had a brainwave. I’d make a piece of artwork out of my computer code. I took screenshots of snippets of nine program listings, in their respective IDEs - one for each of the languages. One was BBC Basic, one was Microsoft QuickBasic, one was a batch file, etc, etc.
I used InDesign to lay out the artwork in A4, with the nine screenshots sized proportionately on the page. This left space in the bottom right hand corner so I added in a digitally embellished photo of my childhood self. I finished off the ‘montage’ with a nice ‘circuit board’ themed bezel area.
I printed out my masterpiece and mounted it in a silver frame. It is hanging, in pride of place, in my bathroom opposite my toilet. So now, whenever I’m doing a poo, I can look up at it and revel in my geekiness.
It is my fond hope that, long after I’ve turned to dust, my computer artwork may survive and be pored over endlessly by historians of the age.