Vint Cerf: Everything we do will be ERASED! You can't even find last 2 times I said this

Cyber-pioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history. Cerf, who was speaking at the American Association for the Advancement of Science annual meeting, reiterated calls for a "digital vellum" – referring to the ancient parchment made from calf skin and known …

    1. frank ly

      Re: Rince and Repeat.

      Rincewind? I'm sure Terry Pratchett's books will still be around in 100 years time.

  1. Kubla Cant

    Digital vs. analogue?

    I can see the problems that arise from the volatile nature of digital records. It's much easier to lose or destroy than paper. But on the other hand it's much easier to duplicate, so the net attrition may be the same or less.

    What surprises me is the suggestion that the ability to read old digital data may be lost in the future. Is there any evidence that this happens? If I dig out a 40-year-old CP/M floppy, will it be impossible to read (in the unlikely event that there's anything on it worth reading)? The effort and ingenuity that go into reading historical material like the Dead Sea scrolls and carbonised documents from Herculaneum suggest not.

    It's a bit like the distinction between digital and analogue computers. Reading ancient digital information is presumably a matter of emulating the obsolete technology; a tricky but quantifiable problem. Ancient analogue information presents much greater challenges.

    1. Anonymous Coward

      Re: Digital vs. analogue?

      Not so much digital vs analogue, but easily duplicated. We're still turning up lost episodes of Dr Who on tape from TV stations around the world.

    2. Anonymous Coward

      Worry about old media that can't be read is missing the point

      Old CP/M disks may be difficult to read, but now that everything is networked it is easy to transfer the bits to newer storage systems (replicated as necessary) over and over again.

      We don't have to worry about preserving the data, but we do have to worry about preserving the means of interpreting it, i.e. if we store JPEGs we need the method of displaying them saved, and similarly with H.264, HTML5 and so on. That's probably not a difficult problem for common formats like that; for rare formats like saved mail archives from Domino it might be more difficult.

      The overarching problem is preserving some way to determine what is what. We need metadata about each object or collection of objects to tell what it is, where it's from, what its purpose was, who originated it, what its significance is, etc. A giant dumping ground of cat videos from 2006-2030 isn't very useful. A way to search it to fit memes like grumpy cat, I can haz cheeseburger cat and whatever at least preserves its cultural context/significance.

      Maybe AI will eventually be able to help there, and be able to 'watch' all the videos, view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner. Google's current search ability is nowhere near what we'd need for a future historian to look through this stuff and make some sense out of it.
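The metadata idea in the comment above can be sketched as a small "sidecar" record stored next to each archived object. This is only an illustration in Python: the `ArchiveRecord` schema, its field names and the sample values are all made up for the example and do not follow any particular archival standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ArchiveRecord:
    """Descriptive metadata kept alongside an archived object (hypothetical schema)."""
    title: str
    media_type: str    # tells a future reader which decoder is needed, e.g. "image/jpeg"
    origin: str        # where the object came from
    creator: str       # who originated it
    purpose: str       # what it was for
    significance: str  # why anyone might care later
    sha256: str        # checksum tying the record to the exact bytes


def describe(payload: bytes, **fields: str) -> ArchiveRecord:
    """Build a record for payload, computing its checksum."""
    return ArchiveRecord(sha256=hashlib.sha256(payload).hexdigest(), **fields)


def to_sidecar(record: ArchiveRecord) -> str:
    """Serialise the record as plain-text JSON so it stays human-readable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Stand-in bytes; a real archive would read the actual file.
payload = b"\xff\xd8\xff\xe0 stand-in bytes for a JPEG"
record = describe(
    payload,
    title="Grumpy cat image macro",
    media_type="image/jpeg",
    origin="example forum post (made up)",
    creator="unknown",
    purpose="humour",
    significance="meme, shows cultural context",
)
print(to_sidecar(record))
```

Keeping the record as plain text means it can be read without the software that wrote it, which is the property the comment is asking for.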

      1. Anonymous Coward

        Re: data vs information vs meaning

        "view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."

        Surely that's what Autonomy was sold for?

        And other less well known products used by three and four letter agencies, "for our security".

        1. Michael Wojcik Silver badge

          Re: data vs information vs meaning

          "view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."

          Surely that's what Autonomy was sold for?

          There are quite a few companies operating in this area, many of them for specific markets or industries (legal and medical are the big ones). And there's free software you can use to build your own system, from simple stem-and-index systems like the old MIT Savant software to general frameworks for processing unstructured data like UIMA.

          Even Windows, in its default configuration, will scan and index all the file formats it recognizes. Similar systems exist for Linux and UNIX.

  2. Michael Sanders

    The technology exists

    Maybe I'm missing something. But doesn't an emulator do that? Backward compatibility is something we've been overdoing if you ask me. Rip it off like a bandaid.

    1. Michael Wojcik Silver badge

      Re: The technology exists

      Sometimes the problem is physical media. Sometimes it's that the data is in a format that no one recognizes, or that was implemented by some obscure software that's nowhere to be found.

      Emulators solve some problems, but by no means all of them.

      1. JamesTQuirk

        Re: The technology exists

        I searched in WWW.Archive.org - 30 posts/videos/interviews by Vint Cerf, including 2 previous ones ABOUT this topic .... https://archive.org/search.php?query=Vint+Cerf

        But if anyone can't operate an emulator, try the FREE browser version ...

        https://archive.org/details/internetarcade

        1. JamesTQuirk

          Re: The technology exists

          Maybe not. I may have been wrong about both being there; there is 1. Sorry

  3. Anonymous Coward

    In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies.

    There's also the point that we may not know what will be interesting or relevant in the future.

    So when people go to the British Library to see the four remaining copies of the Magna Carta, now 800 years old, they might like to wonder whether current "digital only" records, personal or corporate, might be readable by someone in the year 2815!!!! In my experience, it's only taken twenty-five years to render some digital files close to unrecoverable.

    1. fritsd

      ODF.

      FODF (flattened) for the ODF spec document, obviously.

      1. fritsd

        deflate

        Oh yes, and you might want to download rfc1951.txt as well and put it next to it:

        RFC 1951
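To illustrate why keeping RFC 1951 next to the format spec matters: ODF packages are ZIP files whose entries are typically compressed with DEFLATE, so a future reader needs that algorithm to get at the XML inside. A minimal Python sketch of a raw DEFLATE round trip follows; the helper names and the sample XML are invented for the example.

```python
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress to a bare RFC 1951 stream (negative wbits: no zlib or gzip wrapper)."""
    compressor = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(blob: bytes) -> bytes:
    """Decompress a bare RFC 1951 stream."""
    return zlib.decompress(blob, wbits=-15)


# Made-up, repetitive stand-in for document XML.
sample = b"<text:p>digital vellum</text:p>" * 50
packed = raw_deflate(sample)
assert raw_inflate(packed) == sample
print(len(sample), "bytes ->", len(packed), "bytes")
```

Without the algorithm description, `packed` is just opaque bytes; with it, anyone can rewrite `raw_inflate` from scratch.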

    2. Charles 9

      "In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies."

      Are you SURE none of that is useable today? Are you sure you can't fire up a DOS emulator like DOSBox, locate disk images of the software you used (OK, maybe some of it was custom work) or a utility from the time capable of interpreting it? Sure, formats come and go, but there are even now digital preservationists striving to at least keep records of the past available: diskettes imaged and formats described. The hard part is gathering the resources needed to read your old format. After that, you can usually migrate it to a newer format. Plus there are certain formats (like simple text files) that lend themselves better to preservation (as long as the character set is still known, you're OK).
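The caveat about the character set can be shown in a few lines of Python. The byte string below is made up; the point is only that the same bytes read differently depending on which code page you assume, so the encoding has to be recorded along with the text.

```python
# A made-up line as it might sit on an old DOS diskette.
raw = b"Cost: \x9c100"

# DOS code page 437 maps byte 0x9C to the pound sign.
as_cp437 = raw.decode("cp437")

# Windows-1252 maps the same byte to a different character.
as_cp1252 = raw.decode("cp1252")

print(as_cp437)
print(as_cp1252)
```

Plain text survives well, but only if a note saying "this is CP437" survives with it.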

      1. Anonymous Coward

        "Unusable" in a corporate sense means not worth the cost

        Obviously that stuff CAN be recovered, it just isn't worth the cost. If they contained proof of invention that would mean they win $100 million in a patent case, they'd spare no expense to recover them - even taking "unreadable" diskettes to a data recovery specialist to read off the bits using an STM or whatever.

    3. JamesTQuirk

      Why I think making VM's of old OS is a mission ...

  4. Any mouse Cow turd

    Remember the Domesday Disc anyone???

    This reminds me of the BBC Domesday Disc project. A digital record of life in Britain in the mid 1980's stored on laser disc for posterity and like the Domesday Book, useful to social scientists and historians for hundreds of years. Or as it turned out, about 5 years.

    I know there is a site on the intertubes that allegedly re-creates bits of the disc but I remember playing with the disc in the school library and looking at my home village - not present on the interweb emulator thingy...

    1. Mr. Flibble

      Re: Remember the Domesday Disc anyone???

      Part of the problem is that copyright was not specifically waived, so while conversion could be done, from a legal POV copies are not allowed without contacting the original sources for the information:

      https://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation

  5. Warm Braw

    Vint says a lot of stuff...

    He said that personal privacy was essentially over because of social networking.

    He also said that he would have liked to make the Internet more secure against interception, but wasn't allowed to as he was, er, working on a secret program on behalf of the NSA.

    And now he wants to record for ever the information he thought should remain private and secure?

    The only thing standing between us and the perpetual violation of our privacy for generations to come is bit rot, so long may it continue.

    There's only any point in archiving stuff that the small number of people who might have the interest and resources to trawl through it in the distant future will have the time to read. So someone might as well sit down and copy the best bits out onto physical vellum - we already have far more historical archive material than will ever be usefully consulted, we don't need to expand it by orders of magnitude every few years.

  6. RobHib
    Thumb Up

    Thank heavens there's still a few influential people around in IT like Cerf and Berners-Lee who have a real grip on the big picture and say what's needed to be said.

    What Cerf says about digital vellum ought to be self evident but it's not to many. Why it's so few is a matter of conjecture.

    Unfortunately, whilst Cerf has the power to command attention, I doubt if he will be listened to, as such matters require more than a five-second consideration.

    1. Gordon 10

      I don't think it's particularly self evident - he sounds like chicken little to me.

      The proportion of the Internet worth keeping is tiny. The formats involved number in the low thousands, and the technologies are in the same order of magnitude. All that's really needed is some auto tagging so that photos and personal data can be retained and Facebook posts flushed down the digital crapper.

      1. RobHib
        Thumb Down

        @Gordon 10 - Consider this.

        Let's say some document of equivalent 'importance/influence' as the King James Bible of 1611 had first entered the world not as printed material but on-line as ephemeral 0s and 1s in 2011--exactly 400 years later.

        1. Would that document have the same influence over the forthcoming 400 years, everything else being equal?

        2. Would it matter if it was more or less influential than its printed-on-atoms version of some 400 years earlier (leaving aside one's religious/political position on its contents)?

        3. What does it mean to society/posterity/history etc. that in the digital age the 'modern ephemeral version' might never make it past five years of age let alone 400 years?

        Permit me to suggest that these are quite profound questions that will take substantially more time for society to consider than it takes to cross the road whilst reading one's iPhone--the average amount of time users seem to devote to an issue today.

      2. Anonymous Coward

        Discussions like this one?

        There were some great discussions on the old Aces Hardware forums. Participants from Intel, AMD, and other companies posting about the design of CPUs, Memory, and other key system architecture components. Some insights were posted there that may not have existed anywhere else.

        All lost to a hard drive crash.

  7. Vinyl-Junkie
    Facepalm

    Vellum, eh?

    Can't help feeling he's missed the point about vellum, which is that it was a reusable medium! You simply scraped off the top layer with a sharp knife and reused it. Examinations of surviving vellum documents today indicate that in some cases much more valuable (in today's terms) items have been overwritten with (e.g.) yet another beautifully illuminated Gospel of St John. Aesthetically more pleasing but actually not as interesting as the local history that was written on there until some abbot decided it wasn't worth keeping...

    Parallels, anyone?

    1. RobHib

      @Vinyl-Junkie - Re: Vellum, eh?

      Let's begin with something that ought to be dear to your heart. Vinyl recordings will most likely outlast their digital counterparts.

      So far, each successive generation of recordings (and recording formats) has had a shorter life than its predecessor. I've commercially-cut DVDs less than a decade old that are now unreadable, whilst my vinyl records, some of which are over 50 years old, are still very much intact. I've also a few 78s that are 90+ years old and are still in reasonable condition.

      The issue here is not that vellum can be reused--so can paper with a pencil* and eraser--but whether the data will still be around after a reasonably long time if one wants it to be. Most written stuff from history isn't around today because people either didn't want it to be preserved or they didn't care to look after it.

      The point Cerf is making is that vellum, if given a chance, will store information for thousands of years. That's much more than can be said for present digital information storage--the reasons for which are either technical limitations or society's lack of concern for the fate of recently-old information, or both.

      Whilst in theory digital information can be stored indefinitely, practice is another matter altogether. I've considerable difficulty recovering my own digital documents from the early 1980s even though I've taken steps to look after them. Technology has made it difficult for me to store them either efficiently or transparently; whether they warrant storage is another matter altogether--but it's an issue which Cerf and many others including myself are very concerned about.

      (* Wherever possible, I've always used a propelling pencil and eraser in preference to a ballpoint for this specific reason--even today, I don't feel properly dressed unless my 'Cross' propelling pencil and a notebook are in my top pocket. Much experience has shown me that (a) it's still quicker to jot in a notebook than in an iPhone, and (b) 10-20 years on, this data stored in human-readable format is more likely to be still about and accessible than its machine-readable cousins.)

  8. Nigel Whitfield.

    As soon as I get home ...

    I shall preserve the email backups from my QIC tapes by printing everything out on the thermal printer and putting it in a big folder

    1. Jan 0 Silver badge

      Re: As soon as I get home ...

      Just make sure the paper is kept in an ULT freezer to stop it all going black. Personally, I'd use a daisy wheel printer.

    2. RobHib

      @Nigel Whitfield -- Re: As soon as I get home ...

      '... thermal printer'

      I hope you mean the kind of thermal printer where the printing dyes/pigments sublimate onto the surface, these are excellent for longevity. Commonplace thermal paper printers are a disaster. That thermal paper has stuff-all retention time before it fades. Moreover, it's sensitive to all sorts of chemicals and turns black in the presence of cleaners, alcohol etc. A mere whiff of certain household chemicals is all that's needed to do damage--just stored nearby bottles of cleaners will do.

      A few years back, I needed access to some of my records for accounting purposes and I had great trouble in reading some documents that were only a couple of years old. They'd faded to the point of illegibility and I had to use the computer entries instead. However, the computer records aren't proof of a transaction whereas the original paper receipt/document is.

      Thermal paper and to a lesser extent badly done dye-line printing are the only forms of data retention that I rate less reliable than current-day digital backups.

      BTW, I've a stack of QICs of the older variety (recorded on Mountain tape drives). Fortunately, I've not needed the data from these in years, for if I did then I'd be hard pressed to retrieve it other than by resurrecting some old DOS machine. Does anyone have a simpler, perhaps more elegant solution to this problem? (I'm sure I'm not alone in having stacks of old QICs in storage.)

      1. JamesTQuirk

        Re: @Nigel Whitfield -- As soon as I get home ...

        About 6 months ago, I was given the loan of a 3D printer to evaluate for somebody. One of the "tests" I did was taking a scan (after gimping) of my friend's wedding certificate and PRINTING it (I was attending his anniversary). It came out as a flat slab with raised letters/graphics. It may be possible to "Gestetner" off a few copies, but it took "forever" to print, and turning a cabinet of files into these would be an institutional effort (madhouse), not to mention the space they would take up ...

        PS: The Ubuntu forums have some links on hooking up QICs to a new machine ...

        http://ubuntuforums.org/showthread.php?t=2130423

  9. Doctor Syntax Silver badge

    Long term survival

    Historically the survival of any particular document has been a matter of chance. Some Anglo Saxon charters survive as originals. Some older documents have survived in particularly favourable environments. For the most part, however, texts from antiquity have survived as copies several generations removed from the original and the more copies were made the greater the chance that one or more has survived.

    I don't see that changing in terms of digital texts. Anything posted to Geocities, for example, is long gone unless someone copied it - archive.org doesn't seem to have got it all. If Google decided that Groups should go the same way as Wave how much of Usenet would survive?

    If we are to have digital records available far into the future we need to do three things:

    Have multiple archives of what is to be preserved*

    Each archive needs to copy its material onto new media as old ones become obsolete

    In addition to copying material archives need to translate obsolete file formats into current ones**

    * What is chosen for preservation is a thorny problem. Every time an archivist decides to weed the archive their decisions will be incomprehensible to someone. I remember some years ago wandering into a 2nd-hand bookshop in Cromer and finding bound volumes of Nature that the county library had disposed of crammed into all sorts of corners.

    ** Ideally one format that can be kept current for a long time - long in an archival sense. PDF/A?
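The first two points in the list above (multiple archives, recopying onto new media) only help if the copies can be checked against each other. A rough Python sketch of such a check follows, assuming a simple majority vote over checksums; the archive names and contents are invented, and real archives would hash files rather than in-memory bytes.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """Checksum used to compare copies without comparing every byte by hand."""
    return hashlib.sha256(data).hexdigest()


def audit(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the most common digest and the names of archives that disagree with it."""
    digests = {name: digest(blob) for name, blob in copies.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


# Made-up example: three archives hold the same document, one copy has rotted.
copies = {
    "archive_a": b"charter text",
    "archive_b": b"charter text",
    "archive_c": b"charter t3xt",
}
majority, damaged = audit(copies)
print("archives needing repair:", damaged)
```

With only two copies a mismatch tells you something is wrong but not which copy to trust, which is one practical argument for keeping at least three.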

    1. Anonymous Coward

      Re: Long term survival

      Setting aside the problem of choosing what matters from what exists...

      "If Google decided that Groups should go the same way as Wave how much of Usenet would survive?"

      Groups is *already* useless as a retrieval engine. DejaNews at least used to be a usable way of finding things.

      Groups is also useless as a means of providing links to posts; if you provide a link to Groups, it's probably going to be a dead link in a little while (certainly next time Groups is "improved").

      If a newsgroup of interest has an archival site somewhere, you're far better off using webGoogle to search the archival site than you would be using Google Groups. And you're better off providing long term reference links to the archival site rather than to Groups.

      If a newsgroup doesn't have an archival site, maybe someone should think about it.

      Case in point: derkeiler's computer-related archives (comp.* stuff).

    2. Christian Berger

      Re: Long term survival

      Well any long term approach will be diverse. For public data like movies and film, a good approach would be to store them as long-term readable, DRM-free data. Essentially your Bluray rip with the DRM removed will be playable ad infinitum since all codecs are well defined.

      For data you cannot easily distribute that way because they are private, or if you want to consider a "collapse of the civilisation" scenario, you'd probably do fairly well with microfilm with multiple copies. For documents you'd ideally store both an image of the page and the text it contains in an easy to OCR font.

      In any case remember that the simpler the better. Someone might have to build a device or program some software for your files; make it as easy as possible.

  10. Alister

    Anecdotal instance

    Last year we were asked to import some data to a SQL database for one of our local government clients. They sent us the data on a CD-ROM - the sort that came as a caddy or cartridge, rather than a bare CD.

    We had no hardware that was able to accept this media, and the client didn't either.

    So we managed to find an old caddy drive on ebay - it was a SCSI interface, so we had to find a compatible SCSI interface card.

    The only one we could find was a full length ISA card, and we hadn't got a machine anywhere with a full length ISA slot, so we had to buy an old server (I think it was a Dell PE400 or something) for it to fit into.

    We could only find drivers for the SCSI card for Windows NT 3.51, so we had to dig out an old set of floppies (two sets, as it turned out, as some of the floppies were corrupt), and install the O/S.

    We needed to be able to transfer the data off the machine, but we couldn't find a network card which would work with NT 3.51, until digging about in the scrap box we found an old 3Com 10Base-T card with both BNC and RJ45 connections.

    Getting that to work on our gigabit LAN was umm... interesting... but we finally got everything talking - very slowly...

    Then, we found the data on the CDROM was a backup from a Microsoft SQL 7 installation, which wasn't readable by any current version we owned...

    We managed to find the install disks, and service packs, which would allow SQL Server to be installed on NT 3.51 (I think it was upped to SP 6 before we could do it), and finally, we were able to open the backup, export the data, and re-import it into our current database.

    The whole thing took us two weeks of faffing about, just to read some data from roughly ten years ago.

    1. Anonymous Coward

      Re: Anecdotal instance

      Sounds like a 3Com 3C509. Unable to do auto neg properly in many cases.

      You have to hard strap BOTH ends of the connection, otherwise you will have 10 Mb/s half duplex at one end and 10 Mb/s full duplex at the other end. That will run really slowly and is probably the cause of your speed problem (apart from being 10 Mb/s!). Now you might have been able to use a 905 or a 595 - they are PCI though, but I'm pretty sure NT 3.51 had drivers and they do 100 Mb/s.

      The drivers for all of the hardware you mention are in this week's Linux kernel. I don't know what would read the actual data - FreeTDS will access a running MSSQL7 but probably not the backups. At least the box could present the files to a VM.

    2. Doctor Syntax Silver badge

      Re: Anecdotal instance

      Nice example. You need to migrate the stuff as the previous generation of stuff is becoming obsolete but still on hand.

  11. Christian Berger

    Well luckily we are better off now than in the 1990s

    Today proprietary binary-only file formats are rather rare. We can now store data with millions of points as text files and the overhead of size and processing time is acceptable. Today databases are backed up to text files which, with some limitations, could be restored into any other SQL database system.
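As a small illustration of the text-dump point, here is a Python sketch using the standard library's `sqlite3` module. The table and rows are invented. The dump is SQLite-flavoured SQL, so loading it into a different database system may need some editing, which matches the "with some limitations" caveat above.

```python
import sqlite3

# Build a small throwaway database (table name and rows are made up).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, label TEXT, value REAL)")
source.executemany(
    "INSERT INTO readings (label, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.25)],
)
source.commit()

# Back the whole database up as plain SQL text.
dump_text = "\n".join(source.iterdump())

# Restore the text dump into a fresh database and read the rows back.
restored = sqlite3.connect(":memory:")
restored.executescript(dump_text)
rows = restored.execute("SELECT label, value FROM readings ORDER BY id").fetchall()
print(rows)
```

The dump is readable in any text editor, so even if no SQL engine were available the data could still be recovered by eye or with a short script.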

    Also we no longer store data on disks or optical drives since we have learned that reading those is a very slow and therefore expensive process.

    Just compare the process of copying over a 1 hour video file from one harddisk to another to the elaborate task of playing a Quadruplex video tape (most common type of video tape till the 1980s).

    https://www.youtube.com/watch?v=zHDU1wXw1sU

  12. Terry Cloth
    Boffin

    Webpioneer my left eyetooth

    Mr Cerf is an Internet pioneer. Robert E. Kahn and he were inventing the Internet when Mr Berners-Lee had just decided on a college.

    Now get off my bandwidth!

  13. ratfox

    Saving stuff on the cloud was made for people like me

    I recently threw away my university memories.

    I bought a server with 1TB for the express purpose of safekeeping all the stuff I had when I left university eight years ago.

    The server has sat unused in a corner of my flat for the last three years; I'm not even sure it could still boot. When I moved to a new place last month, I just brought it to the garbage dump because I couldn't bother.

    I currently have a Synology for my backups. Hope it works better.

  14. Anonymous Coward

    Insurance

    I have a thousand pictures of me being a complete arse, I save one with every backup, I have found over the years no matter how hard you try you just can't erase those sort of things.

  15. GBE

    "digital lives wiped from history"

    "Webpioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history."

    Good god, I hope so. Has he _seen_ what comprises "our digital lives"? It's pretty much all cat videos, selfies, and tweets that should have been wiped from history before they were even posted.

  16. king of foo

    missed opportunity

    History is no longer written by the victor, but by the evil prankster with the glass writer...

    Facebook was an actual book compiled by international police containing millions of peoples' mugshots.

    The most famous painting, the moaning lisa, was stolen by a gang of cockney criminals driving 3 different coloured Volkswagen Beatles, and later recovered when they broke down.

    Etc.

  17. Anonymous Coward
    Trollface

    You've been trolled!

    Who gives a shit?!?!

  18. Wzrd1 Silver badge

    We already have digital vellum

    Google, Facebook and more keep data forever, even your family pictures, embarrassing pictures and your e-mails.

    If those fail, we still have the NSA, GCHQ and the rest of the "eyes". Getting a copy back from them is as easy as getting Google or Facebook to give a copy back...

  19. Harry Anslinger

    That's why we have the NSA

    The NSA is supposed to be capturing and storing the Internet for their surveillance purposes. We already have a mechanism for capturing and storing the bits - courtesy of the U.S. taxpayers. I welcome future electronic deletion - nature has its change agents - we are all just dust in the wind.

  20. 0765794e08
    Thumb Up

    My Solution

    I was thinking about this very topic a couple of years back.

    I’ve been writing computer programs on and off (for fun, and at work too) since my schooldays in the eighties. I must have written tens of thousands of lines of code over the years. I totted up the number of different languages that I’ve coded in (mostly BASIC and variants thereof), and the total came to nine.

    Then it dawned on me that, years from now, no one would likely know of my lovingly crafted programs, much less be running them for any reason. All my digital creativity would be lost. I’d have no digital legacy.

    Then I had a brainwave. I’d make a piece of artwork out of my computer code. I took screenshots of snippets of nine program listings, in their respective IDEs - one for each of the languages. One was BBC Basic, one was Microsoft QuickBasic, one was a batch file, etc, etc.

    I used InDesign to lay out the artwork in A4, with the nine screenshots sized proportionately on the page. This left space in the bottom right hand corner so I added in a digitally embellished photo of my childhood self. I finished off the ‘montage’ with a nice ‘circuit board’ themed bezel area.

    I printed out my masterpiece and mounted it in a silver frame. It is hanging, with pride of place, in my bathroom opposite my toilet. So now, whenever I’m doing a poo, I can look up at it and relish in my geekiness.

    It is my fond hope that, long after I’ve turned to dust, my computer artwork may survive and be pored over endlessly by historians of the age.
