Vint Cerf: Everything we do will be ERASED! You can't even find last 2 times I said this

Cyber-pioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history. Cerf, who was speaking at the American Association for the Advancement of Science annual meeting, reiterated calls for a "digital vellum" – referring to the ancient parchment made from calf skin and known …

  1. Electron Shepherd
    Unhappy

    Turtles all the way down?

    The project takes snapshots of digital files, including the technical details of the computers for which they were designed.

    And how do we know how to read the details of the snapshots and the computer specifications? No problem, we'll just document it in this digital file here...

    I agree it's a problem, but I'm not sure that storing things digitally is the best solution to the problem that when we store things digitally, we will probably eventually lose track of how to read them back.

    1. Anonymous Coward

      Re: Turtles all the way down?

      You mean all those videos of my cat that I burned on to 1000 year M-DISC blue ray DVD's were all for naught?

    2. Anonymous Coward

      Re: Turtles all the way down?

      "And how do we know how to read the details of the snapshots and the computer specifications? No problem, we'll just document it in this digital file here..."

      Digital storage is only a problem if you

      a) don't keep copying the data as you update storage media

      b) don't have the technical know-how to figure out how data are stored on the old media, and retrieve that information.

      What's needed is somewhere you can put Very Important Information that will automatically be backed-up and then transferred from older to newer media as the newer media arrive. Fortunately, in the modern era a lot of this has become "transfer all our old data from the old storage array we thought was large and fast, to the new storage array that's bigger and faster than we thought was possible when we bought the old storage array."
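
      A minimal sketch of point (a), assuming plain files on two mounted directories: copy everything to the new storage, then re-read both sides and compare SHA-256 digests so a bad copy is caught before the old media is retired. The names `migrate` and `sha256_of` are made up for illustration and are not any vendor's migration tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(old_root: Path, new_root: Path) -> dict[str, str]:
    """Copy every file under old_root to new_root and verify each copy.

    Returns a manifest mapping relative path -> SHA-256 digest.
    Raises IOError if a copy does not match its source.
    """
    manifest = {}
    for source in sorted(p for p in old_root.rglob("*") if p.is_file()):
        relative = source.relative_to(old_root)
        target = new_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents plus timestamps
        before, after = sha256_of(source), sha256_of(target)
        if before != after:
            raise IOError(f"copy of {relative} does not match the original")
        manifest[relative.as_posix()] = after
    return manifest
```

      Keeping the returned manifest alongside the data means the next migration can check the files against digests recorded at this one, rather than trusting whatever the old array hands back.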

      1. Uncle Slacky Silver badge

        Re: Turtles all the way down?

        Obligatory XKCD:

        https://xkcd.com/1360/

      2. Anonymous Coward

        Re: Turtles all the way down?

        It's not the media the data is stored on that he's worried about, it's the format it's stored in. When we switch to binary encoding, bits don't have the nice fixed meaning that letters on a page do. Instead we assign meaning to large collections of bits. If we forget how we've done that, then we're unable to recover the meaning from the data.

        This basically means that we not only need to keep all your .doc files on a reliable storage medium, but we also have to keep around a copy of MS Word that can read that version of .doc files and a computer that can run that version of MS Word. Keeping the data eternally is really the easy part of all this.
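
        A small illustration of the point about bits having no fixed meaning, using nothing but Python's standard `struct` module. The four bytes are arbitrary and chosen only for the example; the same bytes come out as text, one integer, a different integer or a pair of integers, depending on which format you assume.

```python
import struct

# Four arbitrary bytes, as they might sit on a disk.
raw = bytes([0x48, 0x69, 0x21, 0x00])

as_text = raw[:3].decode("ascii")          # read as three ASCII characters
as_uint_le = struct.unpack("<I", raw)[0]   # read as one little-endian 32-bit integer
as_uint_be = struct.unpack(">I", raw)[0]   # read as one big-endian 32-bit integer
as_shorts = struct.unpack("<2H", raw)      # read as two little-endian 16-bit integers

print(as_text)     # Hi!
print(as_uint_le)  # 2189640
print(as_uint_be)  # 1214849280
print(as_shorts)   # (26952, 33)
```

        Nothing in the bytes says which reading is the right one. That knowledge lives in the format specification or in the program that wrote the file, which is exactly the part that gets lost.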

        1. Jan 0

          Re: Turtles all the way down?

          > "This basically means that we not only need to keep all your .doc files on a reliable storage medium"

          Is there any useful data in .doc files?

        2. Daniel B.
          Boffin

          Re: Turtles all the way down?

          It's not the media the data is stored on that he's worried about, it's the format it's stored in. When we switch to binary encoding, bits don't have the nice fixed meaning that letters on a page do. Instead we assign meaning to large collections of bits. If we forget how we've done that, then we're unable to recover the meaning from the data.

          The fun thing about this is that he's talking about this now, when the issue has been well known in the IT world for quite some time. Even my dad, who isn't in the IT world, already knows about this. Why? Because the following things are no longer readable:

          His probability programs written in college, which are stored in a big-ass magnetic tape roll. We don't even know which format the files are in.

          His PhD thesis, which was written in either Aldus PageMaker 1.0, 2.0 or 3.0 and is stored in a lot of 3.5" floppies. And they're all in HFS Mac format. Extracting that data requires getting at least PageMaker 2.0, 3.0 and 4.0 to get them up to a point where we might extract that data into a Windows PageMaker version, a PPC Mac or a Snow Leopard-toting Intel Mac that would be able to run PageMaker.

          All my Commodore 64 programs and data.

          All the stuff we stored in Jasmine Removable 45 HDDs.

          All the stuff we stored in MDS88 Removable HDDs.

          All the stuff stored in iomega Jaz or ZIP drives. (Fortunately, I had a wee bit of foresight on this, so I managed to rescue most of my ZIP cartridge data before I was no longer able to read 'em. No such luck for my dad.)

          I'm pretty sure that anyone who got into the whole "computers" thingy back in the 80s like me has already lost something to the "digital dark age" by now.

          1. Dave 126 Silver badge

            Obligatory Iain Banks:

            "So," Ash said slowly. "Let me get this straight: you don't know the machine, but it's probably some ancient nameless Apple clone from the dark grey end of the market, almost certainly using reject chips; it probably had a production run that lasted until the first month's rent fell due on the shed the child-labourers were assembling them in, it used an eight-inch drive and ran what sounds like dodgy proprietorial software with more bugs than the Natural History Museum?"

            - The Crow Road, Iain Banks

          2. Charles 9 Silver badge

            Re: Turtles all the way down?

            Actually, nearly 20 years ago I was able to preserve a lot of C64 and C128 data I had by shuttling the data from the C128 to a nearby 486 using modems and a phone cable. Okay, it was slow and tedious at 1200bps using Xmodem, but at least it worked.

            As for the Mac HFS format, I recall there are Windows programs capable of reading them since around 1995.

          3. JamesTQuirk

            Re: Turtles all the way down?

            Noticed this today, So some are trying ..

            5D ‘Superman memory’ crystal could lead to unlimited lifetime data storage

            http://www.southampton.ac.uk/mediacentre/news/2013/jul/13_131.shtml

    3. cyber7
      Alert

      Re: Turtles all the way down?

      This has already been happening, to a lesser extent! I used to work for a company that re-mediated and dumped the data from vintage 9-track and 21-track tape reels, primarily for the oil industry. It's been a frequent occurrence that shooting seismic data in a location is now impossible due to new construction or political motivations, so companies are looking to re-process vintage data with new technologies. We frequently ran into issues with proprietary formats, undocumented fields, and some forms of obfuscation, forcing us to scour archives for format documentation, hoping it wasn't trashed.

      Fortunately, the generations of 30-70 years ago frequently used paper, though I had to decode my share of UNIX 8" floppy disks at times. A few generations from now, our descendants will not have that luxury given our tendency to keep everything digital, the push for cloud storage, and the plethora of disk formats generated over the last century alone.

      1. Anonymous Coward

        Re: Turtles all the way down?

        "I used to work for a company that re-mediated and dumped the data from vintage 9-track and 21-track tape reels, primarily for the oil industry"

        For a brief moment I feared jake had started using sock puppets. Then I realised you were describing a real world scenario of the kind jake likes to fantasise about. Sorry about that.

        1. Fibbles

          Re: Turtles all the way down?

          It can't be Jake, there's no mention of doing it whilst horseback riding through his vineyards.

          1. Trevor_Pott Gold badge

            Re: Turtles all the way down?

            "It can't be Jake, there's no mention of doing it whilst horseback riding through his vineyards."

            Clearly while en route to his handcrafted mahogany helicopter which he will use for his quarterly trip into the city to gather the few supplies he doesn't make himself artisanally on his massive plot of prime land.

  2. Joseph Eoff

    Not really much of a problem

    99.999999 (repeat 9s for as long as you care) percent of everything on the Internet is of no real relevance to anyone - not even those who post it (including this comment.)

    If YouTube were to go TITSUP permanently what would we lose? Nada. Cats will still be around so we can recreate all the cat videos at will.

    The crap posted to Facebook (and other related sites) won't be of any use in a couple of hundred years - it isn't even of any use NOW.

    Imagine some poor bastard who digs up a bunch of Facebook backups a thousand years from now. All the hard work and dedication needed to decode it (under the assumption it must be important), and all he gets for it is the photos some assclown posted of his lunch.

    Your life isn't so damned interesting that you need to preserve it for posterity. Get over it (and yourself.)

    1. Roger Greenwood

      Re: Not really much of a problem

      Au contraire - social historians etc will love the treasure trove - e.g. what drove the fascination with some kid called Bieber for a year or two at the beginning of the 21st century? (Thankfully now forgotten). Besides their research grants will depend on it.

      1. Joseph Eoff

        Re: Not really much of a problem

        Actually, I suspect any social scientist who investigated the "Bieber incident" would lose all faith in himself and humanity and commit suicide - possibly together with the technicians who put so much effort into recovering the records only to find "The Bieber" and the previously mentioned lunch photos.

        1. Zot

          Re: Not really much of a problem

          The Bieber story will be told and re-told with more and more embellishment over the next thousand years, until all the rhetoric will be written down freshly from snippets of teenage writings and Lo, he'll appear to be a Jesus like figure - it's what happens, you know.

      2. Jack of Shadows Silver badge

        Re: Not really much of a problem

        Given that middenheaps (trash dumps) are the source for much of what we know about past cultures, it's singularly appropriate that Facebook and related ilk likely will be used for the same.

      3. Mark 85 Silver badge

        Re: Not really much of a problem

        What you say is true. The biggest treasure trove for archaeologists is usually trash dumps. Other than stone, pottery, and some tools, much doesn't survive time. With the digital age, things are even more transitory than with paper.

        Nothing really is forever. Time moves on. Things disintegrate. If we could travel a thousand years into the future, it would be interesting to see what survived and what didn't. And how what was found was interpreted. I'm sure that many things will be glossed over as "being of religious or ceremonial use".

        There's a story I remember from decades ago about some archaeologists excavating a motel in the future and how they interpreted things. Hilarious, yet not far off the mark, IMO.

        1. DaiKiwi

          Re: Not really much of a problem

          The Toot 'n' Come In Motel, or some such. I remember reading it in a Readers Digest 25+ years ago. I must look it up...

          Ok, I went & looked it up - It was an extract from David Macaulay's 'Motel of the Mysteries' (1979). Macaulay's book begins by noting that America was destroyed in 1985 when it was suddenly covered by a huge flood of junk mail, unleashed by an accidental reduction in postal rates, followed by the sudden fall of solid pollutants from the atmosphere, placing another layer over the buried country. Found it mentioned in a Locus magazine article about the Archaeology of the Future. Now to find the book - and some of the other stories mentioned in the article.

    2. Naselus

      Re: Not really much of a problem

      While stuff like Facebook isn't worth saving, there's still serious work which is worth keeping - some academic journals are now online only. And who knows, the digital archaeologists of 2250 might actually care about those photos of your lunch. The Roman man-in-the-street's view is considerably more valuable than the written histories we have simply because no-one at the time thought the plebs' lives were interesting or worth recording.

      1. Joseph Eoff

        Re: Not really much of a problem

        In the main, the "plebs'" lives aren't very interesting - not even of today's "plebs."

        How many life stories do you have to reconstruct to realize that people then are pretty much people now. They work, they eat, they sleep, they do stupid shit, they die. You do it, I do it, the rich do it, the famous do it, the brilliant do it, the stupid do it.

        Certainly there are things that need to be preserved - the background of historical events and agreements, records of scientific progress, and a zillion other things.

        We don't need perfectly preserved records of the lives of billions of people.

        1. Electron Shepherd

          Re: Not really much of a problem

          We don't need perfectly preserved records of the lives of billions of people.

          We as a society don't, but individual people do. How many people these days with a new-born baby will realise in 25 years time that they have no ability to look at any of their photos of their children growing up?

          1. Joseph Eoff

            Re: Not really much of a problem

            That is purely the stupidity of trusting your personal things to a corporation.

            Keep your pictures locally, and print the important ones - or do you really think you need every single picture you took of your kids spitting up?

          2. Anonymous Coward

            Re: Not really much of a problem

            And how many of them then think "You were a child, just like every human ever, so who cares, let's stream a film no-one will care about in 2 years time."

          3. Anonymous Coward

            Re: Not really much of a problem

            That's why pictures and videos of my kid are stored on three different hard drives, replaced/upgraded every 4 or 5 years, and backed up to MDisc BluRay media, with an unused BluRay drive kept in storage until the next major interface upgrade (from SATA to ...???)

            1. Dave 126 Silver badge

              Re: Not really much of a problem

              >That's why pictures and videos of my kid are stored on three different hard drives, replaced/upgraded every 4 or 5 years,

              That's a good start. There are still some issues that might affect you, especially if your images are in a compressed format such as JPG. A single bit error can be enough to trash a compressed image. True, if you spot an issue yourself, you can of course manually recover the image from the back-ups - but this can't be done automatically if the file system doesn't know that the file has been damaged. This is an issue that ZFS, amongst other file systems, addresses.

              http://openpreservation.org/system/files/Bit%20Rot_OPF_0.pdf

              http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/
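
              For anyone without a checksumming file system, a rough sketch of the same idea done by hand: given several copies of a file, pick the one whose digest matches a recorded value, or failing that the one a strict majority of copies agree on. This is an illustration only and not how ZFS is implemented; `find_good_copy` is a made-up name.

```python
import hashlib
from collections import Counter
from typing import Optional


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_good_copy(copies: list, recorded_digest: Optional[str] = None) -> Optional[bytes]:
    """Return an undamaged copy, or None if that cannot be decided.

    With a recorded digest: return the first copy that matches it.
    Without one: return the content that a strict majority of copies share.
    """
    if not copies:
        return None
    if recorded_digest is not None:
        for data in copies:
            if digest(data) == recorded_digest:
                return data
        return None
    counts = Counter(digest(data) for data in copies)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(copies):  # no strict majority, so no way to tell
        return None
    return next(data for data in copies if digest(data) == winner)
```

              With three drives, one flipped bit is outvoted two to one. With only two copies and no recorded digest the function gives up, which is the situation described above: you know something differs but not which side is damaged.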

        2. Doctor Syntax Silver badge

          Re: Not really much of a problem

          'In the main, the "plebs'" lives aren't very interesting - not even of today's "plebs."...Certainly there are things that need to be preserved - the back ground of historical events and agreements'

          Maybe not from your point of view. However as you're not the only person in this planet that's a fairly limited one. Social historians find these lives much more interesting than the doings of the political classes.

          To take one tiny area - how did the domestic textile industry of the West Riding evolve? Just how did the clothiers operate? How did this differ from the domestic industries of other areas such as the Cotswolds? How did it differ from the development of the metal-working industry of the Sheffield area? Manorial and parish records are remarkably unforthcoming about this aspect of their inhabitants. These are not trivial historical concerns; these trades launched parts of the industrial revolution and yet our understanding of them is quite limited.

        3. Michael Wojcik Silver badge

          Re: Not really much of a problem

          How many life stories do you have to reconstruct to realize that people then are pretty much people now.

          Ah. This sort of story never fails to bring out the sophomores in droves.

          "Digital Humanities" research is already turning up all sorts of important historical information using corpora of documents from the past couple millennia, which have been scanned, OCR'd, and corrected.

          An example I may have mentioned here before: I saw a presentation some years ago at MLA on identifying the time period of the modal shift in imitatio christi - when people went from saying "what did Jesus do?" as a moral touchstone to "what would Jesus do?". The presenter had done searches through a number of massive corpora of late-medieval and early-modern documents for variations of those phrases in Latin and vernaculars, and found good evidence that the shift occurred relatively rapidly in, if memory serves, the late sixteenth century.

          So what? Well, in the Christian-dominated European cultures of the era, the imitatio christi modal shift - which happened among the "organic intellectual" members of the working class, so yes, very much the plebs - is a hallmark of modernity. It happens when Christians no longer see the present day as equivalent to the historical moment of Jesus' life, but instead a new and different milieu. And that's just what "modern" means.

          That's a change in the historical episteme that we couldn't track without computer analysis of huge text corpora. The vast majority of content produced online every day may be dross as individual pieces, but in aggregate it can tell us a lot about ourselves.

          And anyone who looks at the actual research people are conducting with it now, instead of indulging in idle armchair speculation, would know that.

          1. JamesTQuirk

            Re: Not really much of a problem

            Armchair speculation ? Isn't that what Christians live on ? NO Proof, Just Believe, in a story made to create a religion that was the largest "business" this Planet has ever seen, until the population become educated, and realised what a pile of crap it is. How it's been perverted into a Control Mechanism for Governments & Arseholes who think "they" know all, cause, they say, a "voice in their head" said it was OK to rort people ...

            You can't quote fairy Tales as a basis for truth....

            1. Michael Wojcik Silver badge

              Re: Not really much of a problem

              You can't quote fairy Tales as a basis for truth....

              And no one in this thread has done so.

              If you have some free time, could I suggest you learn to read? And then maybe you'd like to study how to write.

      2. Robert Helpmann?? Silver badge
        Childcatcher

        Re: Not really much of a problem

        While stuff like Facebook isn't worth saving...

        Hold on just a moment there! Archaeologists have been digging through modern landfills for more than 40 years. One man's garbage is another man's research paper.

      3. John Sanders
        Holmes

        Re: Not really much of a problem

        If only someone at the time of the Romans had made a blog about Roman concrete mixing...

    3. gerryg

      Re: Not really much of a problem

      If YouTube were replaced then I agree with you, however loss without replacement would negatively affect me, albeit a bit niche, old videos of Jimi Hendrix concerts, interviews with him, Mitch Mitchell etc., on the one hand, Go tutorials on the other, then there are all those "how to replace that widget on your gadget".

      Boring to most I'm sure, but they might have their own examples.

    4. bjr

      Re: Not really much of a problem

      One century's garbage is another century's treasure. That's literally true, the thing that archaeologists love most of all is garbage dumps and cesspits. Historians in the distant future will find Facebook fascinating. They will also love our primitive cat videos. Undoubtedly they will have some form of cat video which is as unimaginable to us as YouTube would have been to the Egyptians who built The Sphinx (the oldest known cat video).

    5. John Sanders
      Holmes

      Re: Not really much of a problem

      In the future, (if political correctness hasn't busted civilization yet) historians will find your post amusing.

    6. Anonymous Coward

      Re: Not really much of a problem

      I worked on US Census data in the 1980s, stored on magnetic tape, with an access program called CENSPAC, a derivative of COBOL. I suspect much of that demographic data (and back then, the Census asked a lot) is unreadable now.

    7. JamesTQuirk

      Re: Not really much of a problem

      or try to find it again @ ... Internet Archive: Digital Library of Free Books, Movies ...

      https://archive.org/index.php

      or if Cat vid's are your drug of choice, maybe ....

      WayBackMachine - 452 billion web pages saved over time

      https://archive.org/web/web.php

      I like here ....

      https://archive.org/details/software

      Ps: I used 8" - 5.25" 360k/1.2m - 3" - 3.5" 720k/1.44m, HDD in 5MB, 20MB, 40MB & UP, I have Files and HDD's from Amiga's, Old Mac's, PC, Atari ST (lookin for VM), even my Commodore 64 files are on HDD ....

      My point is the Data will move, just some buggers got to do it ....

  3. Captain Hogwash Silver badge

    I don't see why he's so concerned.

    GCHQ/NSA will take care of it.

  4. Anonymous Coward

    GCHQ and NSA alternatives

    The GCHQ and NSA archives aren't open to the public.

    On the other hand, archive.org (the wayback machine) is accessible to the public for free (at the moment) and access just needs a browser and not much else. How hard can that be?

    Anything more obscure than that and there are a few computer museums, the classic computers mailing list, and so on.

    If Cerf had said we need to preserve *knowledge*, the kind of stuff that used to be in the non-fiction section in buildings called libraries, then he'd have had an excellent point.

  5. RyokuMas Silver badge
    Joke

    Family tree...

    Is this so in a few decades time Google can fling ads at people based on their genealogy?

    I can just see it now - some little tyke with his brand new neural plugs chips into the Googletrix only to be confronted with "popular with your forefathers..."

  6. Androgynous Cupboard Silver badge

    That's what PDF/A is designed for

    If it's digital you're going to need hardware and software to decode it of course but once you get past that "bootstrap" problem PDF/A is designed to be completely self-contained.

  7. Peter Gathercole Silver badge

    It's hopeless

    We need a technology that can be abandoned and still be readable in future times.

    Any technological solution is bound to fail because maintaining it requires repeated investment in either maintaining what will become an obsolete storage format in the future, or repeatedly re-writing it as new media are invented.

    It's all very well suggesting that technology from people such as "Carnegie Mellon University and IBM Research" might be worth using, but this assumes a certain amount of continuity to maintain the physical storage that requires organisations to survive. You cannot rely on government or industry to still be around in the future, and the 'Cloud' (whatever is meant by that) needs to be maintained as well.

    You end up with stupid chicken-and-egg situations if the description of the programs and machines necessary to read the media is only stored on the media itself.

    I respect Vint Cerf. He's very influential. But he's not, in the grand scheme of things, an engineer (his degrees are in Mathematics, and he's managed various teams and companies mainly on data communication). Nowadays, he's good at the grand scheme thinking, not the detail.

    He was being interviewed on Radio 4 this morning, and I got the feeling that he was either dumbing down what he was saying for a non-technical audience, or that he did not fully understand various fundamentals on machine architecture and what would be necessary to maintain in order to run a program from a current generation of machines. I would hope that it was the former, but I was not convinced. When talking about the systems, he talked about taking a snapshot of the software "with a description of the machine it runs on", glossing over that the description would have to be incredibly detailed to capture all the nuances of machine architecture to allow a working machine to be reconstructed from that description.

    I would suspect strongly that it would already be nigh on impossible already to reconstruct systems from people like DG, Prime or Tandem (amongst others) unless working physical instances exist.

    Trying to capture all of the operating characteristics of a complex modern processor like Power 8 or a Haswell and the associated support chipsets to allow it to be reimplemented in the future on architectures unimaginable at the moment would be a herculean task!

    Much better would be to ban the use of all proprietary closed file formats, and keep the definition of the open file formats in enough detail to reconstruct the data stored in those formats.

    But this does not alter the fact that there needs to be readable media maintained in perpetuity.

    1. a well wisher

      Re: It's hopeless

      And didn't he also say that they keep all their info on archival paper EVEN though it's all been scanned in?

  8. Pen-y-gors Silver badge

    Easy!

    Very large slab of granite and a chisel, and just chip out the bits.

    Not sure about the cat videos though...

    1. fritsd

      zoopraxiscope for cat videos

      Better glue those slabs of granite on tight.. wouldn't want one to fly off and drop on someone's toe.

      http://en.wikipedia.org/wiki/Zoopraxiscope

  9. AbortRetryFail

    Not such a new concept

    People have been saying this for decades. I remember a New Scientist article in the late 1980's postulating that we were entering a new "Dark Ages" for the reason that hundreds of years from now historians would be unable to read most of our digital records.

    Also, Iain Banks' book "The Crow Road" centres on trying to read a diskette from an obsolete computer, and it was published in 1992.

    1. Michael Wojcik Silver badge

      Re: Not such a new concept

      Yeah, Cerf's way behind on this one. The ELO's Acid-Free Bits recommendations predate his "vellum" proposals by ten years, and as you note people have been complaining about losing data due to format and encoding obsolescence since at least the 1980s, and probably before. And as other posters have noted, there's an entire industry around recovering this stuff.

      Doesn't make his point less important, of course, but it's certainly not novel.

  10. Bob Wheeler
    Boffin

    Rince and Repeat.

    What he's talking about is the 100 year archive.

    How to keep access to data - be that knowledge or cat video's - across the generations.

    http://www.theregister.co.uk/2011/06/04/snia_100_year_archive/

    1. frank ly Silver badge

      Re: Rince and Repeat.

      Rincewind? I'm sure Terry Pratchett's books will still be around in 100 years time.

  11. Kubla Cant Silver badge

    Digital vs. analogue?

    I can see the problems that arise from the volatile nature of digital records. They're much easier to lose or destroy than paper. But on the other hand they're much easier to duplicate, so the net attrition may be the same or less.

    What surprises me is the suggestion that the ability to read old digital data may be lost in the future. Is there any evidence that this happens? If I dig out a 40-year-old CP/M floppy, will it be impossible to read (in the unlikely event that there's anything on it worth reading)? The effort and ingenuity that go into reading historical material like the Dead Sea scrolls and carbonised documents from Herculaneum suggest not.

    It's a bit like the distinction between digital and analogue computers. Reading ancient digital information is presumably a matter of emulating the obsolete technology; a tricky but quantifiable problem. Ancient analogue information presents much greater challenges.

    1. Anonymous Coward

      Re: Digital vs. analogue?

      Not so much digital vs analogue, but easily duplicated. We're still turning up lost episodes of Dr Who on tape from TV stations around the world.

    2. DougS Silver badge

      Worry about old media that can't be read is missing the point

      Old CP/M disks may be difficult to read, but now that everything is networked it is easy to transfer the bits to newer storage systems (replicated as necessary) over and over again.

      We don't have to worry about preserving the data, but we do have to worry about preserving the means of interpreting the data. i.e. if we store JPEGs we need the method of displaying them saved, similar with h.264, HTML5 and so on. That's probably not a difficult problem for common formats like that; for rare formats like saved mail archives from Domino it might be more difficult.

      The overarching problem is preserving some way to determine what is what. We need metadata about each object or collection of objects to tell what it is, where it's from, what its purpose was, who originated it, what its significance is, etc. A giant dumping ground of cat videos from 2006-2030 isn't very useful. A way to search it to fit memes like grumpy cat, I can haz cheeseburger cat and whatever at least preserves its cultural context/significance.

      Maybe AI will eventually be able to help there, and be able to 'watch' all the videos, view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner. Google's current search ability is nowhere near what we'd need for a future historian to look through this stuff and make some sense out of it.
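
      As a rough sketch of the "metadata about each object" idea: identify the format from the file's leading bytes and write a small record that travels with the data. The field names and the `describe` helper are invented for the example and do not follow any archival standard; the signatures listed are the well-known ones for these formats.

```python
import hashlib
import json

# Well-known leading byte signatures ("magic numbers") for a few common formats.
MAGIC_NUMBERS = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def identify(data: bytes) -> str:
    """Guess a media type from the first bytes, falling back to 'unknown binary'."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return "application/octet-stream"


def describe(name: str, data: bytes, origin: str, purpose: str) -> dict:
    """Build a sidecar metadata record (hypothetical field names)."""
    return {
        "name": name,
        "format": identify(data),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "origin": origin,
        "purpose": purpose,
    }


if __name__ == "__main__":
    record = describe("grumpy.gif", b"GIF89a...", "forum upload", "meme")
    print(json.dumps(record, indent=2))
```

      The record is plain JSON text on purpose: a future reader who cannot decode the object itself can at least see what it claimed to be and go looking for the right decoder.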

      1. Anonymous Coward

        Re: data vs information vs meaning

        "view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."

        Surely that's what Autonomy was sold for?

        And other less well known products used by three and four letter agencies, "for our security".

        1. Michael Wojcik Silver badge

          Re: data vs information vs meaning

          "view all the web pages, look at all the pictures, read all the documents and emails, and categorize them in a useful manner."

          Surely that's what Autonomy was sold for?

          There are quite a few companies operating in this area, many of them for specific markets or industries (legal and medical are the big ones). And there's free software you can use to build your own system, from simple stem-and-index systems like the old MIT Savant software to general frameworks for processing unstructured data like UIMA.

          Even Windows, in its default configuration, will scan and index all the file formats it recognizes. Similar systems exist for Linux and UNIX.
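
As a rough illustration of what a "stem-and-index" system does, here is a minimal sketch in Python. It is not how Savant, UIMA or the Windows indexer actually work; the suffix list, the `stem`, `build_index` and `search` names, and the sample documents are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Deliberately crude suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into stemmed alphanumeric terms."""
    return [stem(word) for word in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stemmed term to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every stemmed term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Hypothetical sample documents, purely for illustration.
docs = {
    "a.txt": "Archived emails about cats",
    "b.txt": "Archiving the cat videos",
}
index = build_index(docs)
both = search(index, "archiving cats")
only_a = search(index, "emails")
```

Because "archived" and "archiving" reduce to the same stem, a query for either form finds both documents. That is the whole trick, and also the limit of it: it indexes words, not meaning.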

  12. Michael Sanders

    The technology exists

    Maybe I'm missing something. But doesn't an emulator do that? Backward compatibility is something we've been overdoing if you ask me. Rip it off like a bandaid.

    1. Michael Wojcik Silver badge

      Re: The technology exists

      Sometimes the problem is physical media. Sometimes it's that the data is in a format that no one recognizes, or that was implemented by some obscure software that's nowhere to be found.

      Emulators solve some problems, but by no means all of them.

      1. JamesTQuirk

        Re: The technology exists

        I searched in WWW.Archive.org - 30 posts/videos/interviews by Vint Cerf, including 2 previous ones ABOUT this topic .... https://archive.org/search.php?query=Vint+Cerf

        But if anyone can't operate an emulator, try the FREE browser version ...

        https://archive.org/details/internetarcade

        1. JamesTQuirk

          Re: The technology exists

          Maybe Not May have been .. Wrong, about both being there, there is 1, Sorry

  13. Anonymous Coward
    Anonymous Coward

    In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies.

    There's also the point that we may not know what will be interesting or relevant in the future.

    So when people go to the British Library to see the four remaining copies of the Magna Carta, now 800 years old, they might like to wonder whether current "digital only" records, personal or corporate, might be readable by someone in the year 2815!!!! In my experience, it's only taken twenty-five years to render some digital files close to unrecoverable.

    1. fritsd

      ODF.

      FODF (flattened) for the ODF spec document, obviously.

      1. fritsd

        deflate

        Oh yes, and you might want to download rfc1951.txt as well and put it next to it:

        RFC 1951
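
For anyone wondering why the DEFLATE spec belongs next to the ODF spec: an ODF file is a ZIP package, and its entries are normally DEFLATE-compressed, so a future reader needs RFC 1951 before the XML is even visible. A minimal sketch using Python's standard `zlib` module, where a negative `wbits` value selects a raw DEFLATE stream with no zlib header or checksum. The helper names and the sample payload are made up for illustration.

```python
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (negative wbits: no zlib header or trailer)."""
    compressor = zlib.compressobj(level=9, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(stream: bytes) -> bytes:
    """Decompress a raw DEFLATE stream produced as above."""
    return zlib.decompress(stream, wbits=-15)


# Hypothetical payload standing in for the XML inside a document package.
payload = b"<text:p>digital vellum</text:p>" * 100
compressed = raw_deflate(payload)
restored = raw_inflate(compressed)
```

Without the algorithm description, `compressed` is just an opaque run of bytes; with it, the original comes back bit for bit.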

    2. Charles 9 Silver badge

      "In a small way it's already happened. I worked on a large corporate document project in 1990 and 1991. The documents were written in Microsoft Word 1.0. Graphics were created in MicroGraphx Designer. Some graphics were created in tools (names unremembered) running on DOS. None of this material is usable today. The latest version of MS Word doesn't recognise these old DOC files. There is no support anywhere for MicroGraphx Designer files. The DOS software is long gone. I still have printed copies."

      Are you SURE none of that is useable today? Are you sure you can't fire up a DOS emulator like DOSBox, locate disk images of the software you used (OK, maybe some of it was custom work) or a utility from the time capable of interpreting it? Sure, formats come and go, but there are even now digital preservationists striving to at least keep records of the past available: diskettes imaged and formats described. The hard part is gathering the resources needed to read your old format. After that, you can usually migrate it to a newer format. Plus there are certain formats (like simple text files) that lend themselves better to preservation (as long as the character set is still known, you're OK).
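
The character-set caveat matters more than it looks. A small Python sketch, assuming a hypothetical four-byte fragment from a DOS-era text file: the same bytes decode to different text depending on which code page the reader assumes.

```python
# Four bytes as they might sit in an old DOS text file (hypothetical sample).
RAW = bytes([0x9C, 0x31, 0x30, 0x30])


def decode_with(raw: bytes, encoding: str) -> str:
    """Decode the same bytes under an explicitly named character set."""
    return raw.decode(encoding)


# Code page 437 (the original IBM PC character set) maps 0x9C to the pound sign.
as_cp437 = decode_with(RAW, "cp437")

# ISO 8859-1 maps 0x9C to an invisible C1 control character instead.
as_latin1 = decode_with(RAW, "latin-1")
```

Read as code page 437 the fragment says "£100"; read as Latin-1 the currency symbol silently disappears. The bytes survived either way, but the meaning only survives if the encoding is recorded alongside them.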

      1. DougS Silver badge

        "Unusable" in a corporate sense means not worth the cost

        Obviously that stuff CAN be recovered, it just isn't worth the cost. If they contained proof of invention that would mean they win $100 million in a patent case, they'd spare no expense to recover them - even taking "unreadable" diskettes to a data recovery specialist to read off the bits using an STM or whatever.

    3. JamesTQuirk

      Why I think making VM's of old OS is a mission ...

  14. Any mouse Cow turd

    Remember the Domesday Disc anyone???

    This reminds me of the BBC Domesday Disc project. A digital record of life in Britain in the mid 1980's stored on laser disc for posterity and like the Domesday Book, useful to social scientists and historians for hundreds of years. Or as it turned out, about 5 years.

    I know there is a site on the intertubes that allegedly re-creates bits of the disc but I remember playing with the disc in the school library and looking at my home village - not present on the interweb emulator thingy...

    1. Mr. Flibble

      Re: Remember the Domesday Disc anyone???

      Part of the problem is that copyright was not specifically waived, so while conversion could be done, from a legal POV copies are not allowed without contacting the original sources for the information:

      https://en.wikipedia.org/wiki/BBC_Domesday_Project#Preservation

  15. Warm Braw Silver badge

    Vint says a lot of stuff...

    He said that personal privacy was essentially over because of social networking.

    He also said that he would have liked to make the Internet more secure against interception, but wasn't allowed to as he was, er, working on a secret program on behalf of the NSA.

    And now he wants to record for ever the information he thought should remain private and secure?

    The only thing standing between us and the perpetual violation of our privacy for generations to come is bit rot, so long may it continue.

    There's only any point in archiving stuff that the small number of people who might have the interest and resources to trawl through it in the distant future will have the time to read. So someone might as well sit down and copy the best bits out onto physical vellum - we already have far more historical archive material than will ever be usefully consulted, we don't need to expand it by orders of magnitude every few years.

  16. RobHib
    Thumb Up

    Thank heavens there's still a few influential people around in IT like Cerf and Berners-Lee who have a real grip on the big picture and say what's needed to be said.

    What Cerf says about digital vellum ought to be self evident but it's not to many. Why it's so few is a matter of conjecture.

    Unfortunately, whilst Cerf has the power to command attention, I doubt if he will be listened to, as such matters require more than a five-second consideration.

    1. Gordon 10 Silver badge

      I don't think it's particularly self evident - he sounds like chicken little to me.

      The proportion of the Internet worth keeping is tiny. The formats involved number in the low thousands, and the technologies are in the same order of magnitude. All that's really needed is some auto tagging so that photos and personal data can be retained and Facebook posts flushed down the digital crapper.

      1. RobHib
        Thumb Down

        @Gordon 10 - Consider this.

        Let's say some document of equivalent 'importance/influence' as the King James Bible of 1611 had first entered the world not as printed material but on-line as ephemeral 0s and 1s in 2011--exactly 400 years later.

        1. Would that document have the same influence over the forthcoming 400 years, everything else being equal?

        2. Would it matter if its influence was more or less influential than its printed-on-atoms version of some 400 years earlier (leaving aside one's religious/political position on its contents)?

        3. What does it mean to society/posterity/history etc. that in the digital age that the 'modern ephemeral version' might never make it past five years of age let alone 400 years?

        Permit me to suggest that these are quite profound questions that will take substantially more time for society to consider than it takes to cross the road whilst reading one's iPhone--the average amount of time users seem to devote to an issue today.

      2. Anonymous Coward
        Anonymous Coward

        Discussions like this one?

        There were some great discussions on the old Aces Hardware forums. Participants from Intel, AMD, and other companies posting about the design of CPUs, Memory, and other key system architecture components. Some insights were posted there that may not have existed anywhere else.

        All lost to a hard drive crash.

  17. Vinyl-Junkie
    Facepalm

    Vellum, eh?

    Can't help feeling he's missed the point about vellum, which is that it was a reusable medium! You simply scraped off the top layer with a sharp knife and reused it. Examinations of surviving vellum documents today indicate that in some cases much more valuable (in today's terms) items have been overwritten with (e.g.) yet another beautifully illuminated Gospel of St John. Aesthetically more pleasing but actually not as interesting as the local history that was written on there until some abbot decided it wasn't worth keeping...

    Parallels, anyone?

    1. RobHib

      @Vinyl-Junkie - Re: Vellum, eh?

      Let's begin with something that ought to be dear to your heart. Vinyl recordings will most likely outlast their digital counterparts.

      So far, each successive generation of recordings (and recording formats) has had a shorter life than its predecessor. I've commercially-cut DVDs that are now unreadable which are less than a decade old, whilst my vinyl records are still very much intact, some of which are over 50 years old. I've also a few 78s that are 90+ years old and are still in reasonable condition.

      The issue here is not that vellum can be reused--so can paper with a pencil* and eraser--but whether the data will still be around after a reasonably long time if one wants it to be. Most written stuff from history isn't around today because people either didn't want it to be preserved or they didn't care to look after it.

      The point Cerf is making is that vellum, if given a chance, will store information for thousands of years; that's much more than can be said for present digital information storage--the reasons for which are either technical limitations or society's lack of concern for the fate of recently-old information, or both.

      Whilst in theory digital information can be stored indefinitely, practice is another matter altogether. I've considerable difficulty recovering my own digital documents from the early 1980s even though I've taken steps to look after them. Technology has made it difficult for me to store them either efficiently or transparently; whether they warrant storage is another matter altogether--but it's an issue which Cerf and many others including myself are very concerned about.

      (* Wherever possible, I've always used a propelling pencil and eraser in preference to a ballpoint for this specific reason--even today, I don't feel properly dressed unless my 'Cross' propelling pencil and a notebook are in my top pocket. Much experience has shown me that (a) it's still quicker to jot in a notebook than in an iPhone, and (b) that 10-20 years on, that this data stored in human-readable format is more likely to be still about and accessible than its machine-readable cousins.)

  18. Nigel Whitfield.

    As soon as I get home ...

    I shall preserve the email backups from my QIC tapes by printing everything out on the thermal printer and putting it in a big folder

    1. Jan 0

      Re: As soon as I get home ...

      Just make sure the paper is kept in an ULT freezer to stop it all going black. Personally, I'd use a daisy wheel printer.

    2. RobHib

      @Nigel Whitfield -- Re: As soon as I get home ...

      '... thermal printer'

      I hope you mean the kind of thermal printer where the printing dyes/pigments sublimate onto the surface, these are excellent for longevity. Commonplace thermal paper printers are a disaster. That thermal paper has stuff-all retention time before it fades. Moreover, it's sensitive to all sorts of chemicals and turns black in the presence of cleaners, alcohol etc. A mere whiff of certain household chemicals is all that's needed to do damage--just stored nearby bottles of cleaners will do.

      A few years back, I needed access to some of my records for accounting purposes and I had great trouble in reading some documents that were only a couple of years old. They'd faded to the point of illegibility and I had to use the computer entries instead. However, the computer records aren't proof of a transaction whereas the original paper receipt/document is.

      Thermal paper and to a lesser extent badly done dye-line printing are the only forms of data retention that I rate less reliable than current-day digital backups.

      BTW, I've a stack of QICs of the older variety (recorded on Mountain tape drives). Fortunately, I've not needed the data from these in years, for if I did then I'd be hard pressed to retrieve it other than to resurrect some old DOS machine. Does anyone have a simpler, perhaps more elegant solution to this problem? (I'm sure I'm not alone in having stacks of old QICs in storage.)

      1. JamesTQuirk

        Re: @Nigel Whitfield -- As soon as I get home ...

        About 6 months ago, I was given the loan of a 3D printer to evaluate for somebody. One of the "tests" I did was taking a scan (after gimping) of my friend's wedding certificate and PRINTING it (I was attending his anniversary). It came out as a flat slab with raised letters/graphics; it may be able to "Gestetner" off a few copies, but it took "forever" to print, and turning a cabinet of files into these would be an institutional effort (madhouse), not to mention the space they would take up ...

        PS: Ubuntu has some links with hooking up qic's to a new machine ...

        http://ubuntuforums.org/showthread.php?t=2130423

  19. Doctor Syntax Silver badge

    Long term survival

    Historically the survival of any particular document has been a matter of chance. Some Anglo Saxon charters survive as originals. Some older documents have survived in particularly favourable environments. For the most part, however, texts from antiquity have survived as copies several generations removed from the original and the more copies were made the greater the chance that one or more has survived.

    I don't see that changing in terms of digital texts. Anything posted to Geocities, for example, is long gone unless someone copied it - archive.org doesn't seem to have got it all. If Google decided that Groups should go the same way as Wave how much of Usenet would survive?

    If we are to have digital records available far into the future we need to do three things:

    Have multiple archives of what is to be preserved*

    Each archive needs to copy its material onto new media as old ones become obsolete

    In addition to copying material archives need to translate obsolete file formats into current ones**

    * What is chosen for preservation is a thorny problem. Every time an archivist decides to weed the archive their decisions will be incomprehensible to someone. I remember some years ago wandering into a 2nd-hand bookshop in Cromer and finding bound volumes of Nature that the county library had disposed of crammed into all sorts of corners.

    ** Ideally one format that can be kept current for a long time - long in an archival sense. PDF/A?
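
The first two points (multiple archives, periodic re-copying) only help if someone checks that the copies still agree. A minimal fixity-check sketch in Python, assuming each archive's copy is available as bytes; the function names and archive labels are hypothetical, and a real archive would stream files from disk rather than hold them in memory.

```python
import hashlib
from collections import Counter


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of one copy as a hex string."""
    return hashlib.sha256(content).hexdigest()


def find_divergent_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the names of copies whose digest differs from the most common one.

    Assumption: the majority of copies is intact. On a tie, the digest seen
    first wins, so an even split cannot be resolved without outside evidence.
    """
    if not copies:
        return []
    digests = {name: sha256_hex(content) for name, content in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, digest in digests.items() if digest != majority_digest)


# Hypothetical example: three archives hold the same charter, one has bit rot.
copies = {
    "archive_a": b"charter text",
    "archive_b": b"charter text",
    "archive_c": b"charter t3xt",
}
damaged = find_divergent_copies(copies)
```

With three or more copies a damaged one can be outvoted and re-copied from a good one, which is the digital version of a text surviving because enough copies of it were made.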

    1. Anonymous Coward
      Anonymous Coward

      Re: Long term survival

      Setting aside the problem of choosing what matters from what exists...

      "If Google decided that Groups should go the same way as Wave how much of Usenet would survive?"

      Groups is *already* useless as a retrieval engine. DejaNews at least used to be a usable way of finding things.

      Groups is also useless as a means of providing links to posts; if you provide a link to Groups, it's probably going to be a dead link in a little while (certainly next time Groups is "improved").

      If a newsgroup of interest has an archival site somewhere, you're far better off using webGoogle to search the archival site than you would be using Google Groups. And you're better off providing long term reference links to the archival site rather than to Groups.

      If a newsgroup doesn't have an archival site, maybe someone should think about it.

      Case in point: derkeiler's computer-related archives (comp.* stuff).

    2. Christian Berger Silver badge

      Re: Long term survival

      Well any long term approach will be diverse. For public data like movies and film, a good approach would be to store them as long term readable DRM-free data. Essentially your Bluray rip with the DRM removed will be playable ad infinitum since all codecs are well defined.

      For data you cannot easily distribute that way because they are private, or if you want to consider a "collapse of the civilisation" scenario, you probably go fairly well with microfilm with multiple copies. For documents you'd ideally store both an image of the page and the text it contains in an easy to OCR font.

      In any case remember that the simpler the better. Someone might have to build a device or program some software for your files, so make it as easy as possible.

  20. Alister Silver badge

    Anecdotal instance

    Last year we were asked to import some data to a SQL database for one of our local government clients. They sent us the data on a CDRom - the sort that came as a caddy or cartridge, rather than a bare CD.

    We had no hardware that was able to accept this media, and the client didn't either.

    So we managed to find an old caddy drive on ebay - it was a SCSI interface, so we had to find a compatible SCSI interface card.

    The only one we could find was a full length ISA card, and we hadn't got a machine anywhere with a full length ISA slot, so we had to buy an old server (I think it was a Dell PE400 or something) for it to fit into.

    We could only find drivers for the SCSI card for Windows NT 3.51, so we had to dig out an old set of floppies (two sets, as it turned out, as some of the floppies were corrupt), and install the O/S.

    We needed to be able to transfer the data off the machine, but we couldn't find a network card which would work with NT 3.51, until digging about in the scrap box we found an old 3Com 10Base-T card with both BNC and RJ45 connections.

    Getting that to work on our gigabit LAN was umm... interesting... but we finally got everything talking - very slowly...

    Then, we found the data on the CDROM was a backup from a Microsoft SQL 7 installation, which wasn't readable by any current version we owned...

    We managed to find the install disks, and service packs, which would allow SQL Server to be installed on NT 3.51 (I think it was upped to SP 6 before we could do it), and finally, we were able to open the backup, export the data, and re-import it into our current database.

    The whole thing took us two weeks of faffing about, just to read some data from roughly ten years ago.

    1. gerdesj Silver badge

      Re: Anecdotal instance

      Sounds like a 3Com 3C509. Unable to do auto neg properly in many cases.

      You have to hard strap BOTH ends of the connection otherwise you will have 10Mbs-1 1/2 duplex at one end and 10Mbs-1 full duplex at the other end. That will run really slowly and is probably the cause of your speed problem (apart from being 10Mbs-1 !) Now you might have been able to use a 905 or a 595 - they are PCI though but I'm pretty sure NT3.51 had drivers and they do 100Mbs-1.

      The drivers for all of the hardware you mention are in this week's Linux kernel. I don't know what would read the actual data - FreeTDS will access a running MSSQL7 but probably not the backups. At least the box could present the files to a VM.

    2. Doctor Syntax Silver badge

      Re: Anecdotal instance

      Nice example. You need to migrate the stuff as the previous generation of stuff is becoming obsolete but still on hand.

  21. Christian Berger Silver badge

    Well luckily we are better off now than in the 1990s

    Today proprietary binary-only file formats are rather rare. We can now store data with millions of points as text files and the overhead of size and processing time is acceptable. Today databases are backed up to text files which, with some limitations, could be restored into any other SQL database system.
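
A small sketch of the text-backup idea, using Python's built-in `sqlite3` module because it needs no server. The table and values are invented for the example, and SQLite merely stands in for whatever database is being archived. Dialect differences between SQL systems are exactly the "some limitations" mentioned above.

```python
import sqlite3


def dump_to_sql_text(conn: sqlite3.Connection) -> str:
    """Serialize the whole database as plain SQL statements."""
    return "\n".join(conn.iterdump())


def restore_from_sql_text(sql_text: str) -> sqlite3.Connection:
    """Rebuild a fresh in-memory database by replaying the SQL text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql_text)
    return conn


# Hypothetical source database with a couple of rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, label TEXT, value REAL)")
source.executemany(
    "INSERT INTO readings (label, value) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.25)],
)
source.commit()

backup_text = dump_to_sql_text(source)
restored = restore_from_sql_text(backup_text)
rows = restored.execute("SELECT label, value FROM readings ORDER BY id").fetchall()
```

The backup is ordinary text: it can be read in an editor, diffed, and in the worst case retyped, which is more than can be said for a binary backup file from a discontinued server version.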

    Also we no longer store data on disks or optical drives since we have learned that reading those is a very slow and therefore expensive process.

    Just compare the process of copying over a 1 hour video file from one harddisk to another to the elaborate task of playing a Quadruplex video tape (most common type of video tape till the 1980s).

    https://www.youtube.com/watch?v=zHDU1wXw1sU

  22. Terry Cloth
    Boffin

    Webpioneer my left eyetooth

    Mr Cerf is an Internet pioneer. Robert E. Kahn and he were inventing the Internet when Mr Berners-Lee had just decided on a college.

    Now get off my bandwidth!

  23. ratfox Silver badge

    Saving stuff on the cloud was made for people like me

    I recently threw away my university memories.

    I bought a server with 1TB for the express purpose of safekeeping all the stuff I had when I left university eight years ago.

    The server has sat unused in a corner of my flat for the last three years; I'm not even sure it could still boot. When I moved to a new place last month, I just brought it to the garbage dump because I couldn't bother.

    I currently have a Synology for my backups. Hope it works better.

  24. Anonymous Coward
    Anonymous Coward

    Insurance

    I have a thousand pictures of me being a complete arse, I save one with every backup, I have found over the years no matter how hard you try you just can't erase those sort of things.

  25. GBE

    "digital lives wiped from history"

    "Webpioneer Vint Cerf has warned – once again – that our digital lives are in danger of being wiped from human history."

    Good god, I hope so. Has he _seen_ what comprises "our digital lives"? It's pretty much all cat videos, selfies, and tweets that should have been wiped from history before they were even posted.

  26. king of foo

    missed opportunity

    History is no longer written by the victor, but by the evil prankster with the glass writer...

    Facebook was an actual book compiled by international police containing millions of peoples' mugshots.

    The most famous painting, the moaning lisa, was stolen by a gang of cockney criminals driving 3 different coloured Volkswagen Beatles, and later recovered when they broke down.

    Etc.

  27. awood-something_or_another
    Trollface

    You've been trolled!

    Who gives a shit?!?!

  28. Wzrd1

    We already have digital vellum

    Google, Facebook and more keep data forever, even your family pictures, embarrassing pictures and your e-mails.

    If those fail, we still have the NSA, GCHQ and the rest of the "eyes". Getting a copy back from them is as easy as getting Google or Facebook to give a copy back...

  29. Harry Anslinger

    That's why we have the NSA

    The NSA is supposed to be capturing and storing the Internet for their surveillance purposes. We already have a mechanism for capturing and storing the bits - courtesy of the U.S. taxpayers. I welcome future electronic deletion - nature has its change agents - we are all just dust in the wind.

  30. 0765794e08
    Thumb Up

    My Solution

    I was thinking about this very topic a couple of years back.

    I’ve been writing computer programs on and off (for fun, and at work too) since my schooldays in the eighties. I must have written tens of thousands of lines of code over the years. I totted up the number of different languages that I’ve coded in (mostly BASIC and variants thereof), and the total came to nine.

    Then it dawned on me that, years from now, no one would likely know of my lovingly crafted programs, much less be running them for any reason. All my digital creativity would be lost. I’d have no digital legacy.

    Then I had a brainwave. I’d make a piece of artwork out of my computer code. I took screenshots of snippets of nine program listings, in their respective IDEs - one for each of the languages. One was BBC Basic, one was Microsoft QuickBasic, one was a batch file, etc, etc.

    I used InDesign to lay out the artwork in A4, with the nine screenshots sized proportionately on the page. This left space in the bottom right hand corner so I added in a digitally embellished photo of my childhood self. I finished off the ‘montage’ with a nice ‘circuit board’ themed bezel area.

    I printed out my masterpiece and mounted it in a silver frame. It is hanging, with pride of place, in my bathroom opposite my toilet. So now, whenever I’m doing a poo, I can look up at it and relish in my geekiness.

    It is my fond hope that, long after I’ve turned to dust, my computer artwork may survive and be pored over endlessly by historians of the age.


Biting the hand that feeds IT © 1998–2019