Creaky NHS digital infrastructure risks holding back gene boffinry, say MPs

The state of the NHS's digital infrastructure and a lack of clear budgets risk holding back the UK’s efforts in genomic medicine and research, MPs have said. In a report, Genomics and Genome Editing in the NHS, the House of Commons Science and Technology Committee criticised a lack of investment in the kit, training and public …

  1. Solarflare

    Great idea. But first, could the digital infrastructure be put in place so my GP can get info from my Consultant and vice versa, especially if I have moved across the country? It would be nice if the 'digital infrastructure' could be worked on so that it doesn't take 9 weeks to get a letter typed up just to say that I have had an appointment. It would be even better if I stopped getting letters telling me that my appointment has been rescheduled for x date when the letter was only dated the week prior to that and arrived 3 days after the appointment.

    But yes, Genomics, cool.

  2. Anonymous Coward

    Cheese Ripening

    My company brings in pallets stacked high with brand new PCs, to replace the 10+ year old crap that we're still trying to use. They store the new hardware away. Months pass. Nothing. Apparently it's complicated.

    IT following the ancient tradition of Cheese Ripening.

    A government department that I was familiar with did the exact same thing. At least two years of storage. Hundreds of PCs neatly stacked up in a closet for several years of 'Cheese Ripening'.

    Somebody should offer this as a service. Store your brand new hardware on the endless shelves in our deep and secure former mine.


    Is it just me? Or is this common in industry?

  3. David Harper 1

    200GB to store a genome? Surely not!

    A human genome is 3.2 billion base pairs, and each base pair can be represented as a single letter from the set {A, C, G, T} so even if you use (a wasteful) one byte per base, that's still only 3.2 GB. If you encode a base in the most efficient way, as two bits, you can reduce the size to 800 MB. So where does 200GB come from?

    (I work in bioinformatics and I've written large-scale applications that store entire genomes, so I have a bit of experience in this field.)
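The arithmetic in the comment above can be checked with a short sketch. The 3.2 billion figure comes from the comment; the 2-bit mapping and the helper names `bytes_needed`, `pack` and `unpack` are made up for illustration and are not taken from any real genomics library.

```python
# Back-of-envelope storage for one human genome, using the figure quoted above.
GENOME_BASES = 3_200_000_000  # 3.2 billion base pairs (from the comment)


def bytes_needed(bases: int, bits_per_base: int) -> int:
    """Bytes required to store `bases` symbols at `bits_per_base` bits each."""
    return (bases * bits_per_base + 7) // 8


# Hypothetical 2-bit code; any fixed mapping of the four letters would do.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Reverse of pack(); `length` is needed because the last byte may be padded."""
    return "".join(
        LETTERS[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


one_byte_per_base = bytes_needed(GENOME_BASES, 8)  # 3.2 GB
two_bits_per_base = bytes_needed(GENOME_BASES, 2)  # 800 MB
```

Note that a plain 2-bit code has no room for ambiguous bases such as N, so a real format would need some extra bookkeeping on top of this.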

    1. Dr. G. Freeman

      Re: 200GB to store a genome? Surely not!

      Maybe they're using some molecular modelling software (like Gaussian) that's got co-ordinates for each of the atoms in each nucleotide, causing the file to bloat?

      Don't know, just an NMR boffin.

      1. David Harper 1

        Re: 200GB to store a genome? Surely not!

        No, this is just plain DNA sequencing data, which is strings of A, C, G and T.

        1. handleoclast Silver badge

          Re: 200GB to store a genome? Surely not!

          "No, this is just plain DNA sequencing data, which is strings of A, C, G and T."

          Maybe they're storing them as actual strings. Not in ASCII. Not in UTF-8. But in UTF-32 (aka UCS-4). Four bytes per character.

          You're about to retort that such an encoding would be incompetent. We're talking about the NHS here...
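For what it's worth, the encoding theory is easy to test. This is only a rough sketch: the short test string is made up, and the 3.2 billion figure is the one quoted earlier in the thread. It uses `utf-32-le` rather than plain `utf-32` so that no byte-order mark is added to the count.

```python
# Compare how many bytes the same bases take in different text encodings.
seq = "ACGT" * 4  # 16 bases, made up for illustration

sizes = {enc: len(seq.encode(enc)) for enc in ("ascii", "utf-8", "utf-32-le")}

GENOME_BASES = 3_200_000_000  # figure quoted earlier in the thread
utf32_genome_bytes = GENOME_BASES * 4  # four bytes per base
```

Even at four bytes per base that comes to 12.8 GB, so UTF-32 alone would not account for 200GB.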

          1. Anonymous Coward

            Re: 200GB to store a genome? Surely not!

            From what I remember the genome files (VCF format) are anything up to 125GB. They're basically compressed CSV files, but hell they are huge.

    2. Lee D Silver badge

      Re: 200GB to store a genome? Surely not!

      What happens if you PKZIP it?

      1. Alistair Silver badge

        Re: 200GB to store a genome? Surely not!

        "What happens if you PKZIP it?"

        The NSA ends up with a copy of your genome.

    3. archmagenmr

      Re: 200GB to store a genome? Surely not!

      If I remember correctly, the whole genome sequencing is carried out at a depth of at least 32x (some go up to 40x). That means that each DNA is sequenced 32-40 times, which is necessary to ensure a modicum of confidence (statistically speaking, of course) because the reads are short - 200-400 bases each. The 'whole sequence' is actually assembled from all those short reads by stitching them together through overlapping sequences. Then you have two strands for each DNA duplex, which are individually sequenced as a second safeguard against random errors in the sequencing reaction. So that's 3.2 billion base pairs times 32 (depth) times 2 (DNA strands, or 'read 1' and 'read 2' in NGS jargon), which comes close to 200 GB per person.

      Of course, that's for the Illumina platform. Pacific Biosciences machines are a different story.
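The estimate in the comment above can be reproduced in a few lines. This sketch takes the comment's figures at face value and assumes one byte per base, with no quality scores, read headers or compression; the function name `raw_read_bytes` is made up for illustration.

```python
GENOME_BASES = 3_200_000_000  # 3.2 billion base pairs (from the thread)


def raw_read_bytes(bases: int, depth: int, strands: int = 2,
                   bytes_per_base: int = 1) -> int:
    """Rough size of the raw reads: genome length x depth x strands."""
    return bases * depth * strands * bytes_per_base


at_32x = raw_read_bytes(GENOME_BASES, 32)  # about 205 GB
at_40x = raw_read_bytes(GENOME_BASES, 40)  # 256 GB
```

So under these assumptions the 32x case lands at roughly 205 GB, which matches the "close to 200 GB" figure.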

    4. Korev Silver badge

      Re: 200GB to store a genome? Surely not!

      I'm guessing they store the .fastq files. If they're obliged to keep them then hopefully they archive them off onto something cheap. Unless they want to do some huge realignment etc., I doubt anyone would notice.

      Having said that, due to various rules we have a lot of gzipped .fastq files sitting around which are not allowed to be converted to something like CRAM.
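As a rough illustration of why raw .fastq files are bulkier than the bare sequence: each FASTQ record carries a header line, the bases, a '+' separator line and one quality character per base. The read below is made up, and the sketch only uses Python's standard `gzip` module; it says nothing about how any particular lab actually stores its files.

```python
import gzip

# Made-up read for illustration; real reads are longer.
header = "@read_1"                 # hypothetical read identifier
sequence = "GATTACAGATTACAGATTACA"
quality = "I" * len(sequence)      # one quality character per base

# A FASTQ record is four lines: header, bases, '+' separator, qualities.
record = f"{header}\n{sequence}\n+\n{quality}\n".encode("ascii")

compressed = gzip.compress(record)
restored = gzip.decompress(compressed)

# Bytes stored per base before compression (more than 2, due to qualities + header).
overhead_ratio = len(record) / len(sequence)
```

On a record this small gzip's own header dominates, so the compressed size here is not representative; the point is only the per-base overhead of the uncompressed format.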

  4. JeffyPoooh Silver badge

    With that much Genetic Code on file...

    ...It's amazing that the OS can resist the overwhelming urge to load the code and try to execute it.

    Modern OSs are so promiscuous that they'd probably try to compile and execute the written code snippets if you took a picture of the blackboard at your Comp Sci class.

  5. John Smith 19 Gold badge

    "Oh no, of course we won't use people's DNA profiles to set their insurance rates," say insurers.

    Like f**k they won't.

    Less of those "gentleman's agreements" and more actual legislation on the matter.

  6. This post has been deleted by its author

  7. David Glasgow

    200 Gigs true


    I don't understand it beyond the general principles.

  8. sad_loser

    This is blue sky science

    And none the worse for that.

    The tension here is that government/NHS will mess up the development of this, but that pure industry (insurers and pharma) will use it to rip off the IP and the population, and the money goes overseas. Once the data has been copied, its value decays exponentially.

    Pharma ought to be the best choice, but the latest example of price gouging is holding the government to ransom over a cystic fibrosis drug at £100k a year for marginal (non-curative) benefit.

    Better to commission an institution set up between UK academics and industry to manage the IP generated, and keep the data locked up tight.


Biting the hand that feeds IT © 1998–2019