'Climategate' university to open up data

The University of East Anglia is to receive JISC funding for a project to open up its research on global warming to scrutiny and re-use. The university, which was at the centre of a scandal revealed by leaked emails from its Climatic Research Unit, will examine how best to expose climate data for re-use, make it easier for …

COMMENTS

This topic is closed for new posts.
  1. Wommit
    Paris Hilton

    So they've

    finally worked out that, if they're hiding their data and methods, then we think that they must have something dirty to hide.

    Paris, because she doesn't hide her dirty things.

  2. Chris Miller
    FAIL

    How to save £600,000

    In the spirit of our straitened economic times, I offer this free suggestion:

    1) ZIP all the files

    2) Post the output on Wikileaks

    Oh, wait ...

  3. Sam Liddicott

    not quite...

    "a perception of failure to do so has been taken by critics of mainstream climate science as an indication of unsound science"

    that should read:

    "a perception of failure to do so has been taken by critics of mainstream climate science as an indication of unsound scientists"

    Unsound scientists can do sound science, but you can never really know

  4. Eponymous Howard
    FAIL

    Um...

    ... "which was at the centre of a scandal "

    What scandal was that then? The one where three (count 'em) reports concluded there was no scandal?

    1. Ken Hagan Gold badge
      Troll

      Re: Um

      Nah. Probably the one where three reports all said that it wasn't within their remit to actually examine the controversial bits.

      1. perlcat
        Coat

        Um**2

        Wasn't Officer Barbrady on the panel? "Move along, nothing to see here."

  5. Max Jalil
    Pirate

    What a waste of £600,000

    for "exploring ways of making data and methods more openly available" - information that should be in the public domain anyway for peer review and transparency, and is unlikely to be any more accessible, or the culture any less secretive, after some pointless guilt money has been thrown at it.

    The money would be better spent on the hackers.

  6. Tigra 07
    Flame

    Coming Soon...

    In the news tomorrow: "Climate change exaggerated and green taxes wasted on paying John Prescott's food bill in the Commons"

  7. Anonymous Coward
    FAIL

    Still trying to restrict access

    From a UEA announcement:

    "The UEA team, led by Dr Tim Osborn, is one of eight departments around the country who will be working towards models of better data management practice and making data more openly available for reuse in universities across the UK."

    Note that if they only make the data available to universities then this would still exclude some of their critics.

    In one of their responses to a FOI request they claimed they could not redistribute some of the data because they had agreements which meant they could not pass it on to non-academics. Further FOI requests revealed no such agreements existed.

  8. John Smith 19 Gold badge
    Coat

    Might start with documenting the software used

    Like what parameters the tools accept and their values.

    And in what circumstances they replace the input data with stuff hard coded into the software.

    And what the file format structures actually *are*.

    £600k split 6 ways. Evenly that's £100k a site. 1 PhD for 8 years? Or some software *professionals* (preferably with experience of *large* data set management) for 6 months?

    If you can't measure it, or measure it but won't *explain* what you used (and how) to get your results, it's an *opinion*.

    Mine will be the one with the PMP loaded with the harry-read-me files.

  9. Anonymous Coward
    WTF?

    "How best to expose climate data"???

    Just dump it all on a fucking ftp server. Why do these tossers need JISC funding for that?

    1. Sam Liddicott
      Big Brother

      because it's Orwellian newspeak

      If they spend lots of money on making it available then they are also at the same time spending lots of money controlling how it is available, or in other words on making it un-available in the right way.

    2. Tom 13

      Re: FTP server

      This is one of the few areas where I am willing to grant a very, very small amount of leniency. Given the sizes of the data sets they OUGHT to have, I don't think it is as easy as just dumping it on an FTP server. They need to have redundancy and availability for the data, and they need to be able to ensure the integrity of the data.

      I do however concur with the sentiment that this smells more of continuing the cover-up of their politicization of science than of an actual change in attitude and methodology. So keeping a sharp eye on them continues to be a necessity.

  10. Gaius
    FAIL

    The operative word...

    ... is "funding". All aboard the gravy train, boys!

  11. AGirlFromVenus

    jisc funding

    With JISC funding they will spend a chunk of money on computers, employ somebody for the project duration, spend loads of money on project meetings and "outreach" and writing exit strategies, then it'll go down the pan when the funding is withdrawn, or maybe transferred (without any public tender process) to one of the core JISC cronies, oops, data centres.

  12. Wommit
    Pint

    Do I detect

    the tiniest amount of cynicism here?

  13. Graham Bartlett

    @AC and FTP servers

    Feckwit. Sure, you could dump umpty-tum gigs of data on your home PC and set up an FTP server. That doesn't mean it's a good idea.

    What's that Skippy? They're binary files for an in-house format which we need to get our jobs done more efficiently? And you're going to need a ton of work documenting exactly what's stored in each directory, so that critics don't start up with arguments based on the wrong set of data? And you say reliable, high-bandwidth, multi-site FTP servers cost real money? And it'll cost real money to set the machines up securely and keep them secure?

    What's that Skippy? Or we could have a kangaroo court, and anonymous cowards can criticise sensible decisions? Hell yeah, let's do it!

    1. Anonymous Coward
      Anonymous Coward

      Reply to post: @AC and FTP servers

      You're forgetting that these people work at a university. They should already have the staff and infrastructure in place to produce that kind of server with minimal effort. Another server on the rack shouldn't be that hard to deal with. OTOH, if they're actually going to do something meaningful, like make the bits and pieces readily referenceable in papers, then it's probably worth the money.

      1. Anonymous Coward
        Anonymous Coward

        @AC

        Yes, because universities size their infrastructure for oodles of Gb of bandwidth / GB storage over and above what their growth estimates are.

        Bandwidth and storage (incl backup) are about the most expensive aspects of modern IT.

        1. Anonymous Coward
          Anonymous Coward

          Re: Fraser

          Storage is dirt cheap and UK universities are connected to SuperJANET which will provide them with the bandwidth needed.

          They are not streaming video here, just making available data files that will be downloaded once in a blue moon.

          1. Anonymous Coward
            Anonymous Coward

            @AC

            You are wrong. Storage may be dirt cheap if you buy some crappy consumer SATA drive from PC World, but proper storage is still very expensive. Disaster recovery and backup are very expensive. Also, SuperJANET isn't free: the bandwidth used has to be paid for. Even if a university has a dark fibre (or other dedicated) link to a concentrator site which has enough spare capacity to move data into SuperJANET, it still has to be moved onto the Internet and cost will be incurred.

            I've not even mentioned extra servers for hosting the data, software and maintenance contracts, etc. etc.

            It doesn't matter if the data is moved once in a blue moon, if it's not properly stored, protected and available at a reasonable speed.

      2. Trevor_Pott Gold badge

        It's not just a matter of making available the raw data files.

        It is also a matter of making available the APIs required to properly use them. Here's a thought: who owns the intellectual property to that? Were the programs in use created by a proprietary company? Perhaps a good chunk of the money involved is actually going to purchasing the intellectual property rights to the API involved so it can be redistributed.

        There simply isn't enough information available about where the money is going to make any judgements about whether or not it is being improperly spent.

        1. John Smith 19 Gold badge
          Happy

          @Trevor_Pott

          "It is also a matter of making available the APIs required to properly use them. Here's a thought: who owns the intellectual property to that? "

          From my admittedly cursory read of the harry-read-me file most of it seems to be bespoke code written in (*really* badly documented) FORTRAN and C. I'm not sure there *is* much of an API as lots of this stuff seems to run with command line switches (undocumented unless you read the source) or fully interactively at a terminal.

          *Some* of it seems to have been done in "IDL". *Not* the thing used to define web services but a proprietary language hosted on DEC VAX boxes under VMS. The language also seems to use some proprietary data formats to hold intermediate results and it does not look like *anyone* is rushing to form a community to build an open source version of it.

          Hope that helps.

          1. Anonymous Coward
            Anonymous Coward

            @John

            IDL is available on Unix (and possibly Linux) but I strongly suspect that the code will have heavy use of the Met Office specific IDL routines. I can't see the Met Office giving away these routines, what with them being the military and fairly protective about that sort of thing.

            1. John Smith 19 Gold badge
              Happy

              AC@12:58

              "but I strongly suspect that the code will have heavy use of the Met Office specific IDL routines. "

              Possible but (again from my reading of harry-read-me) the comments indicate they were written by someone within the Centre. However, whether they wrote them or cut n pasted out of a Met Centre archive is another matter.

              That a publicly funded *civilian* research centre should be borrowing from MoD software would be another sign of *very* poor development practices. That would *definitely* add another layer of obscurity to the process of going from raw data to conclusions.

              While it *might* be the case these data tools are the very best available, I learned long ago that just because it *might* be Secret doesn't mean it's actually any good.

  14. No, I will not fix your computer
    Boffin

    9 out of every 10 cats

    All the numpties that think that the "data" is just a bunch of text files in csv should crawl back in their holes.

    Not only is the data in specialised formats from disparate sources, the statistical analysis used is very specific, some of which are subjects of individual theses. There's layers and layers of complex work from hundreds of people, and you genuinely need to have studied at degree, masters and doctorate levels to understand some of the raw forms. The problems at the UEA were not because data was hidden or misinterpreted it just couldn't be easily understood and therefore assumed to be a cover up (I was at the UEA the other day as my partner was presented with her degree). Their only mistake was to be a load of propeller heads that couldn't explain (in simple terms) how they got some of the conclusions.

    1. Tom 13

      Security by obscurity doesn't work for programming,

      works even less well with science. Spread your FUD elsewhere.

      1. Anonymous Coward
        Anonymous Coward

        @Tom

        This isn't security through obscurity, it's nothing to do with security. The data that come from many sources are in proprietary binary formats for several reasons: age and the lack of any other formats when the datasets were created, and the requirement for highly efficient data formats (for getting data from satellites etc.), just off the top of my head.

        I daresay that if the data weren't made available in its original format, they'd be being accused of messing with it when it was converted to whichever modern format they chose to publish it in.

    2. John Smith 19 Gold badge
      Happy

      @No, I will not fix your computer

      "All the numpties that think that the "data" is just a bunch of text files in csv should crawl back in their holes."

      Quite true. Indications are the raw data is a hodge podge of large, undocumented, data files lacking even something as basic as a systematic set of naming conventions processed through a bunch of poorly structured undocumented software that *may* prove the case that AGW is real or then again that the human race died out 200 years ago.

      "the problems at the UEA were not because data was hidden or misinterpreted it just couldn't be easily understood and therefore assumed to be a cover up"

      No, the problem was that a *publicly* funded research institute, whose *core* asset was a set of *very* large datasets and whose core *product* was the analysis (and the tools to conduct that analysis), has been shown to have data management skills inadequate for a 10-year-old to keep track of their Pokemon card collection, and software development practices which would have put most (if not all) of the professional developers here on the street within their first month.

      If it were a privately funded group studying arguments about who really wrote Shakespeare's plays no one would care.

      It is not. When you're discussing something that will cost *billions* to deal with, this level of shoddy work is grossly unprofessional and unacceptable.

      The physics and chemistry *are* complex. The failure to handle *basic* data management and software quality assurance (which is *critical* to what was done *with* the data) makes the rest fairly irrelevant. People might need a PhD to understand the science, but they don't need one to understand GIGO.

  15. Michael 17
    Paris Hilton

    Quick, call Andrew Orlowski!

    The fact that they're spending money to figure out how to publish their datasets is clearly part of some kind of corrupt scheme to continue their climate "science" fakery while lining their pockets with public money! After all, we know climate science is just a con job, and so there's no chance they could actually be trying to open up their data and allow both legitimate researchers and anti-science corporate hatchet-men access!

    Paris, because she knows how things really heat up.
