So they've finally worked out that, if they're hiding their data and methods, then we think that they must have something dirty to hide.
Paris, because she doesn't hide her dirty things.
The University of East Anglia is to receive JISC funding for a project to open up its research on global warming to scrutiny and re-use. The university, which was at the centre of a scandal revealed by leaked emails from its Climatic Research Unit, will examine how best to expose climate data for re-use, make it easier for …
"a perception of failure to do so has been taken by critics of mainstream climate science as an indication of unsound science"
that should read:
"a perception of failure to do so has been taken by critics of mainstream climate science as an indication of unsound scientists"
Unsound scientists can do sound science, but you can never really know.
for "exploring ways of making data and methods more openly available" - information that should be in the public domain anyway for peer review and transparency, and is unlikely to be any less accessible or the culture to be any less secretive after some pointless guilt money has been thrown at it.
The money would be better spent on the hackers.
From a UEA announcement:
"The UEA team, led by Dr Tim Osborn, is one of eight departments around the country who will be working towards models of better data management practice and making data more openly available for reuse in universities across the UK."
Note that if they only make the data available to universities then this would still exclude some of their critics.
In one of their responses to a FOI request they claimed they could not redistribute some of the data because they had agreements which meant they could not pass it on to non-academics. Further FOI requests revealed no such agreements existed.
Like what parameters the tools accept and their values.
And in what circumstances they replace the input data with stuff hard coded into the software.
And what the file format structures actually *are*.
£600k split 6 ways. Evenly that's £100k a site. 1 PhD for 8 years? Or some software *professionals* (preferably with experience of *large* data set management) for 6 months?
If you can't measure it, or you measure it but won't *explain* what you used (and how) to get your results, it's an *opinion*.
Mine will be the one with the PMP loaded with the harry-read-me files.
This is one of the few areas where I am willing to grant a very, very small amount of leniency. Given the sizes of the data sets they OUGHT to have, I don't think it is as easy as just dumping it on an FTP server. They need to have redundancy and availability for the data, and they need to be able to ensure the integrity of the data.
I do however concur with the sentiment that this smells more of continuing the cover-up of their politicization of science than of an actual change in attitude and methodology. So keeping a sharp eye on them continues to be a necessity.
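On the integrity point above, here is a rough sketch of the cheap part: publishing a checksum manifest next to the files so anyone who downloads them can confirm nothing changed in transit or afterwards. This is only an illustration under my own assumptions. The function names (`sha256_of`, `build_manifest`, `verify`) and the directory layout are made up, and nothing here reflects what UEA or JISC actually use. It does not cover redundancy or availability, which are the expensive bits.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash differs."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

The publisher would run `build_manifest` once per release and post the result alongside the data; a downloader runs `verify` against their own copy and gets back a list of anything that does not match.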
With JISC funding they will spend a chunk of money on computers, employ somebody for the project duration, spend loads of money on project meetings and "outreach" and writing exit strategies, then it'll go down the pan when the funding is withdrawn, or maybe transferred (without any public tender process) to one of the core JISC cronies, oops, data centres.
Feckwit. Sure, you could dump umpty-tum gigs of data on your home PC and set up an FTP server. That doesn't mean it's a good idea.
What's that Skippy? They're binary files for an in-house format which we need to get our jobs done more efficiently? And you're going to need a ton of work documenting exactly what's stored in each directory, so that critics don't start up with arguments based on the wrong set of data? And you say reliable, high-bandwidth, multi-site FTP servers cost real money? And it'll cost real money to set the machines up securely and keep them secure?
What's that Skippy? Or we could have a kangaroo court, and anonymous cowards can criticise sensible decisions? Hell yeah, let's do it!
You're forgetting that these people work at a university. They should already have the staff and infrastructure in place to produce that kind of server with minimal effort. Another server on the rack shouldn't be that hard to deal with. OTOH, if they're actually going to do something meaningful, like make the bits and pieces readily referenceable in papers, then it's probably worth the money.
You are wrong. Storage may be dirt cheap if you buy some crappy consumer SATA drive from PC World, but proper storage is still very expensive. Disaster recovery and backup are very expensive. Also, SuperJANET isn't free: the bandwidth used has to be paid for. Even if a university has a dark fibre (or other dedicated) link to a concentrator site which has enough spare capacity to move data into SuperJANET, it still has to be moved onto the Internet and cost will be incurred.
I've not even mentioned extra servers for hosting the data, software and maintenance contracts, etc. etc.
It doesn't matter if the data is moved once in a blue moon, if it's not properly stored, protected and available at a reasonable speed.
It is also a matter of making available the APIs required to properly use them. Here's a thought: who owns the intellectual property to that? Were the programs in use created by a proprietary company? Perhaps a good chunk of the money involved is actually going to purchasing the intellectual property rights to the API involved so it can be redistributed.
There simply isn't enough information available about where the money is going to make any judgements about whether or not it is being improperly spent.
"It is also a matter of making available the APIs required to properly use them. Here's a thought: who owns the intellectual property to that? "
From my admittedly cursory read of the harry-read-me file most of it seems to be bespoke code written in (*really* badly documented) FORTRAN and C. I'm not sure there *is* much of an API as lots of this stuff seems to run with command line switches (undocumented unless you read the source) or fully interactively at a terminal.
*Some* of it seems to have been done in "IDL". *Not* the thing used to define web services but a proprietary language hosted on DEC VAX boxes under VMS. The language also seems to use some proprietary data formats to hold intermediate results and it does not look like *anyone* is rushing to form a community to build an open source version of it.
Hope that helps.
"but I strongly suspect that the code will have heavy use of the Met Office specific IDL routines. "
Possible, but (again from my reading of harry-read-me) the comments indicate they were written by someone within the Centre. However, whether they wrote them or cut n pasted out of a Met Centre archive is another matter.
That a publicly funded *civilian* research centre should be borrowing from MoD software would be another sign of *very* poor development practices. That would *definitely* add another layer of obscurity to the process of going from raw data to conclusions.
While it *might* be the case that these data tools are the very best available, I learned long ago that just because it *might* be Secret doesn't mean it's actually any good.
All the numpties that think that the "data" is just a bunch of text files in csv should crawl back in their holes.
Not only is the data in specialised formats from disparate sources, the statistical analysis used is very specific, and some of it is the subject of individual theses. There's layers and layers of complex work from hundreds of people. You genuinely need to have studied at degree, masters and doctorate levels to understand some of the raw forms. The problems at the UEA were not because data was hidden or misinterpreted; it just couldn't be easily understood and was therefore assumed to be a cover up. (I was at the UEA the other day as my partner was presented with her degree.) Their only mistake was to be a load of propeller heads that couldn't explain (in simple terms) how they got some of the conclusions.
This isn't security through obscurity, it's nothing to do with security. The data that come from many sources are in proprietary binary formats for several reasons - age and the lack of any other formats when the datasets were created, and the requirement for highly efficient data formats (for getting data from satellites etc.), just off the top of my head.
I daresay that if the data weren't made available in its original format, they'd be being accused of messing with it when it was converted to whichever modern format they chose to publish it in.
"All the numpties that think that the "data" is just a bunch of text files in csv should crawl back in their holes."
Quite true. Indications are the raw data is a hodge podge of large, undocumented, data files lacking even something as basic as a systematic set of naming conventions processed through a bunch of poorly structured undocumented software that *may* prove the case that AGW is real or then again that the human race died out 200 years ago.
"the problems at the UEA were not because data was hidden or misinterpreted it just couldn't be easily understood and therefore assumed to be a cover up"
No, the problem was that a *publicly* funded research institute, whose *core* asset was a set of *very* large datasets and whose core *product* was the analysis (and the tools to conduct that analysis), has been shown to have data management skills inadequate for a 10 year old to keep track of their Pokemon card collection, and software development practices which would have put most (all) of the professional developers here on the street within their first month at most.
If it were a privately funded group studying arguments about who really wrote Shakespeare's plays no one would care.
It is not. When you're discussing something that will cost *billions* to deal with, this level of shoddy work is grossly unprofessional and unacceptable.
The physics and chemistry *are* complex. The failure to handle *basic* data management and software quality assurance (which is *critical* to what was done *with* the data) makes the rest fairly irrelevant. People might need a PhD to understand the science, but they don't need one to understand GIGO.
The fact that they're spending money to figure out how to publish their datasets is clearly part of some kind of corrupt scheme to continue their climate "science" fakery while lining their pockets with public money! After all, we know climate science is just a con job, and so there's no chance they could actually be trying to open up their data and allow both legitimate researchers and anti-science corporate hatchet-men access!
Paris, because she knows how things really heat up.