...finally got to the ROOT of the problem...
Someone had to say it.
Tens of thousands of bugs have been eliminated from the program CERN's atom-smashers are using to identify the Higgs boson – just don't expect an answer to life, the universe and everything anytime soon. CERN says it has squashed 40,000 bugs living in ROOT, the C++ framework it relies upon to store, crunch and help analyse …
So they didn't even use LINT. My experience of scientists' code compared to professional programmers' code is that the scientists' code is extremely sloppy and slapdash. The equivalent of someone building their first house and a professional builder building their umpteenth house.
and someone who writes LOTS of terrible code I generally agree.
I have however met some great programmers who were once scientists
Since they are using C++ my guess is they probably have at least a few true programmers (as opposed to scientists that think they can code). If not they would have used Fortran. I would never take a job maintaining any complex program written by a physicist, though.
You don't want a flustered Doc Brown after you. A double PhD has its prerogatives.
It gets even weirder when you find scientists doing Network Maintenance and Security on the side.
Slightly polarised for clarity:
The job of a scientist is to do science - developing theories and models and writing code to back them up or prove the concept. Programs written may not even be intended to be used by anybody but the original author and their shelf-life is "until the concept is proved", at which point they drop the code and move onto their new task.
The job of a professional programmer is just that: writing programs to a professional standard. Since this is what they do (roughly) all day, every day, they can get pretty good at it. Their programs are intended for use by other people and the shelf life is "until my company decides not to support it."
Yes, some scientists do write some awful code (but so do some professional programmers), but the two groups write programs for different reasons so a direct comparison seems invalid.
(I write this as a PhD student who writes software for a living).
"The job of a scientist is to do science - developing theories and models and writing code to back them up or prove the concept. Programs written may not even be intended to be used by anybody but the original author and their shelf-life is "until the concept is proved", at which point they drop the code and move onto their new task."
The attitude of the staff of the Climate Research Unit, which gave us the memorable reading experience that was the harryreadme file.
Hint. If *your* kind of science can be done by writing a dozen equations on a blackboard great. The software is merely a quick and dirty demonstrator of your theory and you can write it any way you like.
However *if* your kind of science involves crunching multiple *large* data sets, and the *way* you process them is *critical* to your whole thesis (and/or will be used to make *billion* pound policy decisions), your process should be substantially more methodical.
The C++ code (not ROOT) on LHC experiments has to run day in, day out, over and over again processing the data, and is written to a much higher standard than the typical academic's code. Most of the code in ROOT is only used in private code used by 2 or 3 individuals at a time for data analysis, not "mission critical" stuff (although a small part of ROOT is used for data persistification to file). The people writing the code that is used to process the data do write code day in, day out and are pretty good at it - this is often 50%+ of their job, even though they trained as scientists in most cases. The rest of them are perhaps as you suggest ;) - I've seen lots of awful coding in LHC experiments.
Why so much asterisk bracketing, sir? And are you being sarcastic or emphatic? Because either way that much sarcasm or emphasis on random words derails the little voice in my head that I read with.
Clifford Stoll, wasn't it? But if you get a scientist problem-solving on something new, it's hard to tell where engineering takes over from science.
that professional programmers aren't the only ones who have serious issues with Dr-Scientist types.
Since you did me the courtesy of leaving your name on your comment let me explain.
"Why so much asterisk bracketing, sir?"
At the risk of being obvious, because there are several points I want to emphasise.
Not everyone here speaks English as a first language (and I sometimes doubt some posters have Earth as a first planet either).
"on random words derails the little voice in my head that I read with."
At the risk of a Humpty Dumpty answer, the words I emphasise make perfect sense to me.
Now *apart* from my commenting style do you have a substantive comment to make?
Dijkstra: Testing shows the presence, not the absence of bugs
...a small wager on this?
Gravity, and its twisted twin centrifugal force, are mediated by the structure of the photon and the nature of spacetime. There is no Higgs Boson. Sorry.
At least beer was detected earlier.
They get everywhere:
> So does this now mean ROOT is bug-free?
No. Perhaps spacing keyboard keys further apart would stop people bashing the wrong characters into their code?
Thrown off the scent of the Higgs??!?! By a ROOT bug?
Let's definitely state that ROOT has bugs and design problems... but no, changing physics results is indeed a bridge too far.
Python does all the real work at CERN. They'd use QBasic if they could submit the jobs to the grid.
But I was seriously under the impression that they used C (not C++) for most of the code there.
I know they use vast MySQL farms as well ... all on Linux.
I can say on one of the main experiments we use C++ which is configured at run time using python. Certainly no C in sight....
Well Python is probably preferable to C++, even if it is still imperative. I can't think of a language much less suited to having hundreds of non-programmers collaborating. It's just so dangerous. How on earth can they do proper unit tests? I bet as soon as any particular experiment comes to an end the associated code gets binned straight away due to rot. A better approach would have been to get them trained and using ML from the start.
Please, do not hide behind bugs. How could you reconfirm the rest of the SM with all the bugs crawling around? Do not sound like a Gaddafi in Physics -- finding newer and newer excuses for something that is not there.
Am I the only one who read the title as "CERN's BOSOM HUNTERS...."?
(Smirk/snigger icon please)
Is it just me or does everything we hear from CERN make it sound as though the existence of the Higgs Boson is absolutely certain even though they haven't yet been able to prove it? Until they find it, doesn't the possibility exist that there is no such thing as a Higgs Boson?
The Standard Model Higgs (and indeed, there are others, like Supersymmetric Higgs) is just a mathematical consequence of the Standard Model. We know the latter is not a 100% correct description of physics, so it might well prove to have a hole where the Higgs should be.
Indeed, by end of the year there might well be enough data to exclude the Standard Model Higgs "at 95% confidence level". That would make things interesting.
More here: http://www.math.columbia.edu/~woit/wordpress/?p=3960
My experience of C++ -- which goes back to the early 90s -- is that it's a very powerful tool that's responsible for pretty much every large scale screwup in modern software design (plus the inevitable software bloat). I describe giving a typical programmer this tool as "a bit like giving a toddler a chainsaw as a Christmas present". I also think that object methodology is seriously overused; it's all that gets taught so we're stuck with the "if all you've known is a hammer then everything looks like a nail".
Now, rather than making the typical programmer statement "it's buggy because it's got x million lines of code in it" we should be asking why it's so big, why it doesn't break down into testable components and so on. Ordinary, everyday stuff that I will admit seems to be elusive to our Windows brethren (Microsoft doesn't go out of their way to make their stuff easy to work with, IMHO) but absolutely essential if you're doing serious work such as embedded design.
I find professional programmers -- CS majors -- among the worst offenders because they only know their coding abstractions, they see the code as the goal rather than it being a model of some thing or process.
Well, at the time ROOT development was started, writing a major scientific package at CERN in C++ instead of Fortran 90 was a controversial choice, which surprised me, as an observer in the next building, more than somewhat. My guess is that, yes, OO programming in an unsafe language like C++ would be more error-prone than subroutine-based structured programming in Fortran 90; but if you knew the individual mostly involved in originating ROOT, you would know that making this argument at the time was doomed to failure. The fact is that if ROOT hadn't been written in C++, it would never have been written at all. The modern generation of physics experiments would be using some completely different software for event reconstruction. I expect it would have a different set of 10000 bugs.
ROOT is not used in event reconstruction in e.g. ATLAS though.
IMO the problems with ROOT are not the language used (C++), but the way it is written and designed. It's perfectly possible something much better could have been written in C++.
"Microsoft doesn't go out of their way to make their stuff easy to work with, IMHO"...
Have you taken a look at F#? Fabulous language - OCaml on interoperability steroids and a real programmer's tool. Not everything that comes out of Microsoft is rubbish. (Just a shame you have to rely on Mono to use it on a proper OS.)
I'm looking at the timescales here, and wondering what you could have picked from. I suspect, for one thing, that if Windows was even involved, it was NT, and the reality was mostly some form of Unix.
And you wanted something that, as a programming environment, would be available for a long time.
The world looks a bit different now. I had a few megabytes of storage then. Now we talk in terabytes. I reckon they didn't do so badly, if they only found 40,000 bugs.
>Microsoft doesn't go out of their way to make their stuff easy to work with, IMHO
Wow so most people's grannies run winblows because they like compiling drivers eh? M$ has a shedload of flaws but in general they are kings of making their software idiot friendly.
Probably that's why it's mainly used by idiots !
No they aren't. Windows is a mess for the uninitiated. For people touching a computer for the first time it's the absolute worst OS choice out there. It requires the most maintenance and has the least intuitive interface of any OS I've ever had the displeasure of using.
...The same software framework used for the neutrino experiment that seems to show the speed of light is not an absolute?
When they have finished using scarce resources to look at esoteric particles, can we use the CERN for f1 racing?
Rather strangely, I am a particle physicist who worked at CERN and I now work with a Formula 1 racing team.
What do you mean? If you're talking about using the LHC tunnel to race vehicles through, then sorry to say it is (a) too small (b) probably not ventilated enough.
That C++ is dangerous, that C++ is like a chainsaw, etc.
Any language can engender a bloody mess, or a fine piece of digital crafting.
Be it VB6, .NOT, Delphi, Perl, Bash, or C/C++
I have seen software being written in VB6 which was both well designed and elegant, and I have seen true C++ abortions of nature, and the other way around.
It is the developer who makes the difference here.
Am I the only one thinking "Sounds about right"?
Is it just me or is this article just a Coverity press release? I'd guess that most of the 40000 "bugs" are potential problems flagged by Coverity rather than actual bugs.
It would be interesting to know what proportion of these defects are of the kind "did not check value returned by printf statement" - i.e. something which should be done but often turns out to have no practical impact on the actual use of the program being assessed.
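For anyone wondering what that class of defect looks like in practice, here is a minimal sketch. The function names are made up for illustration and nothing here is taken from ROOT or from Coverity's actual report; it only assumes the standard behaviour that std::fprintf returns a negative value on failure.

```cpp
#include <cstdio>

// Hypothetical helper, not from ROOT: writes a label to a stream.
// The result of std::fprintf is discarded, which is the sort of thing a
// static analyser may flag even though it rarely matters in practice.
void write_label_unchecked(std::FILE* out, const char* label) {
    std::fprintf(out, "%s\n", label);
}

// Same helper, but the caller gets told whether the write succeeded.
// std::fprintf returns a negative value if an output error occurred.
bool write_label_checked(std::FILE* out, const char* label) {
    return std::fprintf(out, "%s\n", label) >= 0;
}
```

Both versions behave identically on a healthy stream, which is rather the point: the "bug" only shows up when the disk is full or the pipe has gone away.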
Here's what I want to know: why didn't anyone investigate the "false positives" and "correct" them? I have found many problems in porting F/LOSS code to other platforms that, once you clean up all the "warnings", seem to magically disappear.
All programmers should have this drilled into their heads, "Compiler warnings ARE bugs."
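A small illustration of a warning that really is a bug, as a sketch only. The function names are invented for the example. The mixed signed/unsigned comparison shown in the comment is what GCC reports under -Wsign-compare; the first function below spells out the same conversion explicitly so the file compiles cleanly.

```cpp
// The version that draws the warning (left as a comment on purpose):
//   bool below(unsigned value, int threshold) { return value < threshold; }
// The int is converted to unsigned before the comparison, so a negative
// threshold turns into a very large positive number.

// Hypothetical name: same behaviour as the implicit conversion above,
// written out with an explicit cast.
bool below_with_conversion(unsigned value, int threshold) {
    return value < static_cast<unsigned>(threshold);
}

// Fixed version: handle non-positive thresholds before converting.
bool below_fixed(unsigned value, int threshold) {
    return threshold > 0 && value < static_cast<unsigned>(threshold);
}
```

With a threshold of -1 the first version says every value is "below" it, while the fixed one says none are. That is the kind of problem that tends to disappear once the warnings are cleaned up.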
"Coverity Static Analysis leverages the most innovative, sophisticated and patented techniques to help you find bugs that are difficult, if not impossible, to find by other means."
ROOT is a pile of utter rubbish. It is terribly written, and its continued use is holding back progress.
The developers don't know a decent inheritance structure from their elbow, and have only grasped the concept of name spaces.
The program encourages the mixing of analysis code and graphical representation code which leads to utter confusion.
Half the methods aren't implemented, the classes are terrible. Everything inherits from a generic TObject.... They have invented their own scripting 'language' CINT, which
Memory management and garbage collection are alien words to these developers, and they prefer to delete / keep 'ROOT' objects at the end of functions, not allowing for scope of ownership.
Have a look at the ROOT source.....
// For now explicitly disable copying into the value (i.e. the proxy is read-only).
#define private public
and yes, I am a particle physicist at CERN.
Don't let them fool you into thinking running Coverity is a success... is a flipping nightmare...
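On the scope-of-ownership complaint above, this is roughly what tying an object's lifetime to a scope looks like in standard C++ (C++14 or later). It is a sketch under stated assumptions: Histogram, make_histogram and fill_once_and_read are hypothetical stand-ins, not ROOT classes or ROOT's API, and the live counter exists only to make the destruction visible.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an analysis object; not a ROOT class.
struct Histogram {
    static int live;                       // objects currently alive
    std::vector<double> bins;

    explicit Histogram(std::size_t n) : bins(n, 0.0) { ++live; }
    Histogram(const Histogram&) = delete;  // keep the live count honest
    Histogram& operator=(const Histogram&) = delete;
    ~Histogram() { --live; }

    // Adds one entry to the given bin; out-of-range bins are ignored.
    void fill(std::size_t bin) {
        if (bin < bins.size()) bins[bin] += 1.0;
    }
};
int Histogram::live = 0;

// The caller receives sole ownership of the new histogram.
std::unique_ptr<Histogram> make_histogram(std::size_t n) {
    return std::make_unique<Histogram>(n);
}

// The histogram is owned by this function's scope and is destroyed
// automatically on return, with no manual delete anywhere.
double fill_once_and_read(std::size_t bin) {
    auto h = make_histogram(10);
    h->fill(bin);
    return bin < h->bins.size() ? h->bins[bin] : 0.0;
}
```

The owner is whoever holds the std::unique_ptr, and the object goes away when that owner does, so there is no question of who deletes what at the end of a function.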
Hm... Could such bugs cause tiny mistakes, of the order of nanoseconds perhaps? hehe
I'm out of here faster than the speed of light.
...CERN has so much data that nobody can look at it. They therefore filter and sort the data based on their preconceived concepts. So their search is bounded by their current theories.
It's a perfectly valid point.
CERN should be working on some generalized data exploration tools and put it all on the Internet. Call it CERN-Zoo.
Alex24 is spot on. For such a widely-used tool, ROOT is quite spectacularly badly designed. The thing is, it was always known to be such. CERN's computing division bigwigs had a massive fight with the ROOT originators in the late 90's and they were cast into the wilderness for several years. Unfortunately for the field of particle physics and everyone working in it, CERN's "official" attempts to provide new data analysis tools were cack-handed and under-resourced, and people started using ROOT because they literally didn't know better - all they'd had before was home-grown Fortran 77 libraries and scripting languages. After a couple of years CERN's management realised they had lost the war and ROOT was finally anointed as the official tool of high energy physics analysis. We will be living with this unmaintainable, untestable embarrassment for years to come. (Anon for obvious reasons)