Google has released a research paper closely comparing the performance of C++, Java, Scala, and its own Go programming language. According to Google's tests (PDF), C++ offers the fastest runtime of the four languages. But, the paper says, it also requires more extensive "tuning efforts, many of which were done at a level of …
>> C++ also requires more extensive "tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer"
But the question is, did tuning programs in the other languages improve them over the stock C++ version? If not, then it doesn't really matter that C++ is hard to optimize, when you get the speed virtually for free.
But concurrency in C or C++ is a real bitch. C++, when it comes to it, is downright fugly.
And Java is a real bitch too... One of my pet peeves - I hate it when the GC cuts in just when you don't want it to and slows things down. But it is slightly more elegant than C++.
Scala looks interesting but I've not the time for the moment.
I'll stick with C and C++ and bitch about that instead.
No use putting go-faster stripes on your family van
"But the question is, did tuning programs in the other languages improve them over the stock C++ version? If not, then it doesn't really matter that C++ is hard to optimize, when you get the speed virtually for free."
No. The tradeoff is:
Java may be slower and have a larger memory footprint, but you get rid of the C++ "writing time" memory management problems, debugging efforts and all-around shoot-yourself-in-foot possibilities. The skillset needed is also lower [concomitantly, the "do not interrupt me now" requirement is weaker], which, believe me, is a _very_ good thing.
Ok, back to writing servlets in Groovy.
This is what is annoying though. Big companies and individuals look at C++ and then list its weaknesses before embarking on creating yet another language instead of improving C++ and moving things forwards.
Improving C++ improves games and productivity software which are never going to be written in a language that requires a VM or doesn't compile to machine code.
You have heard of RAII?
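For anyone who hasn't met the term: RAII ties a resource's lifetime to an object's scope, so the destructor does the cleanup on every exit path, including early returns and exceptions. A minimal sketch, assuming nothing about the code in the paper; `ScopedFile` and `count_bytes` are hypothetical names made up for illustration.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper: owns a FILE* and closes it in the destructor.
class ScopedFile {
public:
    ScopedFile(const std::string& path, const char* mode)
        : handle_(std::fopen(path.c_str(), mode)) {
        if (!handle_) throw std::runtime_error("cannot open " + path);
    }
    ~ScopedFile() { std::fclose(handle_); }

    // Exactly one owner per handle, so copying is switched off.
    ScopedFile(const ScopedFile&) = delete;
    ScopedFile& operator=(const ScopedFile&) = delete;

    std::FILE* get() const { return handle_; }

private:
    std::FILE* handle_;
};

// Counts the bytes in a file. There is no explicit fclose anywhere:
// the file is closed when `f` goes out of scope, however we leave.
long count_bytes(const std::string& path) {
    ScopedFile f(path, "rb");
    long n = 0;
    while (std::fgetc(f.get()) != EOF) ++n;
    return n;
}
```

This is the usual answer to the "writing time" memory management complaint: the cleanup is written once, in the destructor, instead of at every call site.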
If you know how to write in C++ then you know how to write in C++.
I'm quite good at coding in C++ - I do not assume that ability and the methods I use carry over to, say, Java.
"But the question is ..."
It's only a question for those who didn't read the paper.
"If not, then it doesn't really matter that C++ is hard to optimize, when you get the speed virtually for free."
If 1 == 2, then all sorts of things follow.
Is this strictly humans tweaking it, compiling a la "-O3," all of the above...?
that would be
Very limiting for C++. I know C++ programmers who don't use compiler optimisations, because it would slow down their code. That's because they have a deep understanding of the language architecture at the machine level.
I want to point out that C is even better for this, because the language is infinitely simpler.
With higher-level languages, especially the ones that run on top of a virtual machine, it is simply impossible.
To answer your comments, the article said that the C++ programs were optimized beyond the capabilities of average programmers. I guess the average C++ guy can do -O3.
Re: that would be
"I know C++ programmers who don't use compiler optimisations, because it would slow down their code. That's because they have a deep understanding of the language architecture at the machine level."
Er no. That's because they are C programmers. The C++ standard library is made up of templates that are written to be as broadly applicable as is feasible. Consequently, they require function inlining, constant propagation and the removal of unreachable code to even get close to acceptable performance levels.
Even in pure C, it would be somewhat heroic to write code that was already as good as the optimiser can do for free, and vanishingly unlikely that you could beat a modern compiler on a large body of code. (Even where you can, it is *then* unlikely that you couldn't do better still by dropping down to assembly language for that hotspot.) Perhaps your friends made some measurements about 30 years ago and haven't revisited their assumptions since.
I agree for most but
Not all. You're right about C.
You're wrong about compilers.
These observations are made on actual products.
See Trustleap.com for more info.
And I am not tied to Trustleap or their products by any means.
Compiler Optimizations make faults in production systems harder to debug
An ISV I previously worked for didn't optimize a lot of their C/C++ code as it was difficult to debug using a core file from a customer's critical enterprise production system.
When debugging a production problem, time is of the essence. You also have the issue of having to be familiar with different debuggers/compilers for each platform too.
Sure C/C++ could be faster than Java, but Java has a lot of benefits, such as security built-in, less chance of hanging yourself, platform portability, tools common across platforms, huge availability of libraries. Java is not always suitable and C/C++ sometimes can be justified.
Looks like snake oil to me. They claim that they are over 5 million times more efficient than IIS in serving web pages, but also that they aren't bottlenecked on the CPU or network. Assuming a network pipe measured in Gigabits, that means that IIS cannot manage more than a few kilobits of network traffic under any circumstances.
I know this is IIS we're talking about, but that sounds a little harsh.
Your statement could only be made ...
... by someone who knows nothing about how compilers work and has never benchmarked their code. I grew up on assembly language and wrote C for 30 years and I know for a fact that your statement is nonsense because I can't control the details of the generated code unless my C program consists entirely of asm statements.
You also know nothing of how virtual machines with JIT compilation work. And finally, you know nothing of how higher level languages like Scala make it more feasible to use much more efficient algorithms.
Yay for irrelevant benchmarks.
Lost the plot
I really think that is where they lost it (Microsoft anyway), in the complete mess of libraries. C#'s biggest advantage was not pointer abstraction or garbage collection, but a huge Java-like framework that let you do anything with a supporting class, from image codecs to network access.
I've seen LLVM, provably secure compilers, that company that has a hardware "VM" for Java, protected mode processors, etc. I am looking at it now going, "what the hell is so great about a VM you can't do in a real processor?"
C++ had its faults, but it wasn't the language, it was the libraries. Let's just circle back round to 20 years ago and catch up.
Re: Lost the plot
return MessageBox(NULL,"Hello World!","",MB_OK);
Ah the old WTWSDNMS bottleneck
Wishing Things Were Simpler Does Not Make It So.
No computer language can make the problem you are trying to solve any less complicated. A language that makes certain aspects of problem solving 'easier' will just mean you spend more time hoping the problem you are trying to solve consists only of those things your language has 'simplified'.
Think of it a bit like weightlifting - lifting 5kgs is easy but if you have to lift 1000kg then you are going to have to make 200 lifts. If you practice a bit so you can lift 100kg in a go then it only takes 10 lifts. 'Modern' languages encourage you to leave the weights alone and go and stand on the sides of the running machine watching a video instead.
Perhaps when you have a refrigerator to move you put in some time at the gym first. Myself, I'd be more inclined to find a two-wheeled dolly, maybe one with a strap to secure the load (an "appliance jack" as they called them when I was a skinny stockboy).
But I do admire those who lift their weights while standing in the middle of the running machine...
When *comparing* programming languages,
that's equivalent to saying that no computer language can make the problem you are trying to solve any more complicated, which is obviously not true.
"'Modern' languages encourage you to leave the weights alone and go and stand on the sides of the running machine watching a video instead."
This is the sort of statement that someone who wrote assembly language on 24x80 terminals 40 years ago but hasn't written a line of code since might make.
How did they find multiple C++ experts?
I mean there are maybe about 10 real C++ experts around. The rest are people who believe they know C++, but instead only know a subset of an older version of C++, and so are likely to fall into one of the many pitfalls.
The main problem is that people tend to believe that C++ is a high-level object oriented language when it's instead just a macro assembler with a really strange syntax. Mind you, it would only be half the problem if the standardisation people would get it. Instead they add more and more non-orthogonal features every few years.
I wonder why nobody talks about OOPascal anymore. There are now several free compilers around. It's fast and has just the features you need for C++-style OO. Most importantly some misfeatures like implicit object copies have been removed. The := operator only copies a handle to the object, not the object itself.
There's even a platform independent GUI toolkit coming with it. It even looks native on every platform.
The market, Chris...
"I wonder why nobody talks about OOPascal anymore"
For the same reason that no-one was talking about Object Oberon or Oberon 2 before Java 1.0 downloads clogged the T1 lines.
I was amazed at the uptake back then. People were torturing themselves with C++ like crazy and bitching and moaning about it then all of a sudden...
multiple C++ experts
You clearly aren't one of the ten. Rather a lot of standardisation effort has gone into making the existing features more orthogonal. You'd be hard-pushed to find any non-orthogonality in the major features of the language now.
Oh, and in C++, if your class is an object type, you can prohibit copying with a single line in the declaration of your object base class. If it is a value type, deep copying is exactly what you want. This has been known for about half a century, ever since somebody managed to change the value of 1.0 in their (very early) Fortran program. C++ has no "mis-features" in this area. It merely gives you the tools to support more than one style of programming.
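The object-type versus value-type split described above can be sketched like this. It is only an illustration: `NonCopyable`, `Connection` and `Point` are hypothetical names, and in C++11 syntax the prohibition is really two deleted members (copy constructor and copy assignment) rather than literally one line.

```cpp
#include <type_traits>

// Hypothetical base class for "object" (identity) types: copying is prohibited.
class NonCopyable {
protected:
    NonCopyable() = default;
    ~NonCopyable() = default;
    NonCopyable(const NonCopyable&) = delete;
    NonCopyable& operator=(const NonCopyable&) = delete;
};

// An object type: it inherits the prohibition, so `Connection b = a;`
// is rejected at compile time.
class Connection : private NonCopyable {
public:
    explicit Connection(int id) : id_(id) {}
    int id() const { return id_; }

private:
    int id_;
};

// A value type: the compiler-generated copy duplicates every member,
// which is exactly what you want here.
struct Point {
    double x;
    double y;
};

static_assert(!std::is_copy_constructible<Connection>::value,
              "object types must not be copyable");
static_assert(!std::is_copy_assignable<Connection>::value,
              "object types must not be assignable");
static_assert(std::is_copy_constructible<Point>::value,
              "value types stay copyable");
```

The `static_assert` lines mean the compiler itself checks the claim; nothing has to run for the prohibition to be enforced.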
1.0 = x
"This has been known for about half a century, ever since somebody managed to change the value of 1.0 in their (very early) Fortran program. "
I had forgotten all about that feature from Fortran 101, good for a laugh. Thanks.
Caution - old git moan
Once upon a time C/C++ were the primary languages of choice for software development. A C/C++ programmer was an 'average' software developer because that's almost the only language that was used. Now Google are saying that they're effectively superior to the 'average programmer'!
Sorry about the gap, was just enjoying a short spell of smugness.
@sT0rNG b4R3 duRiD. Concurrency in C is just fine, it's no better or worse than any other language that lets you have threads accessing global or shared memory.
I don't know about yourself, but I prefer to use pipes to move data between threads. That eliminates the hard part - concurrent memory access. It involves underlying memcpy()s (for that's what a pipe is in effect doing), which runs contrary to received wisdom on how to achieve high performance.
But if you consider the underlying architecture of modern processors, and the underlying activities of languages that endeavour to make it easier to have concurrency, pipes don't really rob that much performance. Indeed by actually copying the data you can eliminate a lot of QPI / Hypertransport traffic especially if your NUMA PC (for that's what they are these days) is not running with interleaved memory.
It scales well too. All your threads become loops with a select() (or whatever the windows equivalent is) at the top followed by sections of code that do different jobs depending what's turned up in the input pipes. However, when your app gets too big for the machine, it's easy to turn pipes into sockets, threads into processes, and run them on separate machines. Congratulations, you now have a distributed app! And you've not really changed any of the fundamentals of your source code. I normally end up writing a library that abstracts both pipes and sockets into 'channels'.
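A rough sketch of the thread-plus-pipe pattern described above, assuming a POSIX system (`pipe`, `select`, `read`, `write`). The helper names `write_all`, `read_all` and `sum_from_pipe` are invented for this example; it is not the poster's library, and it ignores details such as retrying after EINTR.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Write exactly n bytes, or report failure.
bool write_all(int fd, const void* buf, std::size_t n) {
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        const ssize_t put = write(fd, p, n);
        if (put <= 0) return false;
        p += put;
        n -= static_cast<std::size_t>(put);
    }
    return true;
}

// Read exactly n bytes; false on EOF or error.
bool read_all(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        const ssize_t got = read(fd, p, n);
        if (got <= 0) return false;
        p += got;
        n -= static_cast<std::size_t>(got);
    }
    return true;
}

// The consumer thread is a loop with select() at the top; the producer
// only ever talks to it through the pipe, so no memory is shared while
// both are running.
std::int64_t sum_from_pipe(const std::vector<std::int32_t>& values) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    const int rd = fds[0];
    std::int64_t total = 0;

    std::thread consumer([&total, rd] {
        for (;;) {
            fd_set ready;
            FD_ZERO(&ready);
            FD_SET(rd, &ready);
            if (select(rd + 1, &ready, nullptr, nullptr, nullptr) <= 0) break;
            std::int32_t v = 0;
            if (!read_all(rd, &v, sizeof v)) break;  // EOF: writer closed its end
            total += v;
        }
    });

    for (std::int32_t v : values) write_all(fds[1], &v, sizeof v);
    close(fds[1]);    // closing the write end is the "no more work" signal
    consumer.join();  // after join() it is safe to read total
    close(rd);
    return total;
}
```

With only one input pipe the select() call is not strictly needed, but it shows where further pipes (or, later, sockets) would be added to the same `fd_set`.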
Libraries like OpenMPI do a pretty good job of wrapping that sort of thing up into a quite sophisticated API that allows you to write quite impressive distributed apps. It's what the supercomputer people use, and they know all about that sort of problem with their 10,000+ CPU machines. It's pretty heavyweight.
If you're really interested, take a look at
and discover just how old some of these ideas are and realise that there's nothing fundamentally new about languages like node.js, SCALA, etc. The proponents of these languages who like to proclaim their inventions haven't really done their research properly. CSP was in effect the founding rationale behind the Transputer and Occam. And none of these languages do the really hard part for you anyway; working out how a task can be broken down into separate threads in the first place. That does need the mind of a superior being.
Your smugness will cause your downfall, little one.
"there's nothing fundamentally new about languages like node.js, SCALA, etc. The proponents of these languages who like to proclaim their inventions haven't really done their research properly"
These people have done their research quite well, thank you. They are even saying so explicitly:
"Scala rests on a strong theoretical foundation, as well as on practical experience. You can find below a collection of papers, theses, presentations, and other research resources related to the Scala language and to its development."
And then: http://www.scala-lang.org/node/143#papers
"and discover just how old some of these ideas are and realise that there's nothing fundamentally new about languages like node.js, SCALA, etc. "
It's a point *well* worth reminding people about.
This process of threads -> processes and pipes -> sockets sounds almost like a candidate for pairs of macro definitions with a flag to shift (SOLO?) to determine which set of definitions gets used.
I think the OP's smugness was directed against the proponents (fanbois, in register-speak) rather than the creators of these languages.
If that is the case, then I think his point stands. There has been very little fundamentally new in programming language design for several decades.
Thank you Ken; one's smugness was indeed primarily derived from Google implying that C/C++ programmers were superior beings...
My beef with proponents of languages like SCALA and node.js is that yes, whilst they are well developed (or on the way to being so) and offer the 'average programmer' a simpler means of writing more advanced applications, they do not deliver the highest possible performance. This is what Google has highlighted. Yet there is a need for more efficiency in data centres, large websites, etc. Lowering power consumption and improving speed are increasingly important commercial factors.
But if that's the case, why not aim for the very lowest power consumption and the very highest speed? Why not encourage programmers to up their game and actually get to grips with what's actually going on in their CPUs? Why not encourage universities to train software engineering students in the dark arts of low level programming for optimum computer performance? C++, and especially C, forces you to confront that reality and it is unpleasant, hard and nasty. But to scale as well as is humanly possible, you have to know exactly what it is you're asking a CPU+MMU to do.
From what I read the big successful web services like Google and Amazon are heavily reliant on C/C++. We do hear of Facebook, Twitter, etc. all running into scaling problems; Facebook decided to compile php (yeeuuurk!) and Twitter adopted SCALA (a half way house in my opinion). The sooner services like them adopt metrics like 'Tweets per Watt' (or whatever) the sooner they'll work out that a few well paid C++ programmers can save a very large amount off the electricity bill. Maybe they already have. For the largest outfits, 10% power saving represents $millions in bills every single year; that'd pay for quite a few C/C++ developers.
@ Destroy all monsters; Less of the little one, more of the old one
My whole point is that there's nothing really new to SCALA's concurrency models. Both the Actor and CSP concurrency models date back to the 1970's. Pretty much all that fundamentally needs to be said about them was written back then. Modern interpretations have updated them for today's world (programmers have got used to objects), but the fundamentals are still as was.
[As an aside I contend that a Communicating Sequential Process is as much an 'object' as any Java class. It is encapsulated in that its data is (or at least should be) private. It has public interfaces, it's just that the interface is a messaging specification rather than callable methods. And so on].
No one in their right mind would choose to develop a programme as a set of concurrent processes or threads. It's hard, no matter what language assistance you get. The only reason to do so is if you need the performance.
CSP encouraged the development of the Transputer and Occam. They were both briefly fashionable late 80's to very early 90's when the semiconductor industry had hit a MHz dead end. A miracle really, their dev tools were diabolically bad even by the standards of the day. There was a lot of muttering about parallel processing being the way of the future, and more than a few programmers' brows were mightily furrowed.
Then Intel did the 66MHz 486, and whooosh, multi GHz arrived in due course. Everyone could forget about parallel processing and stay sane with single threaded programmes. Hooray!
But then the GHz ran out, and the core count started going up instead. Totally unsurprisingly all the old ideas crawled out of the woodwork and got lightly modernised. The likes of Bernard Sufrin et al do deserve credit for bringing these old ideas back to life, but I think there is a problem.
Remember, you only programme concurrent software if you have a pressing performance problem that a single core of 3GHz-ish can't satisfy. But if that's the case, does a language like SCALA (that still interposes some inevitable inefficiencies) really deliver you enough performance? If a concurrent software solution is being contemplated perhaps you're in a situation where ultimate performance might actually be highly desirable (like avoiding building a whole new power station). Wouldn't the academic effort be more effectively spent in developing better ways to teach programmers the dark arts of low level optimisation?
@bazza: I see what you mean...
Apologies for the earlier flaming. Been twitchy for the last few months. Information overload probably.
>> My whole point is that there's nothing really new to SCALA's concurrency models. Both the Actor and CSP concurrency models date back to the 1970's.
Well ... yes. Although Milner's "Communicating Mobile Processes" added something. No, I haven't managed to fully get through his book yet.
>> CSP encouraged the development of the Transputer and Occam. They were both briefly fashionable late 80's to very early 90's when the semiconductor industry had hit a MHz dead end.
Sure did. I had two of those PC-ISA transputer evaluation boards. The T400 CPU [2 links only] is still in my "collection", not yet encased in lucite.
>> Remember, you only programme concurrent software if you have a pressing performance problem that a single core of 3GHz-ish can't satisfy. But if that's the case, does a language like SCALA (that still interposes some inevitable inefficiencies) really deliver you enough performance?
Mnnno... The trend toward less powerful ("green/power-saving") cores in multicore packages, as well as the demand for less-specialized applications for which multiple processes make sense (servers that need more than a single event-handling loop for example), pushes in the direction of giving developers tools that enable them to actually exploit all this hardware, with abstractions that are better than the ones standard Java itself provides.
Nothing that could not be had in earlier approaches to be sure (Occam. Limbo. Linda for IPC. Or you could whip out the MPI library), but now the demand for easy multi-processing can be satisfied with something that is in the general orbit of the Java Mass [i.e. runs where the JVM runs, can use the Java libraries, can integrate with existing code, can be sold internally, can be used with a known IDE, has a somewhat familiar syntax] so it's arousing interest.
Thus Scala. A bit further, with less-familiar syntax, Clojure with its "transactional memory". And even further, with less-familiar syntax and on a non-Java VM, Erlang.
>>Wouldn't the academic effort be more effectively spent in developing better ways to teach programmers the dark arts of low level optimisation?
When you write Scala code, it will run on a VM, yes. But then again, the VM will compile it down at runtime, and if you need to, you can optimize that. If the language-level abstraction is well chosen, that should give you all the optimization you need.
Java developer in a multi-core era:
Communicating Sequential Processes for Java:
Clojure and concurrent programming:
Communicating Mobile Processes. Introducing occam-pi:
Every mobile phone (not talking apps - talking the code that makes the phone work) uses multiple cores and multiple processors to get the required performance at low power (and hence GHz) rates. There are quite a few very good coders out there working in that area (usually C rather than C++, dropping down into assembler where necessary). It's not just mobiles of course, many embedded devices are just the same. Being competent in concurrency is more common than a lot of people think.....
C++ > The Rest
Hm, nearly irrelevant. The very few places where code must run fast must be coded to be fast, optimization done by way of intelligent design. All the rest of the code doesn't really matter.
Re: nearly irrelevant
Agreed, as long as you accept that *most* code needs to run faster than it does at the moment.
I'm fed up with people telling me that code speed no longer matters, when I can *still* out-type programs on a machine that is several orders of magnitude faster than the one I was using 20 years ago.
Google's paper is quite interesting. The raw results are that C++ is about 2.5 times faster than the best of the rest and the worst is about the same factor further behind. That's quite a big hit for the worst case (Go).
When the sample programs were offered to Google employees to tune, a roughly similar improvement (3x) was seen in every case. For C++, there were easy pickings by replacing O(n) methods with O(1) methods in the standard library, and changing data structures to improve locality. I'd call these "low-hanging fruit" rather than "sophisticated". For Java and Scala, one could tune the garbage collection. For Scala, one could adopt a more functional programming style. I don't know how clever those changes were, because I don't use those languages, but let's assume they are *not* (in Google's words) "complicated and hard to tune".
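To make the "replace O(n) with O(1)" kind of fix concrete, here is a sketch of the general idea. It is not taken from the paper's code: the function names are hypothetical, and the assumption is simply that a membership test in a hot loop was being done by linear search.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Before: every lookup is a linear scan, O(n) per needle.
std::size_t count_hits_linear(const std::vector<int>& haystack,
                              const std::vector<int>& needles) {
    std::size_t hits = 0;
    for (int n : needles) {
        if (std::find(haystack.begin(), haystack.end(), n) != haystack.end()) {
            ++hits;
        }
    }
    return hits;
}

// After: build a hash index once, then each lookup is O(1) on average.
std::size_t count_hits_hashed(const std::vector<int>& haystack,
                              const std::vector<int>& needles) {
    const std::unordered_set<int> index(haystack.begin(), haystack.end());
    std::size_t hits = 0;
    for (int n : needles) {
        hits += index.count(n);  // count() is 0 or 1 for a set
    }
    return hits;
}
```

Both functions return the same answer; only the cost per lookup changes, which is why this sort of edit counts as low-hanging fruit rather than deep tuning.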
The point is that we're talking about a factor of "several" performance improvement that is available with code reviews or a change of language, and probably an order of magnitude that is available if you do both. It would take Intel or AMD *years* to deliver the same performance improvement, and then you'd have to pay for the new hardware, so it is clearly worthwhile, but it doesn't happen for some reason.
Maybe the average programmer is just crap?
Maybe the average programmer is just crap?
could be truer than you think... for a given type of average, half of them are going to be even worse than that...
So are you suggesting that God, not humans, created software optimization?
Fast code is very important
Whether that be in :
Embedded devices - where fast code can achieve more with a small micro, reducing product cost.
Or mobile devices (phones & laptops) - where fast code uses fewer clock cycles and therefore the battery lasts longer or you can use a smaller and cheaper battery.
Or data centres where fast code means more work can be done with fewer CPUs and less power consumption, therefore reducing costs.
Always remembering *the* golden rule
*Premature* optimisation is the root of most evil.
And the silver bye-law
Refusal to optimise is the root of Windows Vista, and changing your mind after shipping is the root of Windows 7.
but don't leave it too late
Many of the performance issues are an inherent result of the architecture. If you leave things too late you probably can't change and optimise easily.
"If you leave things too late you probably can't change and optimise easily."
My old copy of Code Complete pointed out for best results you need to start with the actual *algorithm* you're going to use first.
Poor choice (bubble sort anyone?) here will stuff *any* amount of code tuning.
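A quick way to see the bubble sort point is to count comparisons instead of timing anything. This is only a sketch: `bubble_sort` and `library_sort` are names invented here, and the exact count for `std::sort` depends on the standard library implementation, so only the bubble sort figure is fixed.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Textbook bubble sort; returns the number of comparisons made.
// For n elements it always performs n * (n - 1) / 2 of them.
std::size_t bubble_sort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    for (std::size_t pass = 0; pass + 1 < v.size(); ++pass) {
        for (std::size_t i = 0; i + 1 < v.size() - pass; ++i) {
            ++comparisons;
            if (v[i] > v[i + 1]) std::swap(v[i], v[i + 1]);
        }
    }
    return comparisons;
}

// std::sort with a counting comparator; O(n log n) comparisons.
std::size_t library_sort(std::vector<int>& v) {
    std::size_t comparisons = 0;
    std::sort(v.begin(), v.end(), [&comparisons](int a, int b) {
        ++comparisons;
        return a < b;
    });
    return comparisons;
}
```

For 1000 elements the bubble sort makes 499,500 comparisons whatever the input. No amount of tuning the inner loop closes that kind of gap, which is the point about picking the algorithm first.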
However a well *partitioned* architecture will let you swap out the poorly performing modules and replace them with something better. Doing this partitioning well seems to be quite tricky.
Very hard to do this right
I looked into something similar myself a number of years ago. It is very very hard to do this in an unbiased way.
My results were that the actual compilers (I'm including the JVM JIT in this) tend to be equally good at optimising code. The difference between different languages is that the different language features tend to cause the programmer to make different design choices (some choices are just not available in some languages), and these affect performance.
If you can identify the performance critical sections of the code and encapsulate them (and their data) then it is possible to re-write them for performance. However, these sections tend to look very different to "normal" code in that language (e.g. Java code that uses arrays and looks more like C than Java).
I'd be willing to bet that the winning C++ version was making heavy use of templates (template metaprogramming). This is really a language all of its own and can give very good performance but (IMHO) damages the code in terms of maintainability and intelligibility.
"I'd be willing to bet that the winning C++ version was making heavy use of templates (template metaprogramming)"
The article linked to the paper which in turn gave a URL for the code. I suggest you check it out before accepting any bets.
The actual coding exercise under study was a graph algorithm or two. The C++ code used the collection classes in the standard library. These /are/ templates, but require no template meta-programming to use them.
The C++ code did not use a dedicated graph library, despite the fact that boost have one and it is almost certainly bug-free and tuned to within a gnat's arse of divinity. But even that wouldn't have required any meta-programming for this exercise.
Since templates are endemic in the standard library, it is almost impossible to write idiomatic C++ without using templates. OTOH, it is quite easy to write rather a lot of code without much meta-programming on your part. If you are simply averse to C++ syntax, then by all means avoid it, but don't assume that everyone else feels the same way.
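To illustrate the distinction drawn above: the following uses standard library templates (`std::vector`, `std::queue`) for a small graph traversal without writing a single template of its own. It is a stand-in, not the algorithm from the paper; `Graph` and `bfs_order` are hypothetical names, and edges are assumed to refer to valid node indices.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Adjacency list: g[n] holds the nodes reachable from node n.
using Graph = std::vector<std::vector<int>>;

// Breadth-first traversal; returns nodes in the order they are visited.
std::vector<int> bfs_order(const Graph& g, int start) {
    std::vector<int> order;
    if (start < 0 || static_cast<std::size_t>(start) >= g.size()) return order;

    std::vector<bool> seen(g.size(), false);
    std::queue<int> frontier;
    seen[start] = true;
    frontier.push(start);

    while (!frontier.empty()) {
        const int node = frontier.front();
        frontier.pop();
        order.push_back(node);
        for (int next : g[node]) {
            if (!seen[next]) {
                seen[next] = true;
                frontier.push(next);
            }
        }
    }
    return order;
}
```

Every container here is a template instantiation, yet the code reads like ordinary procedural C++. That is using templates, not meta-programming.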
pascal is not the answer
If C/C++ is uber now.. my assembler skills must be godlike! (smug mode on).
To the poster above.. pascal is a complete nonstarter. It's like programming java with one hand tied behind your back.
Object pascal? Maybe the poster only used it in a school environment? I help maintain a 1.5 million line project. Suffice to say, its single pass nature cripples it fatally. (I had a whole rant here but deleted it..).
Most of the 'good' ideas from delphi ended up in c# - you'd be surprised how similar they are (right down to identical function names).
90% of the time the language you use is determined by the task. I write Android in Java, iOS in ObjC, my main job in Delphi, maintain others' code in C or C++.. One isn't 'better' than another.. Every language has its 'WTF?' moments. What matters is you get the job done, and you don't write an unmaintainable behemoth that will drive the guy who comes after you quietly insane.
It's like programming java with one hand tied behind your back.
No. It's like programming java with the other hand tied behind your back AS WELL.
The only "high" level language that regularly beats C++ is FORTRAN. 50 years old and still the champ. Of course the FORTRAN code must be recompiled for each machine architecture and the FORTRAN compiler is written in C++.
Re: FORTAN (sic)
I guess you really mean Fortran 90 or a descendent of it, in which case it's really a C like language with a vague resemblance to FORTRAN 77 to keep the old farts happy. The FORTRAN of fifty years ago, or even of 20 years ago, hasn't been in widespread use since the early 1990s.
You might be right
I used to do most of my computational stuff in a macro assembler (under Primos). When I disassembled compiled code, Fortran77 compilers were almost as good as assembler. Other compilers produced utter tripe (especially C). The Fortran model is good, although I really didn't like using it. F77 was far enough: all the guys with problems could still present me with non-indented (etc) POS's and ask what's wrong with them.
A previous post mentioned orders of magnitude. This is my recollection. I haven't got the hang of assembly programming in unix. It just seems like too much work.
Down with complexity, I say.
@Pidgeon re: Assembler.
Writing an assembler program in Unix is not difficult at all.
In fact it's probably a heck of a lot easier than, say, for a ZX81 or whatever. For one, there's tons of libraries. Once you've got the ABI figured out, you're set. This is not hard (learning the libraries is). Of course, you're no longer on bare metal, you're in userland on top of the OS, which both makes things easier and more difficult.
But ultimately, you will stop and think... damn, I'd get pretty much the same and more done much quicker in C.
Also, I would caution against too liberal use of assembler, for the very reason that by and large today compilers (and I can only speak mainly of C compilers) generally do a pretty good job. Chips today, even with the same instruction set, are so heterogeneous, think out-of-order Intel 'cores' and in-order Atoms... These kinds of issues.
Don't get me wrong. I grew up having to learn assembly (in fact, before I learnt C). I've just grown to respect the fact that chips now are so complicated, and they keep changing so quickly. C compilers today are also MUCH better than they were before. I'm even talking about gcc, not just Intel's.
I'll be honest, in the past few years, there's not been many an occasion that I've been able to beat tuned C code out of a C compiler with assembly with any degree of significance. I can't think of any instance off the top of my head, apart from correcting occasional silly redundant things a compiler does, but that's really improving on the code put out by the compiler.
I still believe however, a programmer should start learning his craft from bottom up.