Graphics processor and SoC chip maker Nvidia is hosting its GTC Asia conference in Beijing this week, and with the next-generation Kepler GPUs pushed out to early next year, there isn't any new chippery to salivate over. But Nvidia has some new compilers and a revved-up CUDA development kit to make things interesting just …
"Nobody wants to read the manual," says Gupta with a laugh. And so this expert system has a redesigned visual code profiler that shows bottlenecks in the code, offers hints on how to fix them, and automagically finds the right portions of the CUDA manual to fix the problem.
Gupta's taking the piss ...
... out of kiddies/marketers/journalists who are clueless. Or maybe Sumit is clueless himself and truly believes it ...
The GNU toolchain has been working exactly this way for decades. Anyone working close to the hardware already hand-massages the assembler that the C-family compilers kick out before linking the code that becomes the executable/binary ...
And more ... COBOL, Fortran, Ada, yadda yadda yadda.
It would not hurt to go and read up on the differences between the GNU toolchain and LLVM-based ones.
No, Gupta gets it. You don't.
The GNU compiler collection is a monolithic POS. The new LLVM paradigm is different, maybe not from a strict computer science point of view, but from a practical point of view it's like night and day.
Clang is a nicely-written C/C++/ObjC compiler, written in a modern language, using modern techniques. There are detailed instructions for adding your own keywords. If you've tried to modify gcc you'll know that gcc is ... not like that.
LLVM is a very cool piece of middleware that takes a universal IL and either interprets, JITs or compiles it, onto a wide variety of platforms. There are directions for retargeting it. Once again, doing the same kind of job in gcc is a lot harder. Even though in principle gcc has the same kind of flexible architecture, in practice it's highly monolithic.
Also, Clang outperforms gcc by 3x (compile-time), and LLVM outperforms gcc by 10-20% (runtime).
The GNU toolchain is on its way out, and for good reason. All hail nVidia for speeding up the process.
(PS: Hand-massaging assembler is neither cost-effective nor maintainable. I can't remember the last time I saw someone tweak the assembler output of a tool - maybe 1995? There is some assembler coding going on still, but pros are mostly optimizing things at the memory hierarchy level because the compilers have been good enough at the instruction level since gcc 4, and saving a cycle to lose it again on a stalled cache read is a non-win. I haven't written or modified any assembler for years, despite working directly in the low-level code optimization space.)
I write commercial software where performance is important (though not life-or-death critical). Frankly, these days I'm far more concerned about the growth order - the exponent in an O(n^e)-style bound - than about the constant factor in front of it, because the constant factor doesn't usually vary all that significantly between languages. When I'm doing something a million times, it matters far more whether the algorithm is O(n) or O(n^2); in the latter case I'm effectively doing everything a million times over. It's a bit like arguing over register-parameter vs. stack-parameter calling conventions - on a modern machine, including embedded systems, the choice makes no subjective difference and minimal objective difference. However, choosing the right library, understanding it, and applying that knowledge efficiently are critical, IMHO.
IMO, the potential advantages of the LLVM toolchain are significant regardless of any (effectively immaterial) performance difference. LLVM takes an extremely-good-but-dated toolchain design and applies *nix principles to it, decomposing it into a suite of independent components that "do one thing, and do it well". The claim that it's more efficient may be dubious (probably is - I don't know; there'll be some dodgy benchmark somewhere to prove it either way).
If any commercial/professional FLOSS developers disagree with my viewpoints, I'd welcome a constructive debate!
Paris, because I'm so stoned I've got no idea what I'm doing or saying.
>>Also, Clang outperforms gcc by 3x (compile-time), and LLVM outperforms gcc by 10-20% (runtime).
Did you do the benchmarking yourself, or are you taking the Clang developers' PoV? As far as I know (from Phoronix, e.g.), compilation was not as much faster as you and the Clang people claim - just several per cent ahead, sometimes. I have also heard that gcc's memory footprint is worse than Clang's.
However, the optimizers of the two compilers are beyond comparison right now: Clang produces much slower binaries. And I doubt that 10% figure meant compilation-time performance - it was the runtime of the binaries executed on their GPUs, right?
@ Eddie Edwards
I think your PS neatly describes exactly why the OSes from Redmond and Cupertino (and the various *buntus and other shovelware distros) and their "add-on software" are the bloated, buggy kitchen-sink-ware that they are.
In a nutshell: kids no longer know how to code because CPU and memory are cheap.
Ta for the wrap-up :-)
I think it might be a step forward
but why not FLOSS the lot?
By not making stuff fully open, you merely restrict your friends' ability to help you, by denying them the knowledge your competitors will have obtained the first time your device/software was made available.
Well, one possible reason is the licensing of 3rd party IP. If nVidia use someone else's tech in their toolchain, they might well not be allowed to open source it. Just a thought.
Maybe, or maybe not. Companies like Apple and nVidia are very fond of using someone else's work, but it takes a lot more courage to give something back. That does not always apply to BSD and other permissive licenses - Apple uses GNU bash and gcc (until recently, at least)...