
Nvidia: No magic compilers for HPC coprocessors

Steve Scott, the former CTO at supercomputer maker Cray, joined Nvidia last summer as CTO for the chip maker's Tesla GPU coprocessor division, and the idea was to shake things up a bit and not only sell more Tesla units, but to shape expectations in supercomputing as we strive to reach exascale capacities. And so, in his first …


This topic is closed for new posts.

An example

A long time ago I spotted the opportunity to replace four complex multiplications (24 floating-point ops) with three integer adds, one table lookup, and one complex multiply.
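The comment doesn't say what the four factors were. One plausible reading, which is purely an assumption here, is that they were unit-modulus phase factors drawn from a fixed set of N roots of unity, exp(2πik/N). Their product is then another such phase whose index is k1+k2+k3+k4, so a table covering every possible index sum turns four complex multiplies into three integer adds, one lookup, and one multiply. A minimal Python sketch of that reading (the names `N`, `BASE`, `SUM_TABLE`, `direct` and `via_lookup` are hypothetical):

```python
import cmath
import math

# Hypothetical setup: each factor is a phase exp(2*pi*i*k/N) with integer k in [0, N).
N = 256

# The N individual phase factors.
BASE = [cmath.exp(2j * math.pi * k / N) for k in range(N)]

# Phases for every possible index sum 0 .. 4*(N-1), so the sum never needs
# reducing modulo N at lookup time.
SUM_TABLE = [cmath.exp(2j * math.pi * k / N) for k in range(4 * (N - 1) + 1)]


def direct(z, k1, k2, k3, k4):
    # Four complex multiplies: 4 x (4 mul + 2 add) = 24 floating-point ops.
    return z * BASE[k1] * BASE[k2] * BASE[k3] * BASE[k4]


def via_lookup(z, k1, k2, k3, k4):
    # Three integer adds, one table lookup, one complex multiply.
    return z * SUM_TABLE[k1 + k2 + k3 + k4]


if __name__ == "__main__":
    z = 1.5 - 0.25j
    print(direct(z, 17, 100, 200, 3))
    print(via_lookup(z, 17, 100, 200, 3))  # same value up to rounding
```

Sizing the table for all sums 0 to 4(N-1) avoids a modulo-N step at lookup time, at the cost of a table roughly four times larger. In C with `complex double` (16 bytes per entry) and N = 256, that is 1021 × 16 = 16,336 bytes, so whether the lookup beats the multiplies depends on that table staying in cache.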

On a VAX CPU of the early 1980s, that swap was a big win.

On today's CPUs it's probably a big loss, because DRAM access is so slow compared with registers. Then again, if the entire lookup table fits in the CPU cache and is accessed many times from there, maybe not.

And as for implementing it on a GPU... I don't do this sort of coding any more. One thing is for sure: a compiler isn't going to help. You may well have to go all the way back to the maths and choose a different algorithm.


Re: An example

Table lookups on GPUs are supposedly highly efficient by design; after all, texture mapping is exactly that. The trick above may well still be the fastest option on a GPU (though there it probably makes little difference whether you do integer or FP adds; that depends on whether interpolation is useful or desired).
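The integer-versus-FP distinction can be pictured as point sampling versus filtered sampling. The sketch below imitates on the CPU, in plain Python, what a GPU texture unit does in hardware; it is not the CUDA texture API, and the table, `nearest` and `linear` are hypothetical names. An integer index fetches an entry exactly, while a fractional index blends the two neighbouring entries linearly. Real texture hardware does this blending with limited-precision weights.

```python
import cmath
import math

N = 256
# Phase table exp(2*pi*i*k/N) with one extra entry so index N wraps back to 0.
TABLE = [cmath.exp(2j * math.pi * k / N) for k in range(N + 1)]


def nearest(t):
    # Point-sampled lookup: round the index, like an unfiltered texture fetch.
    return TABLE[round(t) % N]


def linear(t):
    # Linearly interpolated lookup at fractional index t, like texture filtering.
    t = t % N
    i = math.floor(t)
    f = t - i
    return TABLE[i] * (1.0 - f) + TABLE[i + 1] * f


if __name__ == "__main__":
    exact = cmath.exp(2j * math.pi * 2.5 / N)
    print(abs(nearest(2.5) - exact))  # error from snapping to an integer index
    print(abs(linear(2.5) - exact))   # smaller error from interpolation
```

Interpolating between unit phasors gives a value slightly inside the unit circle, so the filtered result is only approximate; whether that is acceptable is exactly the "useful or desired" question.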
