Steve Scott, the former CTO at supercomputer maker Cray, joined Nvidia last summer as CTO for the chip maker's Tesla GPU coprocessor division. The idea was to shake things up a bit: not only to sell more Tesla units, but also to shape expectations in supercomputing as the industry strives to reach exascale capacity. And so, in his first …
A long time ago I spotted the opportunity to replace four complex multiplications (24 floating-point ops, since each complex multiply is four real multiplies and two adds) with three integer adds, one table lookup, and one complex multiply.
On a VAX CPU of the early 1980s, that was a big win.
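The comment doesn't say what the four factors were, so here is a minimal sketch under one plausible assumption: each factor is a unit phasor exp(2πik/N) (a "twiddle factor", as in an FFT). Because exp(ix)·exp(iy) = exp(i(x+y)), the four integer indices can be summed first and the combined phase fetched from a precomputed table. The table size `N`, the names `TWIDDLE`, `rotate_direct` and `rotate_lookup`, and the power-of-two wrap-around mask are all illustrative assumptions, not the original code.

```python
import cmath

# Hypothetical table size: a power of two, so "mod N" becomes a cheap bit mask.
N = 1024
MASK = N - 1

# TWIDDLE[k] = exp(2*pi*i*k/N), the N-th roots of unity, computed once up front.
TWIDDLE = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]


def rotate_direct(z: complex, a: int, b: int, c: int, d: int) -> complex:
    """Four complex multiplies (nominally 4 x 6 = 24 floating-point ops)."""
    return z * TWIDDLE[a & MASK] * TWIDDLE[b & MASK] * TWIDDLE[c & MASK] * TWIDDLE[d & MASK]


def rotate_lookup(z: complex, a: int, b: int, c: int, d: int) -> complex:
    """Three integer adds, one table lookup, one complex multiply.

    Valid because exp(ix) * exp(iy) == exp(i(x + y)): the phases add,
    so the integer indices can be summed before the lookup.
    """
    k = (a + b + c + d) & MASK  # three adds, then wrap around the table
    return z * TWIDDLE[k]


if __name__ == "__main__":
    z = 1.5 - 0.25j
    print(rotate_direct(z, 3, 700, 511, 900))
    print(rotate_lookup(z, 3, 700, 511, 900))
```

The two functions agree to within rounding error; the lookup version just trades the floating-point work for integer adds and one memory access, which is exactly where the cost trade-off below comes in.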
On today's CPUs it's probably a big loss, because DRAM access is so slow compared with registers. If the entire lookup table fits in the CPU cache and is accessed many times from there, though, maybe not.
As for implementing it on a GPU... I don't do this sort of coding any more. One thing is for sure: a compiler isn't going to help. You may well have to go all the way back to the maths and choose a different algorithm.
Re: An example
Table lookups on GPUs are supposedly highly efficient by design; after all, texture mapping is exactly that. The approach above may well still be the fastest on a GPU (though there it probably makes little difference whether you do integer or FP adds; that depends on whether interpolation is useful or desired).
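The integer-vs-FP point can be illustrated with a CPU-side sketch: with integer indices you get a plain nearest-entry lookup, while a floating-point index carries a fractional part that can drive linear interpolation between neighbouring entries, which is roughly what nearest and linear texture filtering do. This only mimics the idea; real texture units differ in details such as weight precision and addressing modes, and the table contents (`sin` samples), size `N`, and function names are hypothetical.

```python
import math

N = 256  # hypothetical table size
# Samples of sin over one period; stands in for any tabulated function.
TABLE = [math.sin(2 * math.pi * k / N) for k in range(N)]


def lookup_nearest(x: float) -> float:
    """Nearest-entry lookup with wrap-around; x is in table-index units."""
    return TABLE[math.floor(x + 0.5) % N]


def lookup_linear(x: float) -> float:
    """Linear interpolation between neighbouring entries, with wrap-around."""
    i = math.floor(x)
    frac = x - i                 # fractional part of the FP index
    a = TABLE[i % N]
    b = TABLE[(i + 1) % N]
    return a + frac * (b - a)


if __name__ == "__main__":
    x = 10.3
    exact = math.sin(2 * math.pi * x / N)
    print("nearest error:", abs(lookup_nearest(x) - exact))
    print("linear error: ", abs(lookup_linear(x) - exact))
```

If the indices are always whole numbers, interpolation buys nothing and either kind of add will do; it only pays off when the phase genuinely falls between table entries.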