Steve Scott, the former CTO at supercomputer maker Cray, joined Nvidia last summer as CTO for the chip maker's Tesla GPU coprocessor division. The idea was to shake things up a bit: not only to sell more Tesla units, but also to shape expectations in supercomputing as we strive to reach exascale capacities. And so, in his first …
A long time ago I spotted the opportunity to replace four complex multiplications (24 floating ops) by three integer adds, one table lookup, and one complex multiply.
On a VAX CPU of the early 1980s that was a big win.
On today's CPUs it's probably a big loss, because DRAM access is so slow compared to registers. Then again, if the entire lookup table fits in the CPU cache and is accessed many times from there, maybe not.
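The comment doesn't say what the four multiplications were. One case where this trick works is repeatedly rotating a value by unit phasors whose angles are integer multiples of 2π/N (as with FFT twiddle factors), and the minimal sketch below assumes exactly that; the table size `N`, `TABLE` and both function names are hypothetical. A complex multiply (a+bi)(c+di) = (ac−bd) + (ad+bc)i costs 4 real multiplies and 2 real adds, i.e. 6 flops, hence 24 for four of them. Because exp(ia)·exp(ib) = exp(i(a+b)), the four factors collapse into one integer index sum modulo N, one lookup and one complex multiply.

```python
import cmath
import math

# Assumption: every factor is a unit phasor exp(2*pi*i*k/N) with integer k.
N = 1024  # hypothetical table size (a power of two)

# Precomputed once; 1024 complex128 values (16 KiB) fit easily in cache.
TABLE = [cmath.exp(2j * math.pi * k / N) for k in range(N)]


def rotate_direct(x, ks):
    """Four complex multiplies: 4 x (4 mul + 2 add) = 24 flops."""
    for k in ks:
        x = x * TABLE[k]
    return x


def rotate_lookup(x, ks):
    """Three integer adds, one table lookup, one complex multiply."""
    k1, k2, k3, k4 = ks
    # Phases add as integers; the wrap modulo N keeps the index in range
    # (in C this would typically be `& (N - 1)` for power-of-two N).
    k = (k1 + k2 + k3 + k4) % N
    return x * TABLE[k]


if __name__ == "__main__":
    x = 1.5 - 0.5j
    ks = (100, 300, 700, 1000)
    print(rotate_direct(x, ks), rotate_lookup(x, ks))
```

Both functions agree to rounding error; which one is faster in practice depends on where the table lives, per the cache caveat above.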
And as for implementing it in a GPU ... I don't do this sort of coding any more. One thing for sure, a compiler isn't going to help. You may well have to go all the way back to the maths and choose a different algorithm.
Re: An example
Table lookups on GPUs are supposedly highly efficient by design; after all, texture mapping is exactly that. The approach above may well still be the fastest option on a GPU (though there it probably makes little difference whether you use integer or floating-point adds; that depends on whether interpolation is useful or desired).
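To illustrate the interpolation point, here is a CPU-side sketch of the two ways a 1D texture fetch can read a table: point sampling (nearest entry) and linear filtering (blend of the two neighbouring entries). It is an idealized emulation, not a GPU API; the table contents, `N` and the function names are hypothetical, and it ignores details such as the texel-centre offset and the reduced-precision weights real hardware uses.

```python
import math

N = 256  # hypothetical table resolution
SIN_TABLE = [math.sin(2 * math.pi * i / N) for i in range(N)]


def fetch_nearest(u):
    """Point sampling with a normalized, wrapping coordinate u."""
    i = int(round(u * N)) % N
    return SIN_TABLE[i]


def fetch_linear(u):
    """Linear filtering: interpolate between the two nearest entries."""
    x = u * N
    i0 = math.floor(x)
    frac = x - i0  # fractional weight in [0, 1)
    a = SIN_TABLE[i0 % N]
    b = SIN_TABLE[(i0 + 1) % N]
    return a + frac * (b - a)


if __name__ == "__main__":
    u = 0.123
    exact = math.sin(2 * math.pi * u)
    print(abs(fetch_nearest(u) - exact), abs(fetch_linear(u) - exact))
```

With integer indices (as in the phasor trick) only point sampling is needed; fractional coordinates plus filtering buy accuracy between table entries at the cost of the blend.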