Steve Scott, the former CTO at supercomputer maker Cray, joined Nvidia last summer as CTO for the chip maker's Tesla GPU coprocessor division, and the idea was to shake things up a bit and not only sell more Tesla units, but to shape expectations in supercomputing as we strive to reach exascale capacities. And so, in his first …
A long time ago I spotted the opportunity to replace four complex multiplications (24 floating-point ops) with three integer adds, one table lookup, and one complex multiply.
On a VAX CPU of the early 1980s that was a big win.
On today's CPUs, it's probably a big loss, because DRAM access is so slow compared to registers. Although, if the entire lookup table fits into the CPU cache and is accessed many times from cache, maybe not.
And as for implementing it in a GPU ... I don't do this sort of coding any more. One thing for sure, a compiler isn't going to help. You may well have to go all the way back to the maths and choose a different algorithm.
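The comment above doesn't say what the four multiplications were, so the following is only a guess at the shape of such a trick, not the original code. It assumes the four factors are unit-magnitude phase factors whose angles are integer multiples of 2π/N. Their product is then a single phase whose index is the sum of the four indices: three integer adds, one lookup in a precomputed table, and one complex multiply. The names `N`, `TABLE`, `rotate_direct` and `rotate_lookup` are made up for illustration.

```python
import cmath

# Hypothetical table size: phases are quantised to multiples of 2*pi/N.
# A power of two lets the index wrap with a bit mask.
N = 1024

# Precomputed N-th roots of unity: TABLE[k] = exp(2*pi*i*k/N).
TABLE = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]


def rotate_direct(z, k1, k2, k3, k4):
    """Reference version: four complex multiplies, one per phase factor."""
    for k in (k1, k2, k3, k4):
        z = z * cmath.exp(2j * cmath.pi * k / N)
    return z


def rotate_lookup(z, k1, k2, k3, k4):
    """Three integer adds, one table lookup, one complex multiply."""
    idx = (k1 + k2 + k3 + k4) & (N - 1)  # sum the phase indices, wrap mod N
    return z * TABLE[idx]


if __name__ == "__main__":
    z = 1 + 2j
    print(rotate_direct(z, 3, 500, 900, 77))
    print(rotate_lookup(z, 3, 500, 900, 77))
```

Both functions should agree to within rounding error. Whether the lookup version wins then comes down to exactly the point made above: whether `TABLE` stays in cache.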
Re: An example
Table lookups on GPUs are supposedly highly effective by design; after all, texture mapping is exactly that. The above may well still be the fastest approach on a GPU, though there it probably makes little difference whether you do integer or FP adds; that depends on whether interpolation is useful or desired.