Real world performance...
This is not the first time that a new discrete Fourier transform has come along.
About ten years ago some academics came up with a fresh approach to the algorithm that claimed to produce the correct arithmetic result with fewer floating point operations. Everyone got very excited, but as far as I know it never saw the light of day in any commercial application. Whilst the exact algorithm wasn't published (they were aiming for a patent), from the vague description the authors gave I concluded that it wasn't going to be very cache friendly. And if an algorithm isn't cache friendly then it isn't going to be terribly fast on a CPU, especially if the amount of data being processed is larger than the L1 cache size; you'd have to build dedicated silicon to implement it, and that is *very* expensive to do.
The bigger CPUs (x86, SPARC, PPC; not ARM) are generally astonishingly fast if the data fits in L1 cache (ever timed the Intel IPP library's FFT on smallish data sizes?) and a complicated algorithm like this new one may actually be of some real world benefit. If it can break down larger FFTs into lumps of data that stay in L1 cache for longer then the real world performance could be significantly better than existing algorithms. So there may be some software applications.
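To make the "lumps" idea concrete, here is a rough sketch of how an ordinary DFT of length N = n1 * n2 can be split into short sub-transforms, each small enough to sit in cache. To be clear, this is the classic Cooley-Tukey style decomposition, not the new algorithm, whose details I don't have. The function names (dft, blocked_dft) are made up for illustration, and pure Python won't show any cache speed-up; it only shows that the split gives the same answer as the direct transform.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used for the small blocks and as the reference."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def blocked_dft(x, n1, n2):
    """DFT of length n1*n2 built only from sub-transforms of length n1 and n2.

    Hypothetical helper: each sub-transform touches a short run of data,
    which is the property that helps when the full input is bigger than L1.
    """
    n = n1 * n2
    assert len(x) == n

    # Step 1: n2 short DFTs of length n1 over the strided sub-sequences
    # x[c], x[c + n2], x[c + 2*n2], ...
    cols = [dft(x[c::n2]) for c in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i*c*k1/n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)

    # Step 3: n1 short DFTs of length n2 across the columns, then scatter
    # each result to output index k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

# Arbitrary 64-point test signal, split as 8 blocks of 8.
signal = [complex(i % 7, -(i % 3)) for i in range(64)]
max_err = max(abs(a - b) for a, b in zip(blocked_dft(signal, 8, 8), dft(signal)))
```

In a real implementation the short transforms would themselves be fast FFTs and the block sizes would be picked to match the cache, but the bookkeeping is the same.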
But as for hardware applications (signal/video/image processing in mobile phones) it might not see the light of day for a long time; if it can be squeezed into existing chips then great; if not, then it'll have to wait until the next design iterations, which could be a long time away. And it will have to be worth it; if it takes twice as many transistors then the cost/benefit analysis that the hardware manufacturers will do might not stack up in its favour.