Apples and Oranges
Nvidia GPUs (ATI too, AFAICT) are very good for highly parallel algorithms, as befits their SPMD nature. But a GPU cannot handle MPMD problems as well as a Cell. Cell is inherently an MPMD processor: the SPEs can each run a different kernel, and inter-SPE comms are fully supported.
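To make the SPMD vs. MPMD distinction concrete, here is a minimal sketch in plain Python threads. It is only a CPU-side analogy, not Cell or GPU code: the function names (spmd_square, mpmd_pipeline) and the toy arithmetic are made up for illustration. The first function has every worker run the same program on its own slice of the data; the second has each worker run a different program, with queues standing in for inter-SPE comms.

```python
import queue
import threading


def spmd_square(data, workers=4):
    """SPMD style: every worker runs the same code on a different slice."""
    results = [None] * len(data)

    def worker(start):
        # Each worker handles indices start, start + workers, ...
        for i in range(start, len(data), workers):
            results[i] = data[i] * data[i]

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def mpmd_pipeline(data):
    """MPMD style: each stage is a different program; queues link the stages."""
    q1, q2 = queue.Queue(), queue.Queue()
    done = object()  # sentinel marking the end of the stream
    out = []

    def scale():  # stage 1: doubles each item
        for x in data:
            q1.put(x * 2)
        q1.put(done)

    def offset():  # stage 2: adds one to each item from stage 1
        while True:
            x = q1.get()
            if x is done:
                q2.put(done)
                return
            q2.put(x + 1)

    def collect():  # stage 3: gathers the results in order
        while True:
            x = q2.get()
            if x is done:
                return
            out.append(x)

    threads = [threading.Thread(target=f) for f in (scale, offset, collect)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The pipeline version is the shape that maps naturally onto SPEs each running their own kernel; the first version is the shape a GPU wants.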
Also, within the Cell you have more than adequate bandwidth between the PPU and the SPUs, while on a GPU the PCIe bus limits bandwidth between the CPU and GPU. You can sometimes lean on bandwidth within the GPU instead, but that is very tricky.
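A back-of-envelope way to see why the bus matters: offloading only wins if copying the data over, computing, and copying it back is faster than just computing on the host. The sketch below is plain arithmetic; the function names are hypothetical and the numbers in the usage note are placeholders, not measurements of any real PCIe link or Cell bus.

```python
def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to move num_bytes over a link with the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def offload_pays_off(num_bytes, link_bw, host_seconds, device_seconds):
    """True if copy-in + device compute + copy-out beats computing on the host.

    Assumes the same amount of data goes in each direction and that
    transfers do not overlap with compute (a simplifying assumption).
    """
    round_trip = 2 * transfer_seconds(num_bytes, link_bw)
    return round_trip + device_seconds < host_seconds
```

For example, with a placeholder link of 4e9 bytes/s and 1e9 bytes of data, the round trip alone costs 0.5 s, so offload_pays_off(1e9, 4e9, host_seconds=1.0, device_seconds=0.1) is True but the same call with host_seconds=0.4 is False, however fast the device is.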
You might profitably compare a PC with 8 GPU cores to a Cell with its 8 SPUs. In that case one may find ways around the difference between Cell's internal bus bandwidth and PCIe bandwidth. But that's still apples and oranges because of the different nature of a GPU core and an SPE.
I've programmed both. If Cell machines were not so stupidly expensive (8800GT=$200CDN, PS3=$400CDN, real Cell computer=$10KCDN+), I cannot see how GPUs could compete. I do not include Tesla because it is really a GPU with a lot more RAM - same PCIe bus.
You have to consider the nature of the problem when comparing processors. Cells have lower raw throughput than GPUs, but GPUs are rather restrictive about the algorithms they can be used with.
I really wish there were such a thing as a standard tower form factor PC built around a Cell (or a pair). Maybe ditching Rambus will help this happen? It is unfortunate that all the Cell computers other than the PS3 are of the "enterprise blade form factor or 1U rack form factor with attendant high price" kind; it keeps the cost up. Perhaps it is time for another attempt on the x86 market by PPC?
LOL. But I'll keep looking for it nevertheless.
How about a dual processor Cell PC with 4 x16 PCIe slots stuffed with 8800's or 3870's? Beginning to see the difference btwn the two kinds of processors yet? ;-)
And to be fair, I cannot see why a really massive PCIe card could not be made with N>8 GPUs on it and a really decent inter-GPU bus. It would require its own idiosyncratic programming API, but neither GPUs nor Cells are programmed like general purpose CPUs. Nvidia might hate it, but IBM's ALF and DaCS would probably apply and make life easier for the people needing to (learn how to) program such a card.
Well, that's my $0.02 anyway.