"Are these boffins suggesting that nothing, anywhere, ever - will be able to run Crysis at a decent frame rate?"
I know it was a joke, but ... I suspect that these boffins would spoil their pants if one of their workloads was as embarrassingly parallel as Crysis. The issues are
1) ...whether your problem has a lot of computation compared to sharing of data
2) ...whether that ratio scales nicely as you chunk the problem into smaller pieces.
3) ...whether your chosen implementation language(s) let you express that
For most applications, the answer to (1) is yes for "parts of the problem". I have no trouble identifying small loops and other parts of my code that are embarrassingly parallel. Neither has my compiler, come to that.
Sadly, this isn't useful, because of (2). These fragments *are* small and by the time I've glued together several hundred embarrassingly parallel fragments, I discover that the overheads of the glue weigh more than the original single-threaded solution. Worse, there are quite a few important problems that don't even have embarrassingly parallel fragments.
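To put rough numbers on the glue problem, here is a toy cost model in Python. It is only a sketch: the function names, the time units and the assumption that glue is paid serially, once per fragment, are all mine and not measurements of any real system.

```python
def serial_time(fragments, work_per_fragment):
    """Time to run every fragment back to back on one core."""
    return fragments * work_per_fragment


def parallel_time(fragments, work_per_fragment, workers, glue_per_fragment):
    """Toy estimate: compute is split evenly across workers, but each
    fragment also pays a fixed, serial glue cost (assumed, not measured)."""
    compute = fragments * work_per_fragment / workers
    glue = fragments * glue_per_fragment
    return compute + glue


# Several hundred tiny fragments: the glue costs as much as the work itself.
small_serial = serial_time(300, 1.0)
small_parallel = parallel_time(300, 1.0, workers=4, glue_per_fragment=1.0)

# Same fragment count, but each fragment does 100x more work.
big_serial = serial_time(300, 100.0)
big_parallel = parallel_time(300, 100.0, workers=4, glue_per_fragment=1.0)

print(small_serial, small_parallel)  # 300.0 375.0
print(big_serial, big_parallel)      # 30000.0 7800.0
```

In this model the four-core version of the small-fragment case comes out slower than one core. The split only pays off once each fragment carries far more work than glue.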
Even assuming the parallelism exists, the answer to (3) is still no if you are programming in any language that you've ever heard of. OK, some kind soul mentioned Erlang earlier and it is interesting, but if it really solved the problem then Intel and AMD would have beaten a path to its designers' door and hammered it down by now. As I mentioned earlier, Intel have built an 80-way processor. Sun have built and sold a 64-way processor. The hardware isn't the problem.
On a small scale, almost any compiler can take your function, spread out all the data dependencies and find the optimal solution. Good compilers can do this for groups of functions that call one another in some sort of closed system. On an out-of-order processor, that actually delivers useful parallelism even today.
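As a rough illustration of what "spreading out the data dependencies" buys you, the Python sketch below compares the number of operations in a tiny dependency graph with its critical path (the longest chain of operations that must run one after another). The graph encoding and all the names are hypothetical; real compilers and out-of-order hardware do something far more elaborate.

```python
from functools import lru_cache


def critical_path(deps):
    """Length of the longest dependency chain in `deps`.

    `deps` maps each operation name to a tuple of the operations whose
    results it needs. Plain inputs (a, b, c, d) are not listed as operations.
    """
    @lru_cache(maxsize=None)
    def depth(op):
        # An operation can start one step after its deepest dependency.
        return 1 + max((depth(d) for d in deps[op]), default=0)

    return max(depth(op) for op in deps)


# (a + b) + (c + d): the two inner additions do not depend on each other.
balanced = {"t1": (), "t2": (), "t3": ("t1", "t2")}

# ((a + b) + c) + d: every addition waits for the previous one.
chained = {"s1": (), "s2": ("s1",), "s3": ("s2",)}

print(len(balanced), critical_path(balanced))  # 3 2
print(len(chained), critical_path(chained))    # 3 3
```

Both versions perform three additions, but the balanced form can finish in two steps if two of them run side by side. That gap between operation count and critical path is the small-scale parallelism being described here.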
On a larger scale, no-one has found a concise way of scaling that. That is, there's no compiler where you feed it a number N and it generates code that runs efficiently on an N-way system for arbitrary values of N.
Even with such a language, *someone* would still have to rewrite everything from your OS upwards, and those in the closed source universe would still have to pay for an upgrade to everything they own before end-users would see a benefit.
But I remain an optimist, because modestly multi-core processors make it affordable for lots of people to experiment with modestly parallel rewrites of their software. (The cost of the glue drops in relation to the benefit.) They also make languages like Erlang more affordable and more attractive. For 50 years, there has been a serious penalty on anyone who didn't manually serialise their algorithm. At last, this is beginning to change.