The concept of a global clock with double buffering just doesn't cut it. Global clocks are slow: why should one part run slowly if I can run other parts faster? On top of that you've got register/cache/memory speed issues. If we adopt your solution, we end up running at the speed of the slowest *possible* bottleneck instead of the slowest *actual* bottleneck.
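To put rough numbers on the "slowest *possible* bottleneck" point, here's a toy sketch. The unit names and latencies are made up purely for illustration. A lockstep, double-buffered design has to size its single tick for the worst case of its slowest unit, and every unit then advances once per tick. Independently clocked units can each run at their own typical rate. The sketch deliberately ignores the synchronisation cost independent units pay when they talk to each other.

```python
# Hypothetical per-step latencies in nanoseconds for three units.
# "typical" is what a step usually takes; "worst" is the case the
# design must tolerate. These numbers are invented for illustration.
UNITS = {
    "alu":   {"typical": 1.0, "worst": 1.5},
    "cache": {"typical": 2.0, "worst": 4.0},
    "dram":  {"typical": 6.0, "worst": 10.0},
}


def global_clock_period(units):
    # One shared tick must cover every unit's worst case, so the
    # slowest *possible* step sets the period for everyone.
    return max(u["worst"] for u in units.values())


def steps_per_us_global(units):
    # Under a global clock every unit completes one step per tick.
    period = global_clock_period(units)
    return {name: 1000.0 / period for name in units}


def steps_per_us_independent(units):
    # With independent clocks each unit advances at its own typical rate.
    return {name: 1000.0 / u["typical"] for name, u in units.items()}


if __name__ == "__main__":
    print("global tick (ns):", global_clock_period(UNITS))
    print("lockstep steps/us:", steps_per_us_global(UNITS))
    print("independent steps/us:", steps_per_us_independent(UNITS))
```

With these made-up figures the ALU is held to the DRAM's 10 ns worst case under the global clock, a tenth of the rate it could manage on its own.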
On the hardware front, I reckon we'll end up with a bunch of non-homogeneous cores with homogeneous instruction sets, linked by a fast I/O interconnect.
On the software front, we'll end up with some form of multi-threading/multi-processing using either NUMA shared memory or message passing. Developers will just have to get used to the fact that programming is hard and that the things you learnt in your Computer Science degree are actually useful.
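For anyone who hasn't seen the two styles side by side, here's a minimal sketch in Python that computes the same sum of squares both ways. It uses plain threads, so it shows the programming model only. It doesn't model NUMA, and because of CPython's GIL it won't show a speedup. The function names are hypothetical. In the message-passing version the worker owns no shared state and only talks through queues. In the shared-memory version every thread writes to one counter, so each update has to take a lock.

```python
import queue
import threading


def square_worker(inbox, outbox):
    # Message passing: receive a request, send a reply, share nothing.
    while True:
        item = inbox.get()
        if item is None:  # sentinel meaning "no more work"
            break
        outbox.put(item * item)


def sum_of_squares_message_passing(values):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=square_worker, args=(inbox, outbox))
    worker.start()
    for v in values:
        inbox.put(v)
    inbox.put(None)
    worker.join()
    # Collect exactly one reply per request.
    return sum(outbox.get() for _ in values)


def sum_of_squares_shared_memory(values, n_threads=4):
    # Shared memory: all threads update one total, guarded by a lock
    # so concurrent read-modify-write updates aren't lost.
    total = [0]
    lock = threading.Lock()

    def work(chunk):
        for v in chunk:
            sq = v * v
            with lock:
                total[0] += sq

    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    data = list(range(1, 11))
    print(sum_of_squares_message_passing(data))
    print(sum_of_squares_shared_memory(data))
```

Both should print 385. The message-passing version has no locks because nothing is shared. The shared-memory version gets correctness only from remembering the lock.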
BTW, the sure sign of a kook is when they say algorithms are dead and then present another algorithm.