It'd be nice if everything was so convenient
This sort of coding (concern for data locality in massively-parallel architectures) has been quite normal in supercomputing for decades. And in the GPGPU programming segment for years. And it still is darn hard. The basic problem is that not all applications nicely slot into something where data will conveniently be located locally to where it is needed. Hence Berkeley can come up with a fastest-ever parallel algorithm for matrix multiplication - because they chose a problem (matrix mult) that *can* take full advantage of the way modern multi-core systems work. And it's *still* hard to code! Try doing the same with parallel algorithms that don't conveniently have data locality and you find the system runs hardly any faster than on a normal CPU because memory bottlenecks dominate. Sure, *sometimes* you can get it faster with tricks like "compute the same variable per core", but funnily enough Intel won't be telling us of all the failures too (which I suspect vastly outweigh the successes - nobody can publish failures because it might just be their lack of imagination in solving the problem and so gets rejected by reviewers).
Sorry, adding more cores onto a chip isn't some kind of revolution - it's an admission of defeat. Trying to tell us that it's our fault 'cause we need to get smarter in coding is raw marketing spin.