Misses the point
The analysis of supercomputer architectures misses some significant points. Rather than concentrating on the ubiquity of x86 processors and the number of blade-based systems, it should be asking deeper questions.
For many purchasers of research supercomputers it is all about research for the dollar. This then becomes a self-reinforcing paradigm. If you are a researcher whose problem is a good fit for a commodity cluster, you get great research productivity, publish more, generally do well, and of course you directly or indirectly influence the purchase of the next round of hardware. If you are a researcher whose problem is not a happy fit for a commodity cluster, you do badly, get poor research productivity, and get squeezed out. Eventually you move to a non-computational area of research. Thus for almost any university-based supercomputer, a cluster is a given.
However, if you have a problem where the cost of the supercomputer is not the issue, but rather the simple need to get the job done, you buy the right machine for the problem. This is where you see the big research labs, those with specific mandates for results in a particular area: atomic physics, weather and climate, protein folding, and so on. Look at those. You don't see a dominance of commodity clusters. You see a range of architectures, many without x86 processors, often sporting exotic memory and interconnect architectures. They are bought because they are the right machine.
Many of the machines in the Top-500 will never be used as a single machine to solve a problem. Rather, they are used as task farms or as a large number of small clusters, serving many concurrent users. The one and only time they run a single job is when they are benchmarked to get a listing in the Top-500. Further, the Top-500 is seriously biased in favour of clusters. It uses a single, very simple benchmark based on Linpack, which is a very poor indicator of performance on real-world problems (other than Linpack itself). If you have enough memory on each node, it is almost totally insensitive to interconnect latency, because the problem size can be scaled up until the arithmetic, which grows as n³, swamps the communication, which grows only as n². That is ludicrous. Careful tuning of the parameters (which the Top-500 rules allow) means you can tune the blocking factors to match the cache sizes and pretty much tune out memory bandwidth as a limiting factor. Further, it makes no measurement of I/O speed, nor of problems that cannot easily be blocked for good cache performance. In short, the benchmark is, by an unfortunate quirk of history, almost perfectly matched to show a simple, cheap cluster in the best light.
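To make the blocking point concrete, here is a minimal sketch of a cache-blocked (tiled) matrix multiply. It is not the Linpack or HPL code; the names `blocked_matmul` and `tile_intensity` and the `block_size` parameter are hypothetical, with `block_size` standing in for the tunable blocking factor (roughly what NB is in HPL's input file). Pure Python won't show real cache effects; the point is the loop structure and the arithmetic. Assuming one tile update touches three b×b tiles (ignoring the write-back of C), it does 2b³ flops on 3b² words, so flops per word grow linearly with b. That is why a large enough block lets the processor run mostly out of cache instead of main memory.

```python
# Hypothetical sketch: cache-blocked (tiled) matrix multiply, C = A @ B.
# block_size is the tunable blocking factor; on real hardware it would be
# chosen so the tiles being worked on stay resident in cache.

def blocked_matmul(A, B, block_size):
    """Multiply square matrices A and B (lists of lists) tile by tile."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block_size):
        for kk in range(0, n, block_size):
            for jj in range(0, n, block_size):
                # Update one tile of C using one tile of A and one tile of B.
                # min() handles ragged edge tiles when n % block_size != 0.
                for i in range(ii, min(ii + block_size, n)):
                    row_c = C[i]
                    row_a = A[i]
                    for k in range(kk, min(kk + block_size, n)):
                        a_ik = row_a[k]
                        row_b = B[k]
                        for j in range(jj, min(jj + block_size, n)):
                            row_c[j] += a_ik * row_b[j]
    return C


def tile_intensity(block_size):
    """Flops per word for one b x b tile update.

    Assumes 2*b**3 flops (b**3 multiply-adds) over 3*b**2 words loaded
    (tiles of A, B and C), ignoring write-back: simplifies to 2*b/3.
    """
    b = block_size
    return (2 * b ** 3) / (3 * b ** 2)


if __name__ == "__main__":
    print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1))
    print(tile_intensity(3), tile_intensity(96))
```

The first call prints `[[19.0, 22.0], [43.0, 50.0]]`, and `tile_intensity(96)` returns `64.0` against `2.0` for a block size of 3: the bigger the tile, the less memory bandwidth each flop needs.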
So those architectures that pay great attention to these issues, the real HPC issues, at significant cost, are heavily disadvantaged. Almost everyone in the HPC arena will acknowledge that the Top-500 listing is mostly bogus, with a very poor relationship to real performance on real problems, but everyone wants to see their new machine in there, so no one has managed to overturn it, despite some good work on better benchmarks. Years ago a colleague dubbed this behaviour "Gigaflop harlotry." It is one of the most apt phrases I have ever heard.
(As a closing note, virtualisation is cited as a significant player in these HPC systems. It most certainly is not. There is no role for virtualisation whatsoever. We do not want fractions of a processor shared across lots of problems; we want lots of full processors unified to solve one problem. In any HPC system, one of the first things you do with a hyper-threading capable Xeon is turn hyper-threading off, because leaving it on costs a small but noticeable amount of performance. Adding a hypervisor to the base system would be insanity.)