Re: ARM vs. x86
"Why are there still no RISC systems that thoroughly outperform x86 on a core for core basis?"
There are a number of reasons. The first is that Intel has a huge investment in fabrication plants ("fabs" in industry parlance), and usually manages to stay a generation ahead of the ARM builders in implementation technology: lithography, insulation, and/or feature size. They just have a better _chip_, regardless of the architecture. Samsung and TSMC are catching up, but for a long time Intel had a huge advantage simply in construction, and this gave them faster clock rates, lower voltages, etc.
The second reason is that the convoluted x86 architecture makes no claim to be elegant or simple. But it IS well known what its asymmetries are, and Intel's compilers are heavily optimised to work around them where possible. Finely tuned compilers have done a great deal to even out the architectural differences.
Thirdly, Intel simply invests a whole lot more into developing its x86 follow-ons than ARM does. Such is the result of market dominance. Intel really does have a number of very smart designers, and as even you stated, it may be a turd, but it is a WELL-POLISHED turd. Things like branch prediction, out-of-order execution, superpipelining... Intel has these down to a very fine art, with a degree of dedicated circuitry that ARM designs have historically lacked.
Lastly, for ARM to focus on execution speed, it would need to add lots of transistors on-die for these things (out-of-order execution, branch prediction, superpipelining) to the degree that Intel does, and THAT would negate a good portion of ARM's power consumption advantage. There is no free lunch: a better instruction set doesn't make up for lacking the amount of dedicated silicon Intel uses to speed up execution. The day that ARM begins to catch Intel in single-core execution speed will also be the day that ARM begins to catch Intel in power usage...