x86 hoovers ...
... them all up. If that's what you mean by "sucks", I fully agree.
Instruction sets these days are much less of a distinguisher than they used to be. You say "transistor count" - well, add two million or so additional transistors to the next-gen x86 for improvements on the CISC decoder stages, i.e. roughly quadruple them - that's far less than 0.1% of the total additional transistor budget for a new-gen CPU (remind me, how many billions of transistors does an 8-core chip have?). In the big picture, whether 1 million or 10 million transistors do instruction predecoding - so what? There are orders of magnitude more in caches, buffers, interconnects and other glue these days.
If instruction efficiency were really key to the success of anything, then we'd never have seen Java get to where it is - the bytecode being a stack machine is just about as unfriendly to scalable hardware implementation as could be. Who (apart from JVM/JIT implementors, who have both my pity & admiration) cares?
Also, wrt. what current x86/x64 actually are:
First, both Intel and AMD have used RISC engines in their cores for many years (variously called µ-ops, micro-ops, R-Ops or some such), including superscalar/VLIW-style instruction bundling. The x86 instruction set compatibility is only a shim layer.
Second, current x86 optimization guides by both Intel and AMD actually state (if phrased less in-your-face): You want this thing fast, use simple instructions, order them for no/little interdependency - in short, use RISC, the x86/x64 bits mapping 1:1 to what the low-level engine does are by far the best for you.
Third, agreed that x86/x64 may have a few thousand instructions (the instruction set reference manuals have 1600+ pages these days). But have you ever run the experiment to see how many of them are actually used in executable binary code on your systems? When teaching low-level debugging and x86 assembly language, one of my favourite experiments was to let students write a little perl script that found all ELF i386/x86-64 executables, disassembled the code, sorted by instruction and then created an instruction set histogram. Invariably, one found that 99.9% of the code was made up from a set of no more than about 50 opcodes. Programming a CPU using only ~50 instructions, where have I heard that before? Ah - yes, I think it's called "RISC".
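If you want to try that experiment yourself, here is roughly how it could look. This is only a minimal sketch, in Python rather than the perl my students used, and it rests on a few assumptions: GNU objdump is on your PATH, and its `objdump -d` output has the usual tab-separated "address, bytes, instruction" layout. The function names (mnemonics, histogram, coverage, disassemble, main) are made up for illustration. It also takes only the first word of each instruction, so prefixes such as rep or lock get counted as mnemonics of their own.

```python
import collections
import subprocess
import sys


def mnemonics(disassembly):
    """Yield the first word of each instruction line in `objdump -d` text."""
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Assumed layout: "  addr:<TAB>raw bytes<TAB>mnemonic operands".
        # Labels, blank lines and byte-continuation lines have fewer fields.
        if len(fields) < 3 or not fields[2].strip():
            continue
        yield fields[2].split()[0]


def histogram(disassembly):
    """Count how often each mnemonic occurs."""
    return collections.Counter(mnemonics(disassembly))


def coverage(counts, top_n):
    """Fraction of all instructions covered by the top_n most common mnemonics."""
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total if total else 0.0


def disassemble(path):
    """Run objdump on one binary and return its text output."""
    result = subprocess.run(
        ["objdump", "-d", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def main(paths):
    counts = collections.Counter()
    for path in paths:
        counts.update(histogram(disassemble(path)))
    for mnemonic, n in counts.most_common(50):
        print(f"{n:10d}  {mnemonic}")
    print(f"top 50 cover {coverage(counts, 50):.1%} of all instructions")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choosing and pointed at a handful of binaries (say, a few files from /usr/bin), it prints the 50 most frequent mnemonics and the share of all disassembled instructions they account for. The exact numbers will depend on your compiler and distribution.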
That's not to say splashing resources and giant transistor counts all over it is the only way to get decently-performing CPUs. It's just the one that, by experiment, in the last 20 years has proven to create the best-performing server / workstation class CPUs. But you only need to look beyond that space, and what do you see? Set-top boxes, mobile phones, tablet devices, consoles, the entire "embedded" space use MIPS, ARM, PPC and other architectures that are forgotten or abandoned elsewhere. And Intel's attempt to push x86 into that space isn't turning out anywhere near as simple as Intel dreamt it would be.
To sum this up:
x86/x64, in the high-end, "stationary" systems space, has proven to be just about efficient and performant enough to beat everything else. There, it sucks ... up all the competitors.
x86/x64, in the mobile/embedded space, never made inroads. There, SoC solutions and tiny power/thermal/form-factor envelopes are mandatory, and implementors want to license modular designs to combine cores, graphics, comms etc. into a single package. Off-the-shelf components are frowned upon. Here, x86 sucks.
Gosh, you're right after all. No matter how you look at x86/x64, it always sucks !