Pentium 4 didn't suck.
It is merely a case of developers (including most compiler developers) being too incompetent to leverage its capabilities efficiently.
See here for relevant performance comparison data for well-written C code (no assembly) on the P3 vs. the P4 using different compilers:
http://www.altechnative.net/2010/12/31/choice-of-compilers-part-1-x86/
Note that with crap compilers the P4 did indeed perform relatively poorly. OTOH, with a decent compiler (which annihilated a crap yet ubiquitous compiler on any CPU), the P4 shows a very significant per-clock throughput increase over the P3.
The point being that software is written for hardware, not vice versa. Don't blame the hardware manufacturer if you are too incompetent to use the equipment to its full capability.