how much faster it would be because you'd optimise code at compile time rather than runtime. That struck me as being a "better" way to do things.
That was the theory.
In practice we tried out a system for an HPC application. Running the compiler with an analyser for ~10 hours overnight gave a <1% improvement over not using the analyser, with no guarantee that it wouldn't actually be worse for a different input. And it wasn't any faster than the 64-bit MIPS systems we had.
And this application's code was updated about once a month.
It just wasn't worth the effort.
Then AMD came along with amd64 and we all breathed a sigh of relief.