It's easy enough to write portable code for a single execution processor -- like you say, you just flip around the makefile macros and you can compile for anything you've got a code generator for. I think the problems appear when you try to exploit features of the overall processor subsystem (multiple cores, cache manipulation and so on). You can end up with code that's exquisitely optimized for one specific architecture and not much good anywhere else. This may be what Linus is talking about.
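To make the distinction concrete, here's a rough sketch in C of where the "flip a macro" approach stops being enough. The names PREFETCH, PREFETCH_DISTANCE and sum_u32 are made up for illustration, and the distance of 16 is just an assumed starting value, not a recommendation. The only real compiler feature used is __builtin_prefetch, which GCC and Clang provide.

```c
#include <stddef.h>
#include <stdint.h>

/* PREFETCH is a hypothetical macro for this sketch. On GCC/Clang it maps
 * to the real __builtin_prefetch builtin (read access, high locality);
 * elsewhere it compiles to nothing, so the code still builds and runs. */
#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH(addr) __builtin_prefetch((addr), 0, 3)
#else
#  define PREFETCH(addr) ((void)0)
#endif

/* Assumed tuning knob: how many elements ahead to request. The best value
 * depends on the cache hierarchy and memory latency of the target, which
 * is exactly the part that does not port. */
#ifndef PREFETCH_DISTANCE
#  define PREFETCH_DISTANCE 16
#endif

/* Sums an array, hinting the cache about data it will need shortly.
 * The result is the same on every target; only the speed differs. */
uint64_t sum_u32(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

The code itself is portable in the sense that it compiles anywhere and gives the same answer. What isn't portable is the tuning: a PREFETCH_DISTANCE that helps on one machine may do nothing, or hurt, on another, and once you've got dozens of these knobs set for one chip you've effectively written code for that chip.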
BTW -- I was under the impression that x86 processors were microcoded (from time to time the test/load instruction becomes exposed, which is a very Big Deal -- it's like the greatest trade secret inside Intel). Given this, I wonder what the actual processor architecture underneath looks like?