Re: "I did not know that ARM actually prohibited adding instructions"
"No, that is not what CPUID does. CPUID has no effect on the fetch-decode-dispatch-execute queue."
Again - I agree, and it's not what I am saying. I'm basing my statements on how operating systems present CPU functionality to applications. For this discussion we don't have to consider exactly how this is executed on the underlying CPU, as the OS or compiler will take care of that for us.
"If you want to use the information returned by CPUID to affect program execution, you need to - at a minimum: parse the information provided by CPUID and then jump to a different place in the opcode stream to continue execution."
This is exactly what I am saying: your OS (typically) or your application determines which path is taken at runtime based on CPUID checks. From a binary perspective, multiple code paths exist side by side and one is selected based on what the CPU supports. You use more memory/storage for the larger binary image with this approach, but that is generally trivial compared to the speed increase you get from an optimised code path.
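To make the "parse CPUID, then jump" point concrete, here is a minimal sketch in C. It assumes GCC or Clang on x86 (it uses the `__get_cpuid` helper from `<cpuid.h>`) and falls back to the baseline path everywhere else. The names `cpu_has_sse2`, `sum_scalar`, `sum_wide` and `sum` are made up for illustration, and `sum_wide` is only a stand-in for a real SIMD routine.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Parse CPUID leaf 1: EDX bit 26 reports SSE2.
   On non-x86 targets (or other compilers) this simply reports 0. */
static int cpu_has_sse2(void) {
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return (int)((edx >> 26) & 1u);
#endif
    return 0;
}

/* Baseline path: works on any CPU. */
static float sum_scalar(const float *v, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Stand-in for the optimised path (assumption: a real build would put
   SSE2 intrinsics or compiler-vectorised code here). */
static float sum_wide(const float *v, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++)   /* leftover elements */
        acc += v[i];
    return acc;
}

/* Both paths live in the same binary; the CPUID result picks one. */
float sum(const float *v, size_t n) {
    assert(v != NULL || n == 0);
    return cpu_has_sse2() ? sum_wide(v, n) : sum_scalar(v, n);
}
```

For brevity this re-runs the CPUID check on every call to `sum`, which real code would avoid by storing the result.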
As for the cost of CPUID, it is most likely a one-off evaluation that sets a variable. That variable is then used for your subsequent calls (e.g. choosing SIMD over general-purpose maths for vectors), and the cost of checking it is largely hidden by caching and branch prediction. On a modern CPU there is likely to be no real performance cost.
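The one-off evaluation could look something like the sketch below: the feature check runs on the first call only, and the chosen routine is kept in a function pointer for every later call. This assumes GCC or Clang on x86 for `__builtin_cpu_supports` and picks the baseline elsewhere. All the names here (`scale`, `scale_scalar`, `scale_simd_standin`, `detect_simd`, `detect_calls`) are hypothetical, and the lazy initialisation is not thread-safe as written.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*scale_fn)(float *v, size_t n, float k);

/* Baseline path: works on any CPU. */
static void scale_scalar(float *v, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

/* Stand-in for a SIMD path (assumption: real code would use intrinsics). */
static void scale_simd_standin(float *v, size_t n, float k) {
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

/* The expensive question, asked once. */
static int detect_simd(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

static scale_fn resolved = NULL;  /* cached choice */
static int detect_calls = 0;      /* demo only: counts feature checks */

void scale(float *v, size_t n, float k) {
    assert(v != NULL || n == 0);
    if (resolved == NULL) {       /* true on the first call only */
        detect_calls++;
        resolved = detect_simd() ? scale_simd_standin : scale_scalar;
    }
    resolved(v, n, k);            /* later calls: one indirect call */
}
```

After the first call, each use costs a well-predicted null check plus an indirect call, which is the kind of overhead that tends to disappear into branch prediction.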
In most OSes, all of this is hidden by the OS and can be ignored in higher layers unless you require specific functionality the OS does not expose. From a programming perspective, this optimisation is likely handled by your compiler.