Re: "I did not know that ARM actually prohibited adding instructions"
> The executing code looks to see if the instructions are supported and if so, it uses them, otherwise it will have to use a software emulation library [ ... ]
There is a cost associated with a conditional branch, plus a kernel trap, plus a mode switch sequence, i.e. user-land -> kernel-land -> user-land.
Conditional branches themselves are very expensive, usually between 12 and 16 cycles, and they come with side-effects, namely flushing the dispatch queue. Add the cost of the trap plus the cost of two mode switches. How many clock cycles are we talking about here? Somewhere in the vicinity of 90 cycles? All of that just to determine whether or not the next opcode is legal and, if it's not, to punt it to the emulation layer and then come back.
Assume an ADD %rs1, %rs2, %rd instruction with a cost of 2 cycles. You are adding an overhead of ~90 cycles just to test if a 2-cycle instruction is legal, or must be emulated in software.
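To put numbers on that, here is a back-of-the-envelope sketch in Python. It uses only the rough estimates given above (12 to 16 cycles for the branch, ~90 cycles total, 2 cycles for the ADD). The function and constant names are made up for illustration, and the split between trap and mode switches is simply whatever is left over after the branch, not a measured figure.

```python
# All figures are the rough estimates from the post, not measurements.
BRANCH_CYCLES = (12, 16)      # conditional branch: low and high estimate
TOTAL_OVERHEAD_CYCLES = 90    # branch + trap + two mode switches ("~90")
ADD_CYCLES = 2                # the example ADD instruction


def overhead_ratio(overhead, useful):
    """How many times larger the check overhead is than the useful work."""
    return overhead / useful


def trap_and_switch_budget(total, branch_range):
    """Cycles left for the trap and the two mode switches once the
    branch cost is subtracted, returned as a (low, high) pair."""
    low, high = branch_range
    return total - high, total - low


ratio = overhead_ratio(TOTAL_OVERHEAD_CYCLES, ADD_CYCLES)
budget = trap_and_switch_budget(TOTAL_OVERHEAD_CYCLES, BRANCH_CYCLES)

print(f"overhead is {ratio:.0f}x the cost of the ADD itself")
print(f"trap + two mode switches: {budget[0]} to {budget[1]} cycles")
```

With these estimates, the legality check costs 45 times as much as the instruction it guards, and that is before the emulation library has done any actual work.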
There is no free lunch here. Any which way you want to implement your trap to the emulation library for the unknown opcodes, it comes with an unacceptably high cost.
Not to mention the side-effects on any optimizations that the compiler might try to do: they become impossible.