Re: "I did not know that ARM actually prohibited adding instructions"
> As you alter the ISA, you need to be able to ensure software works across different ISA's via CPUID flags.
No. That is not how it works at all.
Instructions, their register operands, and any additional / ancillary flags are encoded into a 32-bit integer for ARM, ARM64, SPARC, SPARC64, MIPS, PPC, PPC64, etc. (leaving aside compressed encodings such as ARM's 16-bit Thumb). X86 and X86_64 are the exception: their instructions are variable-length, at most 15 bytes long.
In other words, an instruction is a 32-bit integer (or, on X86, a sequence of up to 15 bytes) that, when fetched and fed into the fetch-decode-dispatch-execute pipeline, instructs the CPU to do certain pre-determined things. These instruction encodings are loosely known as opcodes, although strictly speaking the opcode is only the part of the encoding that selects the operation.
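To make the "an instruction is just an integer" point concrete, here is a minimal sketch that pulls the bit fields out of one ARM64 instruction word. The function name and dictionary keys are mine, chosen for illustration; the field layout is the one the ARM64 (A64) instruction set uses for ADD (shifted register), and 0x8B020020 is the encoding of "add x0, x1, x2".

```python
def decode_add_shifted_register(word: int) -> dict:
    """Split a 32-bit A64 word into the fields of ADD (shifted register).

    Illustrative helper only: it extracts fields, it does not check
    that the word really is an ADD.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("A64 instructions are exactly 32 bits wide")
    return {
        "sf": (word >> 31) & 0x1,      # 1 = 64-bit registers, 0 = 32-bit
        "fixed": (word >> 24) & 0x1F,  # 0b01011 identifies this instruction class
        "shift": (word >> 22) & 0x3,   # shift type applied to Rm
        "rm": (word >> 16) & 0x1F,     # second source register
        "imm6": (word >> 10) & 0x3F,   # shift amount
        "rn": (word >> 5) & 0x1F,      # first source register
        "rd": word & 0x1F,             # destination register
    }


ADD_X0_X1_X2 = 0x8B020020  # add x0, x1, x2
fields = decode_add_shifted_register(ADD_X0_X1_X2)
print(fields)
```

Every bit in that word is spoken for. A custom instruction has to claim a bit pattern that the architecture has not already assigned, and only a decoder built to recognise that pattern will do anything useful with it.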
If you add custom instructions to the ISA, you are effectively adding new and unknown numbers (opcodes). If a CPU that does not implement them runs into such an opcode, it will raise an undefined-instruction exception and record the cause in one of its exception registers. The kernel's trap handler will read that register and send SIGILL to the offending process, and the process, unless it handles the signal, will die and dump core.
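The trap-on-unknown-opcode behaviour can be sketched with a toy decoder. Everything here is hypothetical: the four-entry opcode table, the 8-bit opcode field in the top byte, and the IllegalInstruction exception stand in for a real decoder and a real SIGILL, and do not model any actual CPU.

```python
class IllegalInstruction(Exception):
    """Stand-in for the hardware exception that the kernel turns into SIGILL."""


# Hypothetical toy ISA: the top 8 bits of each 32-bit word select the operation.
KNOWN_OPCODES = {0x01: "LOAD", 0x02: "ADD", 0x03: "STORE", 0xFF: "HALT"}


def run(program):
    """Decode each word in order; fail on the first encoding the table lacks."""
    trace = []
    for pc, word in enumerate(program):
        opcode = (word >> 24) & 0xFF
        mnemonic = KNOWN_OPCODES.get(opcode)
        if mnemonic is None:
            # The decoder has no entry for this number, so execution stops here.
            raise IllegalInstruction(f"unknown opcode {opcode:#04x} at pc={pc}")
        trace.append(mnemonic)
        if mnemonic == "HALT":
            break
    return trace


print(run([0x01000000, 0x02000000, 0xFF000000]))

try:
    # 0x7F is a "custom" opcode this toy CPU was never built to decode.
    run([0x01000000, 0x7F000000, 0xFF000000])
except IllegalInstruction as exc:
    print("trapped:", exc)
```

The second program is what a binary built for the extended ISA looks like to a CPU that only implements the base ISA: it runs fine right up to the first number the decoder has never heard of.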
Some CPUs do not raise an exception when attempting to decode an unknown opcode. They either ignore it, or in some cases wander into undefined behavior land.
Most CPUs expect that the instructions that are fed to them are legal. That is, the CPU knows how to decode them.
If the CPU ignores unknown/illegal opcodes and silently skips over them without raising an exception, your program will misbehave, often with catastrophic results.
There are CPU architectures that can fetch-decode-execute instructions for multiple ISAs, but these are very specialized, sophisticated and rare breeds of CPUs. This domain is reserved for VLIW ISAs and the CPUs that support them. And even in this case, you can't feed the CPU some arbitrary instruction opcode and expect it to decode-dispatch-execute correctly. There has to be a pre-defined and known set of encodings that the CPU knows how to handle.
An executing program can't keep calling CPUID - or its equivalent - repeatedly to determine whether the next instruction in the queue can be decoded and executed or not. The performance hit would be unacceptable. It's not just the cost of calling CPUID. You need to factor in the cost of the conditional branch associated with the decision on whether to decode-execute the next instruction, or not.