Re: This is why AMD and NVidia are making ARM chips
Design errors [of ARM 32], like ... making every instruction conditional
I am totally not au fait with CPU instruction sets, as I abandoned that particular specialization after writing floating-point arithmetic operations for the NS32032 at uni, but are these instructions for "predicated execution", as used in IA-64 "Merced" and the Zuse Z3, meant to reduce (or eliminate) branches?
See: Konrad Zuse deserves even more credit
The IEEE Computer article referenced in the above is actually 'Challenges and Trends in Processor Design', Janet Wilson, IEEE Computer Magazine, January 1998, with the item "Introduction to Predicated Execution" by Wen-mei Hwu, University of Illinois, Urbana-Champaign, where we read:
The story of Merced, Intel’s first processor based on its next-generation 64-bit architecture, will continue to unfold in 1998. Intel expects this product of its collaboration with Hewlett-Packard to reach volume production in 1999. To date, however, the two companies have released few details about Intel Architecture 64 (IA-64). One significant change they did admit to at the October 1997 Microprocessor Forum was the switch to full predicated execution, a technique that no other commercial general-purpose processor employs.
[IEEE Computer] wanted to give its readers advance notice of this promising technique. We invited Wen-mei Hwu, a prominent researcher in this area, to explain predication, a topic you may be hearing more about in 1998. -- Janet Wilson
Predicated execution is a mechanism that supports the conditional execution of individual operations. Compared to a conventional instruction set, an operation in a predicated-execution architecture has an additional input operand, a predicate, that can assume a value of true or false. During runtime, a predicated-execution processor fetches operations regardless of their predicate value. The processor executes operations with true predicates normally; it nullifies operations with false predicates and prevents them from modifying the processor state. Using predication inherently changes the representation of a program’s control flow. A conventional instruction set requires all control flow to be explicitly represented in the form of branches, the only mechanism available to conditionally execute operations. An instruction set with predicated execution, however, can support conditional execution via either conventional branches or predicated operations.
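To make the fetch-everything, nullify-the-false mechanism concrete, here is a toy Python model. The mini-ISA, the `Op`/`run`/`max_program` names and the assembly-like text are all made up for illustration; only the ideas of a hardwired-true `p0` and a compare that writes a complementary predicate pair are borrowed loosely from IA-64. It computes max(a, b) with no branch: all three ops are fetched every time, and exactly one of the two moves is allowed to write `r`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


@dataclass
class Op:
    pred: str                         # guarding predicate register, e.g. "p1"
    effect: Callable[[State], None]   # state update applied only if pred is true
    text: str                         # assembly-like text (hypothetical syntax), for readability


def run(program: List[Op], state: State) -> Tuple[int, int]:
    """Fetch every op in order; execute only those whose predicate is true."""
    fetched = executed = 0
    for op in program:
        fetched += 1                  # fetched regardless of predicate value
        if state[op.pred]:
            op.effect(state)          # true predicate: executes normally
            executed += 1
        # false predicate: op is nullified and cannot modify state
    return fetched, executed


def max_program() -> List[Op]:
    """Branch-free r = max(a, b) using a complementary predicate pair."""
    return [
        Op("p0", lambda s: s.update(p1=s["a"] > s["b"], p2=not s["a"] > s["b"]),
           "(p0) cmp.gt p1, p2 = a, b"),
        Op("p1", lambda s: s.update(r=s["a"]), "(p1) mov r = a"),
        Op("p2", lambda s: s.update(r=s["b"]), "(p2) mov r = b"),
    ]


if __name__ == "__main__":
    for a, b in [(7, 3), (2, 9)]:
        state: State = {"p0": True, "a": a, "b": b}   # p0 is always true
        fetched, executed = run(max_program(), state)
        print(f"max({a}, {b}) = {state['r']}  fetched={fetched} executed={executed}")
```

Both runs report three ops fetched and two executed: the compare plus whichever move had a true predicate.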
Providing compiler support for predicated execution is challenging. Current optimizing compilers rely on control flow representation as the foundation of analysis and optimization. Because predicated code changes the control flow representation, effectively handling it requires an extensive modification of the compiler infrastructure, particularly in the areas of classical and ILP optimizations, code scheduling, and register allocation. An effective compiler must balance the control flow and the use of predication. If resources become oversubscribed or dependence heights (the lengths of the chains of dependent operations) become unbalanced among paths, predicated execution can degrade performance.
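A rough way to see the trade-off in that last paragraph is a back-of-the-envelope cost model. Everything here is an assumption for illustration (the `Hammock` fields, the cycle formulas, the numbers), not Hwu's model: the branchy version pays for the path it actually takes plus the expected misprediction penalty, while the if-converted version pays for the taller of the two dependence chains or for the issue slots both paths consume, whichever is larger.

```python
import math
from dataclasses import dataclass


@dataclass
class Hammock:
    """An if-then-else region; all fields are hypothetical illustrative numbers."""
    h_then: int      # dependence height (cycles) of the then-path
    h_else: int      # dependence height (cycles) of the else-path
    n_then: int      # operation count of the then-path
    n_else: int      # operation count of the else-path
    p_then: float    # probability the then-path is taken


def branch_cycles(h: Hammock, miss_rate: float, penalty: int) -> float:
    # Only the taken path executes, but each misprediction costs a refill.
    expected_path = h.p_then * h.h_then + (1 - h.p_then) * h.h_else
    return expected_path + miss_rate * penalty


def predicated_cycles(h: Hammock, issue_width: int) -> int:
    # Both paths are fetched and issued: time is bounded below by the taller
    # dependence chain and by the issue slots the combined ops need.
    height_bound = max(h.h_then, h.h_else)
    resource_bound = math.ceil((h.n_then + h.n_else) / issue_width)
    return max(height_bound, resource_bound)


if __name__ == "__main__":
    balanced = Hammock(h_then=3, h_else=3, n_then=3, n_else=3, p_then=0.5)
    lopsided = Hammock(h_then=2, h_else=20, n_then=2, n_else=20, p_then=0.95)
    print(branch_cycles(balanced, miss_rate=0.5, penalty=15),
          predicated_cycles(balanced, issue_width=4))
    print(branch_cycles(lopsided, miss_rate=0.02, penalty=15),
          predicated_cycles(lopsided, issue_width=4))
```

With these made-up numbers the balanced, unpredictable hammock favours predication (3 cycles vs 10.5), while the lopsided, well-predicted one favours the branch (about 3.2 vs 20). That is the dependence-height imbalance the quote warns about; making `n_then + n_else` large relative to `issue_width` shows the oversubscription case.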
Predicated execution started as a software approach to avoiding conditional branches in early supercomputers. Vector architectures such as the Cray 1 and array-processing architectures such as Illiac IV adopted predication in the form of mask registers to allow effective vectorization of loops with conditional branches. During the era of mini-supercomputers, the Cydrome Cydra 5 became the first machine to support generalized predication. Parallel to the Cydra 5, the Multiflow Trace machine adopted partial predication by introducing a single instruction with a predicate input, a select instruction. Contemporary processors, such as the DEC Alpha and the Sparc V9, have adopted the partial-predication approach so they can maintain a 32-bit instruction encoding.
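The select instruction and the vector mask registers mentioned there boil down to the same trick: compute the candidate values, then keep one under a condition instead of branching around the work. A sketch in Python with hypothetical helper names; `select` models a select/conditional-move style op using bit masks, and `masked` mimics how a mask register lets a loop with an `if` in its body be vectorized. It is not Cray, Alpha or SPARC code.

```python
from typing import List


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Branch-free select on Python ints, in the spirit of a select/CMOV op."""
    mask = -int(cond)            # all ones (-1) if cond, else 0
    return (if_true & mask) | (if_false & ~mask)


def branchy(a: List[int], b: List[int]) -> List[int]:
    # Reference loop with a conditional branch in its body.
    out = []
    for x, y in zip(a, b):
        if x > 0:
            out.append(2 * x)
        else:
            out.append(y)
    return out


def masked(a: List[int], b: List[int]) -> List[int]:
    # Vector-style version: build a mask, evaluate the "then" value in every
    # lane, then merge under the mask. The mask decides which result is
    # kept, not which work is done.
    mask = [x > 0 for x in a]
    then_vals = [2 * x for x in a]
    return [select(m, t, e) for m, t, e in zip(mask, then_vals, b)]


if __name__ == "__main__":
    a = [3, -1, 0, 5]
    b = [10, 20, 30, 40]
    print(masked(a, b))  # [6, 20, 30, 10]
```

The cost of the masked form is that the "then" work is done for every element, which is cheap on a vector unit but is the same both-paths overhead the compiler has to weigh for scalar predication.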