Reply to post: Sophie Wilson post about early ARM history from 1988

Happy birthday, ARM1. It is 35 years since Britain's Acorn RISC Machine chip sipped power for the first time

Torben Mogensen

I saved a USENET post that Sophie Wilson made in November 1988 about the early ARM days. Since USENET is publicly archived, I don't think there are any IP issues with showing this. I think many of you will find the following interesting:

From: RWilson@acorn.co.uk

Newsgroups: comp.arch

Subject: Some facts about the Acorn RISC Machine

Keywords: Acorn RISC ARM

Message-ID: <543@acorn.UUCP>

Date: 2 Nov 88 18:03:47 GMT

Sender: andy@acorn.UUCP

Lines: 186

There have now been enough partially correct postings about the Acorn RISC

Machine (ARM) to justify semi-official comment.

History:

ARM is a key member of a 4 chip set designed by Acorn, beginning in 1984, to

make a low cost, high performance personal computer. Our slogan was/is "MIPs

for the masses". The casting vote in each design decision was to make the

final computer economic.

The chips are (1) ARM: a 32 bit RISC Microprocessor; (2) MEMC: a MMU and

DRAM/ROM controller; (3) VIDC: a video CRTC with on chip DACs and sound; and

(4) IOC: a chip containing I/O bus and interrupt control logic, real time

clocks, serial keyboard link, etc.

The first ARM (that referred to by David Chase @ Menlo Park) was designed at

Acorn and built using VLSI Technology Inc's (VTI) 3 micron double level metal

CMOS process using full custom techniques; samples, working first time, were

obtained on 26th April 1985. The target clock was 4MHz, but it ran at 8. The

timings that David gives are for the ARM Evaluation System, where ARM was run

at 3.3MHz and 6.6MHz (20/3) for initial and page-mode DRAM cycles,

respectively. The ARM comprises 24,000 transistors (circa 8,000 gates). Every

instruction is conditional, but there are neither delayed loads/stores nor

delayed branches (sorry, Martin Hanley). Call is via Branch and Link (same

timing as Branch). All instructions are abortable, to support virtual memory.

The first VIDC was obtained on 22nd Oct 1985, the first MEMC on 25th Feb 1986,

and the first IOC 30th Apr 1986. All were "right first time".

We then redesigned ARM to make it go faster (since, by this time, Acorn had

decided roughly what market to aim the completed machines at and 8MHz minimum

capability was required - but we did continue to develop software on the 3

micron part!). Some more FIQ registers were added, bringing the total to 27

(some of our "must go as fast as possible for real time reasons" code didn't

manage with the smaller set). A multiply instruction (2 bits per cycle,

terminate when multiplier exhausted so that 8xn multiply takes 4 cycles max)

and a set of coprocessor interfaces were added. Scaled indexed by register

shifted by register (i.e. effective address was ra+rb<<rc) was removed from

the instruction set (too hard to compile for) [scaled indexed by register

shifted by constant was NOT removed!].
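
[A minimal sketch, in Python, of the multiply timing described above: the multiplier is consumed two bits per step and the loop stops once it is exhausted. The name multiply and the step counting are illustrative assumptions, not Acorn's design; the real hardware may use a different recoding and has fixed overheads this model ignores.]

```python
MASK = 0xFFFFFFFF  # keep everything to 32 bits, like an ARM register


def multiply(multiplicand, multiplier):
    """Shift-and-add multiply, 2 multiplier bits per step, early exit.

    Returns (product modulo 2**32, number of steps taken).
    The step count is a model of the cycle count, not a measured figure.
    """
    multiplicand &= MASK
    multiplier &= MASK
    product = 0
    steps = 0
    while multiplier:  # terminate when the multiplier is exhausted
        # Add 0, 1, 2 or 3 times the (shifted) multiplicand.
        product = (product + (multiplier & 3) * multiplicand) & MASK
        multiplicand = (multiplicand << 2) & MASK
        multiplier >>= 2
        steps += 1
    return product, steps


if __name__ == "__main__":
    print(multiply(12345, 0xFF))   # 8-bit multiplier: at most 4 steps
    print(multiply(12345, 0xFFFFFFFF)[1])  # full 32-bit multiplier
```

With an 8-bit multiplier the loop runs at most four times, which matches the "8xn multiply takes 4 cycles max" figure; a full 32-bit multiplier takes 16 steps in this model.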

The new, 2 micron ARM was right first time on 19th Feb 1987. Its peak

performance was 18MHz; its die size 230x230 mil^2; 25,000 transistors.

VTI were given a license to sell the chips to anyone. They renamed the chips:

VL86C010 (ARM), VL86C110 (MEMC), VL86C310 (VIDC), VL86C410 (IOC).

Acorn released volume machines "Acorn Archimedes" in June 1987. Briefly:

A305: 1/2 MByte, 1MByte floppy, graphics to 640x514x16 colours

A310: ditto, 1MByte

A310M: ditto with PC software emulator (circa a PC XT, if you're interested)

A440: 4MByte, 20MByte hard disc, 1152x896 graphics also.

All machines have ARM at 4/8MHz (circa 5000 dhrystones 1.1), 8 channel sound

synthesiser, proprietary OS, 6502 software emulator, software.... Prices

between 800 and 3000 pounds UK with monitor and mouse and all other useful

bits. Not available in the US, but try Olivetti Canada.

VTI make ARM available as an ASIC cell. Sanyo have taken a second source

license (in April 1988) for the chip set, and make a 32 bit microcomputer

(single chip controller). In "VLSI Systems Design" July 1988, the following

statements are made by VTI: ARM in 1.5 micron (18-20MHz clock), 180x180 mil^2;

future shrink to 1 micron (they are expecting "perhaps 40MHz" and 150 mil

square with the price dropping from $50 to $15); expected sales in 1988

90-100,000 units.

Contact Ron Cates, VTI Application Specific Logic Products Division,

Tempe, Arizona for details (e.g. the "VL86C010 RISC Family Data Manual").

Plug in boards for PCs are available. A controller for Laser printers

with ARM, MEMC, VIDC and 4MBytes DRAM has been sold to Olivetti [Acorn's

parent company as of 1985-6] (contact SWoodward@acorn.co.uk if you want to

know more).

In the Near Future:

We have a Floating Point Coprocessor interface chip working "in the lab" - the

fifth member of the four chip set. It interfaces an ATT WE32206 to ARM's

coprocessor bus. It benchmarks at 95.5 KFlops LINPACK DP FORTRAN Rolled BLAS

(slowest) (11KFlops with a floating point emulator) on an A310. Definitely

have to make our own, some time...

Acorn is about to release UNIX 4.3BSD including TCP/IP, NFS, X Windows and

IXI's X.desktop on the A440. Contact MJenkin@acorn.co.uk or

DSlight@acorn.co.uk for more info (and to be told that it isn't available in

the US {yet}).

Operating Systems:

Acorn's proprietary OS "Arthur" is written in machine code: it fills 1/2MByte

of ROM! (yes, writing in RISC machine code is truly wonderful as others have

noted on comp.arch). Its main features are windows, anti-aliased fonts

(wonderful at 90 pixels per inch - I use 8 point all the time) and sound

synthesis. It runs on all Archimedes machines. A 2nd release is due real soon

now and features multitasking, a better desktop and a name change to RISC OS.

VTI are porting VRTX to the ARM; Cambridge (UK) Computer Lab's Tripos has been

ported to A310/A440. UNIX has been ported by Acorn: see above. There are MINIX

ports everywhere one looks (try querying the net...).

Software:

C Compiler: ANSI/pcc; register allocation by graph colouring; code motion;

dead code elimination; tail call elimination; very good local code generation;

CSE and cross-jumping work and will be in the next release. No peepholing (yet

- not much advantage, I'm afraid). Can't turn off most optimisation features.

Also FORTRAN 77, ISO PASCAL, interpreted BASIC (structured BBC BASIC, very

fast), Forth, Algol, APL, Smalltalk 80 (as seen at OOPSLA 88: on an A440 it

approximates a Dorado) and others (LISP, Prolog, ML, Ponder, BCPL....).

Specific applications for Archimedes computers are too numerous to mention!

(though the high speed Mandelbrot calculation has to be seen to be believed -

one iteration of the set in 28 clock ticks [32 bit fixed point] real time

scroll across the set [calculate row/column in a frame time and move the

picture]).

There is a part of the net that talks about Archimedes machines:

(eunet.micro.acorn).

Random Info:

Code density is approximately that of 80x86/68020. Occasionally 30% worse

(usually on very small programs).

The average number of ticks per instruction 1.895 (claims VTI - we've never

bothered to measure it).

DRAM page mode is controlled by the MEMC, but there is a prediction signal

from the ARM saying "I will use a sequential address in the next cycle" which

helps the timing a great deal! S=125nS, N=250nS with current MEMC and DRAM

(see David Chase's article for instruction timing). Static RAM ARM systems

have been implemented up to 18MHz - S=N=1/18 with these systems.

Approximately 1000 dhrystones 1.1 per MHz if N=S; about 1000/1.895 dhrystones

per MHz if N=2S (i.e. 5K dhrystones for a 4/8MHz system; 18K dhrystones for

an 18/18MHz system).
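
[The rule of thumb above can be written out as a small Python calculation. This is only a sketch: the function name is made up, and it assumes the "MHz" in question is the sequential (S) cycle rate, i.e. 8 for a 4/8MHz system.]

```python
TICKS_PER_INSTRUCTION = 1.895  # VTI's average figure quoted in the post


def estimated_dhrystones(s_cycle_mhz, n_equals_s):
    """Rough Dhrystone 1.1 estimate from the post's per-MHz figures.

    n_equals_s=True  -> about 1000 dhrystones per MHz (e.g. static RAM)
    n_equals_s=False -> about 1000/1.895 per MHz (N = 2S, e.g. DRAM)
    """
    per_mhz = 1000.0 if n_equals_s else 1000.0 / TICKS_PER_INSTRUCTION
    return s_cycle_mhz * per_mhz


if __name__ == "__main__":
    print(round(estimated_dhrystones(18, True)))   # 18/18MHz static RAM system
    print(round(estimated_dhrystones(8, False)))   # 4/8MHz DRAM system
```

Under that assumption the 18/18MHz case gives 18K exactly, and the 4/8MHz case comes out a little over 4K, the same ballpark as the rounded 5K quoted above.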

Most recent features: Electronic Design Jul 28 1988, VLSI Systems Design July

1988.

We had a competition to see who would use "ra := rb op rc shifted by rd" with

all of ra, rb, rc and rd actually different registers, but the graphics people

won it too easily!

ARM's byte sex is as VAX and NS32000 (little endian). The byte sex of a 32 bit

word can be changed in 4 clock ticks by:

EOR R1,R0,R0,ROR #16

BIC R1,R1,#&FF0000

MOV R0,R0,ROR #8

EOR R0,R0,R1,LSR #8

which reverses R0's bytes. Shifting and operating in one instruction is fun.
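
[The four-instruction sequence can be checked with a small Python simulation. This is a sketch: the helper names ror and byte_swap are illustrative, registers are modelled as 32-bit unsigned integers, and the first instruction's shift is read as ROR #16.]

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def ror(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK


def byte_swap(r0):
    """Mirror the four ARM instructions one line at a time."""
    r1 = r0 ^ ror(r0, 16)        # EOR R1,R0,R0,ROR #16
    r1 &= ~0x00FF0000 & MASK     # BIC R1,R1,#&FF0000
    r0 = ror(r0, 8)              # MOV R0,R0,ROR #8
    r0 ^= r1 >> 8                # EOR R0,R0,R1,LSR #8
    return r0


if __name__ == "__main__":
    print(hex(byte_swap(0x11223344)))
```

Writing the input as bytes A B C D, the rotate by 8 gives D A B C, and the final EOR replaces the two stray bytes with C and A using the masked A^C terms, leaving D C B A.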

Shifted 8bit constants (see David Chase's article) catch virtually everything.
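
[A sketch of what "shifted 8bit constants" can express, in Python. It assumes the usual ARM data-processing immediate form, an 8-bit value rotated right by an even amount held in a 4-bit field; that detail comes from the ARM instruction encoding rather than from this post, and the function names are illustrative.]

```python
MASK = 0xFFFFFFFF


def rol(value, amount):
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK


def encode_immediate(value):
    """Return (imm8, rot) such that value == imm8 ROR (2*rot), or None.

    Tries the 16 even rotations and returns the first one that leaves
    an 8-bit value.
    """
    value &= MASK
    for rot in range(16):
        imm8 = rol(value, 2 * rot)  # undo a right-rotate by 2*rot
        if imm8 <= 0xFF:
            return imm8, rot
    return None


if __name__ == "__main__":
    for constant in (0xFF, 0x00FF0000, 0x104, 0x101, 0x102):
        print(hex(constant), encode_immediate(constant))
```

For example, &FF0000 (the mask used in the byte swap above) encodes as &FF rotated right by 16, while &101 and &102 have no single-instruction form and would need two instructions or a load.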

Major use of block register load/save (via bitmask) is procedure entry/exit.

And graphics - you just can't keep those boys down. The C and BCPL compilers

turn some multiple ordinary loads into single block loads.

MEMC's Content Addressable Memory inverted page table contains 128 entries.

This gives rather large pages (32KBytes with 4MBytes of RAM) and one can't

have the same page at two virtual addresses. Our UNIX hackers revolted, but

are now learning to love it (there's a nice bit in the standard kernel which

goes "allocate 31 pages to start a new process"....)
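
[The page size follows from dividing physical RAM by the 128 table entries. A short Python sketch of that arithmetic; the names are illustrative, and the assumption is simply one CAM entry per physical page, as the figures above imply.]

```python
CAM_ENTRIES = 128          # entries in MEMC's inverted page table
KBYTE = 1024
MBYTE = 1024 * KBYTE


def page_size_bytes(ram_bytes):
    """Page size if every physical page gets exactly one CAM entry."""
    return ram_bytes // CAM_ENTRIES


if __name__ == "__main__":
    page = page_size_bytes(4 * MBYTE)
    print(page // KBYTE, "KBytes per page with 4MBytes of RAM")
    print(31 * page // KBYTE, "KBytes for a 31-page process start-up")
```

With 4MBytes this gives 32KByte pages, so a kernel that allocates 31 pages per new process would be asking for 992KBytes, nearly a quarter of the machine.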

Data types: byte, word aligned word, and multi-word (usually with a

coprocessor e.g. single, double, double extended floating point).

Neatest trick: compressing all binary images by around a factor of 2. The

decompression is done FASTER than reading the extra data from a 5MBit

winchester!

Enough! (too much?) Specific questions to me, general brickbats to the net.

.....Roger Wilson (RWilson@Acorn.co.uk)

DISCLAIMER: (I speak for me only, etc.)

The above is all a fiction constructed by an outline processor, a thesaurus

and a grammatical checker. It wasn't even my computer, nor was I near it at

the time.
