"It's very distressing - I'm watching almost with disbelief. The Americans cannot get it out of their heads that if you're trying to build machines with lots of processors, you don't assume that they all share a common memory. The world doesn't have a common database. We pass messages to one another." David May, professor of …
Amen. Message passing is where it's at.
As someone who has to constantly deal with multithreading hell, I fully agree. Debugging other people's synchronization errors, mutex and semaphore issues, performance cratering due to shared access issues, ararararargh. Replace it all with message queues and it's so much more deterministic, and faster as well. The queues have to be synced and high performance, but that's a single point to optimize. You can still share huge areas of memory (if you do have shared memory) by passing pointers but using the messages as permission to access and doing that infrequently.
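For anyone who hasn't worked this way, here is a minimal sketch of the "pass a pointer, treat the message as permission" idea, using Python's standard `queue` and `threading` modules. The names `worker` and `ownership_demo` are made up for illustration, and the "don't touch the buffer after sending" rule is a convention the code relies on, not something the language enforces.

```python
import queue
import threading


def worker(inbox, outbox):
    """Process buffers handed over via the inbox queue."""
    while True:
        buf = inbox.get()
        if buf is None:  # sentinel: no more work
            break
        # Receiving the message is our permission to touch the buffer.
        for i in range(len(buf)):
            buf[i] = (buf[i] * 2) % 256
        outbox.put(buf)  # hand ownership back to the sender


def ownership_demo():
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()

    buf = bytearray([1, 2, 3])
    inbox.put(buf)         # by convention, the sender must not touch buf now
    result = outbox.get()  # ownership returns with the reply
    inbox.put(None)
    thread.join()

    # Same object came back (no copy was made), with its contents doubled.
    return result is buf, list(result)
```

The only synchronised objects are the two queues; the buffer itself is never locked, because at any moment exactly one side believes it owns it.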
Something with super-fast efficient message passing like Barrelfish just makes me weak in the knees - not worrying about cache coherency is a great thing. So c'mon, don't say 'The Americans' when you just mean Intel. Even Microsoft realizes Intel is dogging the wrong bone again, like they did with the P4 before the Israelis showed them the right way.
You must not mix the abstraction and the architecture.
Abstractions : Mutexes, Semaphores, Threading, Messaging
Architecture : Shared-Memory or Distributed Memory
The abstraction may or may not map well to the Architecture. The shared-memory architecture may make it possible to efficiently run some algorithms that cannot be run efficiently on a distributed-memory architecture (Algorithms where each compute node needs to access the whole problem state; certainly something in Quantum Mechanics falls into that area)
The threading idea is a rather messy thing anyway.
How about transactional memory as used in Clojure? Try Erlang or Linda. How about parallel Prolog? The possibilities are vast.
Parallel processing is like Christianity
It has not been tried and found wanting, it has been found difficult and left untried (as Chesterton put it). Now we're looking at dozens of cores on a chip, but can we make effective use of them? On a server supporting multiple independent sessions (and probably with some VM going on as well) we can - but in a desktop or a supercomputer dedicated to one task?
Best processor .. ever
We used loads of transputers when I worked at Ferranti for sonar processing. They were a joy to program. Parallel programming was incredibly natural. You could simulate a system on one transputer then quickly reconfigure them to run across multiple transputers.
If the world had been a fair and equal place it would have been the start of a new programming revolution, however Inmos hit the buffers when they tried to go to the next generation. The T9000 was great on paper, but they could never make it work and finally we gave up and went to dedicated DSPs.
Also while OCCAM was great, generally we used the Inmos C implementation
> Also while OCCAM was great, generally we used the Inmos C implementation
Did you know that the Inmos C compiler was a C-to-Occam translator on top of the Occam compiler?
Running "strings" against the executable shows its provenance :-)
@Vic: Well I never
Well, I didn't know that! I never used Occam itself, but I'd no idea that the C compiler was done that way.
On the whole I didn't like the dev tools very much. Debugging was always a nightmare. I always felt that Transputers needed some sort of separate debugging bus rather than relying on the Transputer channels themselves. But JTAG hadn't been invented then.
Transputer Sonar and other heroes...
I did stuff on transputers doing sonar processing as well, while I was at GEC. 4K point FFTs calculated, stored and displayed in real time when the PCs of the time struggled to do 256 point.
I still cannot get over people who say you can't program in parallel: what have the hardware engineers been doing for the last 50 years? What they actually mean is that the way people go about programming at the moment makes it very difficult... but that way is not the only way.
I was hoping the article would mention another of the heroes: Prof Peter Welch, who has backed CSP and Occam for as long as they've been around. Working from his base at Kent University he has pushed message passing and the CSP way to generations of students from Europe and beyond.
And finally, others have mentioned the transterpreter, but there's also other projects: see www.wotug.org for more details!
The Transputer aten't dead
Before mourning the passing of the Transputer, it's worth remembering that they're still shipping in vast numbers.
If you've got a digital TV or set-top box, there's a strong chance it's got a transputer in it.
[Disclosure: I spent more years than I care to admit to on the Aztec West site...]
"The TM9900 was a PDP-11-like machine on a chip,"
That's a very odd thing to say about the TMS9900, if the quote is accurate. The two have almost nothing in common, except they are both 16bit machines and the 9900 happens to have auto-increment register addressing (which is just one of several orthogonal and consistent addressing modes on the PDP11 but not on the 9900). One has 8 registers including PC and SP, one has 16 including workspace pointer and linkage register but no accessible PC and no SP (and the registers aren't really registers because they're in memory), only one has a stack, only one has meaningful PC-relative addressing, only one has sensible OSes... have you heard enough yet?
If you wanted a PDP-11-like machine on a chip DEC had them, even in the 9900 era, in the T11 (low performance but high integration especially in comparison with a 9900) and J11 (nice fast 11/70-class chip). The T11 was even available on the open market and on single board computers whereas the J11 wasn't really.
Other than that, Transputers and Occam were indeed interesting, Tony Hoare's stuff on CSP in particular, but like the IT industry in general, it takes a long time for good ideas to be re-discovered and become trendy again.
Is it right to write a Transputer article without mentioning companies like Parsys and 3L? Are there others deserving a mention?
Studied Occam at uni...
...but we weren't allowed to touch the Meiko Computing Surface, but we did get to see it run the real-time fractal rendering mentioned in the article. Seriously cool back in 1990.
ISTR there was a BBC micro or Archimedes interface available for programming them.
Oddly enough, in the desk I inherited when I got my first "proper" graduate job, there was a Transputer chip (in its padded case) in the drawer. I don't think anyone knew (or cared) that it was there. I should have kept it as a collector's item.
Yeah, free BBC Micro with every Transputer
And the BBC Micro wasn't cheap...
Cloud computing. Time for Transputers all over again maybe? Perhaps?
I feel old..
I remember when Atari released the ATW, and after seeing it in action several times, I fell in love with the architecture.
It was not just Occam though that made this system so great, but also Helios, because it made the system's parallel DNA run from the CPU all the way up to the OS/GUI.
Such a shame it was just beyond mere mortals' wallets, and that Atari always had very special ideas about what marketing actually was..
Who knows, maybe one day we'll get there again.
I remember a demo of one of the first Transputers by one of our engineering lecturers at the Polytechnic I worked at in the mid 1980's.
At the start of the talk, the lecturer kicked off a BBC Micro rendering a complex fractal.
At the end of the talk 45 minutes later, the fractal was not finished (about a third of the way through IIRC).
The lecturer then kicked off the single Transputer to do the fractal and it took a couple of minutes,
blowing the BBC Micro out of the water. We were in awe; we'd seen the future.
Shame it didn't work out.
Ever since Hyperthreading and true parallelism in CPU & GPU have been on the rise, I always remember that Transputer demo and wonder what might have been for UK computer industry.
Re: Deja vu
"The lecturer then kicked off the single Transputer to do the fractal and it took a couple of minutes, blowing the BBC Micro out of the water. We were in awe; we'd seen the future."
I remember a similar situation a few years later where, during a university visit, someone demonstrated a Transputer solution rendering a Mandelbrot image and said how quick it was, and the response from the gathered prospective students was more or less, "That's quick? My Archimedes can render that in a couple of seconds."
Despite the merits of the Transputer, it suffered from the same problems as many other interesting technologies: up against mainstream CPUs surfing the wave of continuous fabrication improvements, the hardware couldn't remain competitive for the majority of applications around at the time.
...sure, a single Arch could beat a single Transputer, but 40 or so could render Mandelbrot sets (reflected in water, too) in real time. I'd like to see 40 Archimedes linked together to do the same.
"I'd like to see 40 Archimedes linked together to do the same."
What you're describing is precisely what render farms do today - perhaps not in real time since it obviously helps to have the interconnect and that is obviously one of the Transputer architecture's strengths - but shovelling the computations over a network is completely feasible. I remember when the Archimedes came out, A&B Computing had one of the benchmarks running on 20 BBC Micros and it was slower than the Archimedes. But that wasn't my point.
My point was that for most applications of the day, increased mainstream silicon performance got people where they wanted more quickly and for less cash. When you have a bunch of sixth-formers saying that their home computers do the same job quicker and for a lot less, who cares about buying even more kit and hoping that you'll eventually be outrunning the mainstream? A lot fewer people, that's who.
Exactly this phenomenon brought quite a few technologies to an abrupt end. Acorn wanted to do multiprocessor stuff but could only offer a bunch of 30MHz ARM CPUs, which isn't that much good when the mainstream gear is pushing 200-300MHz and offering other performance-improving measures.
I remember Transputers being talked about in hushed tones as the way of the future at the time. Shame it didn't work out.
I think like a lot of people I saw the transputer first of all on Tomorrow's World where it was raytracing a shiny Newton's cradle in real time. Thanks for telling the whole story.
And I couldn't help but think of Transputer last year when an Intel keynote conference talk was spent saying how widespread parallel processing was almost here...
Surprised the one attempt to market a product isn't mentioned
I'm thinking the Atari Transputer Workstation of the late 1980s......
Re: Surprised the one attempt to market a product isn't mentioned
But there were products to buy: as others have mentioned the Meiko gear was quite popular in academia, for instance.
we played with transputers when I did my degree (1984-88) ... and only yesterday, I was suggesting to the wife she could install the Folding@Home screensaver and do something useful with the PC in the downtime ... she asked how it worked, and the transputer got mentioned ...
We need a misty-eyed nostalgia icon
The good old days
Was just debating this article on FB, and the relative merits of Occam and the folding Origami editor (which I hated) but tellingly we both moved to the Meiko Computing Surface to do our actual work - A proper UNIX environment with message passing in C.
The current situation of having your "conventional" programming language talking to a jumped-up shader language is a backward step.
www.transterpreter.org for those of you who want to have the Transputer experience.
He's still at it
I have an XMOS dev kit at home; makes a very nice control system for all sorts of things that need fast, parallel IO-heavy stuff. It's a fair bit easier than learning Verilog and using a CPLD or FPGA instead, if perhaps not so blindingly fast.
Good to see these processors get recognition instead of the Cell processor which seems to borrow lots of ideas from the transputers.
Ultimately the transputer guys jumped the gun and believed the jump from 16-bit to 32-bit wasn't yielding the performance gains expected. Much like 64-bit isn't really making things run much faster either.
Great for embedded applications
The inherent parallelism of a single Transputer made writing software for boxes with user interfaces and asynchronous data inputs/outputs very easy. I designed such a box and when my new boss asked me why I hadn't used an 8051 he got a page of good reasons.
I ought to point out that the Occam language is thriving beyond Bristol. A group at Allegheny College in the USA (see http://transterpreter.org/) has ported an Occam dialect to Atmel's ATmega328 microcontroller chip and appears to have generated a lot of interest among users of this tiny low-cost device - who include robot makers, animatronic artists and hobbyists. And I keep nudging its name as often as I can get away with it in PC Pro (see for example http://www.pcpro.co.uk/features/357853/how-to-build-a-computer-smarter-than-a-us-president)
Loved Occam; loved the editor. When I started using C it was a real shock how backward the language was.
Thorn EMI was of course about the worst possible owner.
Maggie Thatcher of course didn't believe in Manufacturing or Industry. It's not all in China.
Need multiple separate RAM
A fantastic British invention, such a shame we sold it to the French and it didn't take off.
I decided at a previous employer to use transputers to develop a scalable chart-recorder, to replace their pen-based recorders. The company wanted to develop everything from small, 5-inch-wide medical recorders all the way up to metre-wide, multi-channel recorders running at about 1m/s chart speed. Transputers seemed the ideal choice.
I was still very green at the time, but they were such fun days.
Unfortunately the only product that was produced was the 5" medical recorder, max speed 5mm/s. Transputer + I/O shenanigans made the project a nightmare. It had a single transputer for the processing, plus PIC micro-controllers for the UI and running the thermal fax head. A single 68000 would have done the lot.
Still a fantastic product though. Shame, shame, shame.
I remember Transputers
Worked at GEC Alsthom Transmission and Distribution Power Electronic Systems, which is now some division of Alstom. These guys built the high-voltage DC link that electrically connects the UK to France, plus many more like it around the world. I wrote the control software (from other people's designs) for one of them.
Wonderful system, and great to use. We ran out of processing power at one stage, and with any other hardware we would have been totally stuffed. With Transputers we just put another T8 processor on there, repartitioned the control software and moved on. Analogue and digital I/O cards had a low-spec T2 processor on each card which passed data around without the control software needing to know about the I/O details.
But yeah, once they were sold to ST, they were basically dead. We were promised the new generation of chip, but they never got it working. Then they swung the axe, and suddenly our place realised there were never going to be any more chips, so they had to buy pretty much the entire world stock of Transputers to get them through their current projects and service contracts, whilst they rushed to get a new platform up and working.
I don't know what the problem is. Multi-core processors are nothing more or less than a distributed system on a chip, so proceed accordingly. Separate RAM is essential. Shared RAM is pointless (unless you're going to build the on-chip equivalent of a SAN).
The article does a great job explaining why SeaMicro is the future.
I've met Brian, nice chap
Yes, worked with a pair of T414's on my final year project at Uni, never could get the board to work though (my fault entirely). Also used them in experimental systems at the MoD in the following few years. Great for embedded systems.
These days I work in the Bristol area on CPU's. When they let me.
Takes me back...
This takes me back to my final year at University. My final year project involved writing a compiler for Occam. Being short of actual Transputers I wrote a virtual machine consisting of a number of Transputer cores with associated channels.
Another OCCAM university user
Maths & Computation undergrads at Oxford did Parallel computing practicals in OCCAM back in 1991. I remember reading Tony Hoare's text book, and still have my practicals...in print out form :)
No transputer in the computing lab then though - instead we compiled to run over as many of the Sun computers in the lab as we needed processors. It was great to watch those engineers wonder why their computers were so slow. And the editor was good old 'ved'.
Blast from the past
I'd entirely forgotten about Transputers. We got our hands on an Atari ABAQ in the first company I worked in, must have been '88 I guess. I remember the demos on the machine were awesome in terms of speed for the time. It wasn't to last however, as the machine was shipped off to the company's German office after a few weeks, and never arrived :-(
"Tony Hoare's text book" aka "Communicating Sequential Processes"
"Communicating Sequential Processes" is highly recommended and seems to be freely downloadable now in most parts of the world, see http://www.usingcsp.com/
Sadly that site also reveals that Mr Hoare has taken the Microsoft shilling, as have quite a few other well known names.
Which is a real shame, because if folk like Hoare were on the outside rather than the inside they might have a few things to say about Microsoft's capabilities and products.
Still, he deserves a bit of recognition (and dosh). Shame he can't do anything for the product itself.
Remember it well
I still have an almost complete collection of every processor board Meiko ever made (complete with the green wire field modifications on the back).
One little known feature of the transputer was it supported multiple execution threads (including the thread scheduling) in hardware. Modern chips also support multiple threads but the OS is much more involved in the scheduling of them.
The transputer was not a multi core processor so only one thread actually ran at any one time but the threads were there so you could specify your problem as lots of tiny micro-tasks that automatically switched to a runnable state once their input data was available.
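To make the "micro-tasks that wake up when their input arrives" idea concrete, here is a toy software model of it using Python generators. This is only an illustration of the scheduling idea described above, not how the transputer's hardware scheduler was actually built; `Channel`, `Scheduler`, `receive` and the demo tasks are all hypothetical names.

```python
from collections import deque


class Channel:
    """Holds queued values plus the tasks parked waiting for input."""
    def __init__(self):
        self.values = deque()
        self.waiting = deque()


class Scheduler:
    """Runs one task at a time; blocked tasks sit on their channel."""
    def __init__(self):
        self.run_queue = deque()

    def spawn(self, task):
        self.run_queue.append(task)

    def send(self, channel, value):
        channel.values.append(value)
        # Data arrived: tasks blocked on this channel become runnable again.
        while channel.waiting:
            self.run_queue.append(channel.waiting.popleft())

    def run(self):
        while self.run_queue:
            task = self.run_queue.popleft()
            try:
                blocked_on = next(task)  # run until the task needs input
            except StopIteration:
                continue                 # task finished
            blocked_on.waiting.append(task)


def receive(channel):
    """Use as `x = yield from receive(ch)`; parks the task while ch is empty."""
    while not channel.values:
        yield channel
    return channel.values.popleft()


def doubler(sched, inp, out, count):
    for _ in range(count):
        x = yield from receive(inp)
        sched.send(out, 2 * x)


def collector(inp, results, count):
    for _ in range(count):
        results.append((yield from receive(inp)))


def pipeline_demo():
    sched = Scheduler()
    a, b = Channel(), Channel()
    results = []
    sched.spawn(collector(b, results, 3))  # blocks at once: b is empty
    sched.spawn(doubler(sched, a, b, 3))
    for v in (1, 2, 3):
        sched.send(a, v)
    sched.run()
    return results
```

Only one task ever runs at a time, and nothing polls: a task that needs input costs nothing until `send` moves it back onto the run queue.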
The interesting thing is that many of the programming models being proposed for future exa-scale systems are returning to this kind of thinking so we may see a return of designs like this.
Any recommended reading?
The article shows 'A Tutorial Introduction to Occam Programming' — as someone who's naturally curious and likes books they can hold, is that a good book to learn about Transputer's approach to parallelism and the Occam language? If I'm going to buy just a single book, is there anything else better? I guess I'm asking for the Occam equivalent of the Smalltalk blue book.
I'm perhaps biased as one of the authors, but I don't think there is another paper tutorial (as opposed to reference) book on Occam. It's out of print but there are used copies on Amazon.
"64-bit isn't really making things run much faster either."
Why would anybody expect 64bit addressing (which is what most sensible folks mean by '64 bit') to make things run faster? Other than if they'd been believing the marketing, obviously.
64bit addressing might allow you to run bigger things, but if you run the same things in a 32bit addressing version vs its 64bit direct equivalent, on equivalent (or even the same) hardware, why would you expect the 64bit one to run faster? Try it on a Linux/x86 someday, though the waters are somewhat confused there because of the instruction sets, registers, etc not really being quite equivalent.
We thought it was a vision of the future
In the early '80s I ran a series of symposia for IBM UK "thought leaders" at Cambridge University. The theme of the first one was "Change" and we invited Inmos along to demonstrate the transputer technology, which some of us thought IBM should invest in.
At around that time IBM was proud to have produced a complex ray-traced image of a Newton's Cradle sitting on a chess board. This was done in under 24 hours on a high end general purpose mainframe.
Inmos demonstrated the same image before our eyes in minutes using a couple of shoebox-sized pieces of hardware. They also demonstrated how the performance could be increased by simply adding more transputers without powering off the machine.
Following the demonstration the group discussion decided that it was obviously done with smoke and mirrors and had no commercial value!
I learned a lot from that presentation, mainly to make sure that change was introduced in small increments. This was proof of Clarke's 3rd Law - "Any sufficiently advanced technology is indistinguishable from magic."
I cut my professional teeth on Transputers. Still using the same message passing principles 20 years later.
Get real Inmos was a disaster zone
Why do we remember Britain's technical and commercial failures through rose tinted glasses?
I was a system design engineer when the transputer was launched. It was nowhere near competitive. It was targeted at number crunching tasks but the performance of the individual processors was so poor we calculated we would need more than 200 to replace the single bit slice processor we had at the time. DSP devices were starting to appear at the time and they made a lot more sense. Developing a distributed message passing algorithm is much harder than on more conventional platforms so we estimated a 5 times increase in development effort.
Selling a product in which the power consumption, space and cost are two orders of magnitude greater than competing technologies and the development costs are one order of magnitude greater is not smart.
We did use large quantities of Inmos DRAM because the nibble mode parts were at the time the fastest available; however, we quickly discovered a design flaw that resulted in pattern sensitive corruption of the memory. Inmos denied this for more than 1 year despite us being one of their largest customers and the fact we could reliably produce a problem right from the start on approximately 1% of all devices.
Poor customer service
No mystery about why the company no longer exists.
Let's produce commercially viable, technologically innovative products with high quality, not fantasise about products that were badly conceived and poorly executed.
Some of the *best* & *worst* of British innovation
High density code using "prefix" coding on byte sized chunks, potentially allowing processors with external data buses made of *any* number of bytes, giving (with the right job mix) 10 MIPS at a time when 1MIPS was impressive.
Interrupt service routines coded *exactly* as a normal process attached to the relevant pins.
Stack architecture (internally the model used by many if not *most* compilers) in hardware. Could have supported FORTH as readily as C.
Hardware scheduler for *all* processes, including "messages" on serial I/O buses (which I think live on as "Firewire").
Use of formal methods to verify the FPU.
Bit counting instructions which allowed software decoding of GPS in real time.
Software architecture *supporting* (rather than allowing) apps to be broken into processes and distributing across as many processors as necessary *without* change.
Tungsten Silicide gate material may have been what gave it high radiation hardness.
On chip clock generation from *relatively* low frequency shared clock, simplifying board design (not sure how modern chips do it these days).
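For the curious, the "prefix" coding mentioned at the top of this list can be sketched roughly as follows. Each instruction byte carries a 4-bit function code and a 4-bit data nibble, and larger or negative operands are built up nibble by nibble in an operand register using prefix instructions. The function code values below (`PFIX`, `NFIX`, `LDC`) are from my recollection of the published instruction set, so treat them as assumptions, and `encode`/`decode` are illustrative helper names only.

```python
PFIX, NFIX, LDC = 0x2, 0x6, 0x4  # function codes as recalled; assumptions


def encode(op, operand):
    """Return the byte sequence for `op` with an arbitrary integer operand."""
    if 0 <= operand < 16:
        return bytes([(op << 4) | operand])
    low = bytes([(op << 4) | (operand & 0xF)])
    if operand >= 16:
        # Positive: prefix the upper bits, then emit the low nibble.
        return encode(PFIX, operand >> 4) + low
    # Negative: prefix the complement of the upper bits.
    return encode(NFIX, (~operand) >> 4) + low


def decode(code):
    """Return a list of (function code, operand) pairs for a byte sequence."""
    oreg = 0  # operand register, accumulated across prefix bytes
    out = []
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        if fn == PFIX:
            oreg = (oreg | data) << 4
        elif fn == NFIX:
            oreg = (~(oreg | data)) << 4
        else:
            out.append((fn, oreg | data))
            oreg = 0  # operand register clears after each real instruction
    return out
```

Under those assumptions, small constants cost a single byte (`encode(LDC, 5)` is one byte) and only the rare big operand pays for extra prefix bytes, which is where the code density comes from.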
It's designed to be used in *big* arrays but they premium priced it like an Intel processor as core of a system, at a time when you *desperately* want design wins to build volume.
No low end version (Like the 68008) which could have been 1 bit serial internally with a byte data bus. Poor performance but that *total* scaleability across the range would let a customer scale up as market and budget allowed.
No MMU, because it's going to be used in big arrays with no address translation (but it costs an arm and a leg)
No *nix port because no MMU (yes it's possible but it's a PITA without one)
Ran US DRAM operation as "Cash cow," which most start ups don't have and don't know how to make effective use of.
"Origami" editor. No doubt very neat but made *another* entry barrier to discourage people from learning it at a time you want *maximum* exposure.
Money from sale of Inmos went to Treasury, *not* Inmos, which somehow did not seem to be appreciated at the time. Inmos got nothing for the sale of itself to Thomson.
Welsh chip plant got stuffed one Christmas because they failed to realise the water company would dump a shed load of disinfectant in the water *without* telling them and clearing off for the holiday, trashing the ion exchange system.
I think ARM learned a lot of lessons from Inmos, but most of them were how *not* to do things.
Amdahl's law was known 40 years ago and Intel still don't get it. 50 cores /1 *data* bus.
Can you spell "contention"?
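To put a rough number on the Amdahl's law point above: if a fraction p of the work can run in parallel on n cores and the rest is serialised (for instance, queuing for one shared bus), the best possible speedup is 1 / ((1 - p) + p / n). A quick sketch, where the 95% figure is purely a hypothetical input and `amdahl_speedup` is an illustrative name:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work parallelises."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workload: 95% parallel, 5% stuck waiting on a shared resource.
speedup_50_cores = amdahl_speedup(0.95, 50)
```

With those made-up inputs, 50 cores deliver roughly a 14.5x speedup, and no number of cores can ever beat 20x.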