So you thought Intel was a hardware company? In fact, it's also a major supplier of software – compilers and developer tools. This was what the Think Parallel Intel EMEA Software Conference 2.0 in Lisbon was all about. I've only space to cover the main theme here (there was an interesting session I must return to, on Second …
Intuitive up to a point
The problem with parallel programming is that it adds a third dimension in which one has to think. It's no longer just data organisation and control flow; there is now also communication to think about.
The change in thinking that is required does not compare with the structured programming and OO "revolutions", which are just more convenient ways of thinking about the von Neumann architecture.
Because parallel programming is inherently more complex than sequential coding, it is both easier to make mistakes and harder to track them down (especially timing-dependent bogeys that go into hiding when one hauls out the debugger).
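To make the timing-dependent kind of bug concrete, here is a minimal sketch of a lost update. It is an illustration under stated assumptions, not real threading: the two "threads" are Python generators, and the schedule string (a hypothetical device of this sketch) stands in for whatever order the scheduler happens to pick.

```python
def increment(shared):
    """One 'thread': read the counter, pause, then write back the value plus one."""
    seen = shared["count"]      # read
    yield                       # a point where the scheduler may switch threads
    shared["count"] = seen + 1  # write, based on a possibly stale read


def run(schedule):
    """Run two increment 'threads', stepping them in the order given by schedule."""
    shared = {"count": 0}
    threads = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(threads[name], None)  # advance that thread by one step
    return shared["count"]


print(run("AABB"))  # 2: A finishes before B starts
print(run("ABAB"))  # 1: both read 0, so one increment is lost
```

The same code gives different answers depending only on the interleaving, which is why such bugs tend to vanish when a debugger changes the timing.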
One solution may be to reduce complexity in other parts of the development environment, although this has already been tried in the form of the Occam language. In retrospect there was nothing wrong with Occam (once floating-point support had been added!); its sin was that it was not Fortran or C.
In my view the ideal solution for applications outside the number-crunching realm would be something similar to the swingeingly expensive G2 "real time intelligent system" in which it is remarkably easy to express searches that are executed as concurrent tasks. When I last used it (some 10 years ago) G2 was an interpretive system and so was not exactly suitable for number crunching.
Over and above issues with the development environment, there is also the challenge of developing parallel algorithms. At least in the fluid dynamics area in which I've worked, the "smarter" algorithms always came with an increased degree of data coupling - in space or time. Loosening the coupling between data elements would generally result in a loss of algorithmic efficiency.
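A textbook illustration of this trade-off (my choice of example, not one named above) is Jacobi versus Gauss-Seidel relaxation. The sketch below solves a 1D Laplace problem on a small grid; the function name, grid size and tolerance are all assumptions made for the illustration. With coupled=False every point is updated from the previous sweep only, so all updates in a sweep are independent. With coupled=True each point uses neighbours already updated in the same sweep, which couples the points in sequence.

```python
def sweeps_to_converge(n, coupled, tol=1e-8, max_sweeps=100_000):
    """Relax u'' = 0 on n points with u[0] = 0 and u[-1] = 1; return sweeps used."""
    u = [0.0] * n
    u[-1] = 1.0
    for sweep in range(1, max_sweeps + 1):
        # Coupled (Gauss-Seidel style): read values already updated this sweep.
        # Uncoupled (Jacobi style): read only a snapshot of the previous sweep.
        src = u if coupled else u[:]
        biggest = 0.0
        for i in range(1, n - 1):
            new = 0.5 * (src[i - 1] + src[i + 1])
            biggest = max(biggest, abs(new - u[i]))
            u[i] = new
        if biggest < tol:
            return sweep
    return max_sweeps


jacobi = sweeps_to_converge(20, coupled=False)
gauss_seidel = sweeps_to_converge(20, coupled=True)
print(jacobi, gauss_seidel)  # the coupled variant needs fewer sweeps
```

The coupled variant converges in fewer sweeps, but its inner loop can no longer be split naively across processors; the uncoupled one parallelises trivially and pays for it in extra sweeps.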
Old news or not?
To those of us in the HPC community this seems like very old news. Every major supercomputer for the last 10 years has needed parallel programming.
However, in this arena we have long since hit the point where codes are limited by the memory system more than the instruction rate. For this reason most really big systems are distributed-memory rather than thread-based. If you think thread programming is hard, you should try distributing a problem across multiple memory systems.
Intel's move to massive core counts could have one of two outcomes.
1) Large-scale scalable shared memory systems become commodity and the HPC arena becomes much easier.
2) People discover that having lots of cores attached to a rubbish memory system goes at the same speed as a couple of cores no matter what you do.
Guess which one I believe :-(
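Outcome 2 can be sketched with a simplified roofline-style estimate: attainable throughput is the lower of what the cores can compute and what the memory system can feed them. The function name and every number below are made-up assumptions for illustration, not measurements of any real chip.

```python
def attainable_gflops(cores, gflops_per_core, bandwidth_gb_s, flops_per_byte):
    """Simplified estimate: the lower of the compute limit and the memory limit."""
    compute_limit = cores * gflops_per_core
    memory_limit = bandwidth_gb_s * flops_per_byte  # what the memory system can feed
    return min(compute_limit, memory_limit)


# Hypothetical machine: 10 GFLOP/s per core, 20 GB/s of memory bandwidth,
# running a code that performs 1 floating-point operation per byte moved.
for cores in (1, 2, 4, 64):
    print(cores, attainable_gflops(cores, 10.0, 20.0, 1.0))
# 1 10.0
# 2 20.0
# 4 20.0
# 64 20.0
```

Under these assumptions, 64 cores go exactly as fast as two, because the memory limit is reached at two cores.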
Parallel Needs New Designs
Parallel programming is tough and will remain so, but it need not be one step short of impossible.
One of the keys to good parallel programs is the decomposition of the job into tasks which perform independent (or close to that) operations. The highest level of this needs to be done at the design stage.
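As a minimal sketch of that kind of decomposition, the code below cuts a job into chunks, runs one independent task per chunk, and combines the results in a single final step. The function names and the sum-of-squares workload are placeholders chosen for the illustration. Note also that in CPython threads show the structure rather than a speedup for CPU-bound work; swapping in ProcessPoolExecutor is the usual route there.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Cut items into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, never zero
    return [items[i:i + size] for i in range(0, len(items), size)]


def task(chunk):
    """An independent task: touches only its own chunk and shares nothing."""
    return sum(x * x for x in chunk)


def sum_of_squares(items, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(task, split(items, workers)))
    return sum(partials)  # the only step that combines results


print(sum_of_squares(list(range(10))))  # 285
```

Because the tasks never touch shared state, no locking is needed; the only communication is the final combination of partial results.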
Multitasking happens at two scales - the large individual tasks and the small steps that make up a task. In many cases, working at the task level is enough for effective parallel programming, but where individual tasks are large, they may need to be done in parallel steps, essentially micro-tasking.
However, micro-tasking at the step level runs the risk of increasing complexity and overhead. So the task-level parallel work will require design-level choices, while the step-level parallelism will need checking and testing for overhead. If the steps are so small that the parallel code adds a 50% overhead, the extra complexity will probably not pay useful dividends.
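A rough way to put numbers on that 50% figure, assuming the work divides perfectly across workers and the overhead is paid on every step (both simplifications of this sketch; the function names are hypothetical):

```python
import math


def step_speedup(workers, overhead_fraction):
    """Speedup over sequential code when each step costs (1 + overhead) as much."""
    return workers / (1.0 + overhead_fraction)


def workers_to_reach(target_speedup, overhead_fraction):
    """Smallest worker count whose estimated speedup is at least the target."""
    return math.ceil(target_speedup * (1.0 + overhead_fraction))


print(round(step_speedup(2, 0.5), 2))  # 1.33
print(workers_to_reach(2, 0.5))        # 3
```

Under these assumptions, two cores with 50% overhead yield only about a third more throughput, and it takes three cores just to double it. Real overheads rarely stay constant as workers are added, so measured figures are the ones to trust.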
Ultimately, if task level parallelism still can't handle the workload, the best answer may be splitting the workload and running multiple copies of the whole program, essentially segmenting the workload rather than increasing the internal level of parallel programming.
We already do this with multi-system web servers or file servers of the kind Akamai uses. The same segmentation and replication approach could be a better answer than trying to increase parallel operations at the micro level, because of the overhead involved.
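One simple way to segment a workload across identical copies of a program is to route each item by a stable hash of its key. The sketch below is an assumption-laden illustration (the function names and customer keys are hypothetical, and it is not a description of how any particular provider does it).

```python
import zlib


def shard_for(key, copies):
    """Pick which copy of the program handles this key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % copies


def segment(keys, copies):
    """Split the workload into one list of keys per program copy."""
    shards = [[] for _ in range(copies)]
    for key in keys:
        shards[shard_for(key, copies)].append(key)
    return shards


workload = ["customer-%d" % i for i in range(12)]
for copy_id, shard in enumerate(segment(workload, 3)):
    print(copy_id, shard)
```

Each copy then runs unchanged, sequential code on its own segment, so the only parallel logic in the system is the routing function.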