Multicore processors may offer a wealth of powers, but writing efficient code for them is notoriously difficult. Thankfully, help is close at hand. The Multicore Association created the Multicore Programming Practices (MPP) working group just over a year ago to produce a guide on best practices for writing multicore-ready …
Queues? The "revolutionary, pervasive approach" is QUEUES??
Why the F is it that every "revolutionary" advance is something that has been with us for oh so long? Really, when was the concept of the queue introduced? This is STUPID. As for scaling libraries, the "Intel® Threading Building Blocks" template library has been with us for quite a few years. And of course, some languages have had threading in them from day one (hello, Modula-2).
I don't understand
what everyone's problem is. You just follow best practices, never access your data from two threads at once, and away you go.
Most software in use every day can run fine as a single-threaded app. The other CPUs can do all the other non-app tasks like background OS tasks, playing MP3s and all the other stuff that happens in a modern PC.
Between a Rock and a Hard Place
Interesting article. As much as the powers that be at Intel know that multithreading is not the answer to the parallel programming crisis, they also know that they have wedged themselves between a rock and a hard place. They have way too much invested in last century's legacy technologies to change strategy at this late stage of the game. It makes sense for them to acquire outfits like Rapidmind and Cilk. The former specializes in domain specific application development tools for multicore processors while the latter makes clever programs that discover concurrent tasks in legacy sequential programs. Intel's intelligentsia figure that this should give them enough breathing room while they contemplate their next move. Question is, is this a good strategy? I really don't think so.
In my opinion, Intel, AMD and the others are leading their customers astray by making them believe that multithreading is the future of parallel computing. Billions of dollars will be spent on making the painful transition by converting legacy code for the new thread-based multicore processors. The problem is that multithreading is not just a bad way to program parallel code (do a search on Dr. Edward Lee of Berkeley), it is the cause of the crisis. Neither Rapidmind nor Cilk is going to change that simple fact. These acquisitions are just a way of trying to force a round peg into a square hole as far as it will go. Sooner or later, a real maverick will show up and do the right thing and there shall be much weeping and gnashing of teeth. Being a behemoth is no guarantee of long-term survival in the computer business. Sorry, Intel.
Google or Bing "How to Solve the Parallel Programming Crisis" if you're interested in finding out about the real solution to the crisis.
it's as simple as that.
Because it is hard
There isn't anything revolutionary about Apple's offering in a technical sense. They are ideas that have existed for at least 30 years. On the other hand, no one has brought them to commercial reality, so in that sense it is revolutionary. Most of computing is reinventing or rediscovering ideas that are from the 60s or 70s. Astoundingly little is actually new.
Apple is providing little more than named blocks and an abstraction over parallel-do. But the magic is in the manner in which blocks are scheduled. It avoids the kernel where possible, thus avoiding expensive system calls for synchronisation if it can, and it can use dynamically configured schedulers that are a best fit for the hardware you are running on. Part of the trick is also that by queueing the parallel-do operations you are implicitly providing some causality information, at a level that is more helpful to the system. Coding this with locks and threads would be very time consuming, and a nightmare to make configurable to different hardware.
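To make the "queueing gives you causality" point concrete, here is a rough sketch in Python rather than Apple's C extension. It is an analogy only, not Grand Central Dispatch's API: `run_in_order` and `parallel_do` are names made up for this post, and a one-worker thread pool is assumed to stand in for a serial queue. Note also that CPython threads will not speed up CPU-bound work, so this shows the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_order(tasks):
    """Run callables one at a time, in submission order, on a single worker."""
    log = []  # shared state, touched only by the queue's one worker thread
    # A one-worker pool behaves like a FIFO queue: submission order is the
    # only synchronisation, so no explicit lock is needed around `log`.
    with ThreadPoolExecutor(max_workers=1) as serial_queue:
        for task in tasks:
            serial_queue.submit(lambda t=task: log.append(t()))
    return log  # leaving the with-block waits for every queued task


def parallel_do(func, items, workers=4):
    """Apply func to independent items concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

The caller never writes a lock: ordering in the first function comes from the queue itself, and the second function only works because the items are independent of each other.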
@Voodoo. The problem with this "best practice" is that it is too simplistic. Just protecting data objects with locks, monitors or the like does not protect you from many other sorts of subtle bugs, and can lead to very poor performance. Sadly it is very easy to craft code that runs across many processors yet is no faster than the original code on a single processor.
The old idea that it is easy because you have one core to do each user task, and maybe one other, is very naïve. It takes no notice of the future that will give us 16+ cores on a typical machine in fairly short order. Nor does it help with the grunt work of more advanced applications. Larrabee is next year.
And of course we get the usual trolling from Louis Savain, who keeps regaling everyone with how brilliant he is, and how he has solved it all. Without ever having written a line of code or ever demonstrated even the simplest actual running examples of his world saving paradigm. Slideware of ideas had in the shower. Bad ideas too.
Beside his take on parallel processing, Louis Savain would appear to have some other ideas not strictly in keeping with current scientific orthodoxy.
Re: Unconventional Thoughts
Yeah, but I'm right about parallel programming.
You will be right when you build your system and demonstrate that it a) works and b) is faster. The fact that you can't deliver *anything* says it all.
And since your understanding of hardware is smaller even than mine and since you clearly don't know anything significant about compilers (even the basics of how to write one) nor even understand the difference between a model and an implementation, everything you post here is spam.
Yo, BlueGreen. Identify yourself if you got any gonads.
A proprietary extension to C when I have to write cross-platform code? Go away. If it was a C/C++ API then it might be usable.
I recently used the Qt Concurrent framework to speed up a statistical calculation. Lots of floating point but not much memory access. Basically you break up the problem, put each part's data into an object that also has a method to process the data and turn a list of these objects over to the framework. It was rather fun to see my quad core hit 100% CPU and solve the problem in 1/4 the time. It is loosely based on Google's MapReduce process. Breaking up the data into objects also makes it easy to keep the multi-thread access running without locks.
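For anyone who wants to see the shape of that, here is a minimal sketch in Python, not Qt Concurrent itself. The `Chunk` class and `mean_and_variance` function are hypothetical names, the statistic (mean and population variance) is just an example, and the input is assumed to be a non-empty list. A thread pool is used for brevity; in CPython you would need processes to actually load all four cores.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Chunk:
    """One independent slice of the data plus the method that processes it."""
    values: list

    def process(self):
        # Touches only this chunk's data, so no locking is required.
        return len(self.values), sum(self.values), sum(v * v for v in self.values)


def mean_and_variance(data, parts=4):
    """Map: process each chunk independently. Reduce: merge the partial sums."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    chunks = [Chunk(data[i:i + size]) for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(Chunk.process, chunks))
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    return mean, total_sq / n - mean * mean  # population variance
```

The only shared step is the final merge of a handful of partial sums, which is why this pattern gets away without locks.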
As I see it, the main problems yet to be solved are (1) the ability to run fast on all cores when doing memory intensive operations such as image processing and (2) the ability to run multiple short operations where the thread scheduling time becomes a problem. The first is basically due to hardware limitations and won't be solved in software. The second may be solved by better thread scheduling or programming language extensions.
At the instruction level, your "solution" sounds very like "pipelining" (Google or Bing it), and is already done in many modern processors. The problem with determining parallelism at this level is that a significant amount of high-level problem-specific information has been lost (i.e. the information that certain instructions can be run on different CPUs/cores which is implicit in threading).
At the software level, your "solution" to the parallel programming problem is just a description of a "task farm" (Google or Bing it) with a "Message Passing Interface" (Google or Bing it), except that the tasks are kept in a queue and must be carried out in order. That kind of parallelism is only good when you have many independent sub-problems, and is not suited to multiple worker threads operating on the same data (although it can be done).
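A bare-bones task farm looks roughly like the Python sketch below. `task_farm` is a made-up name and this is plain threads and a queue, not MPI. Tasks are handed out in queue order but may finish in any order, so each result is written to the slot matching its task's position.

```python
import queue
import threading


def task_farm(func, tasks, workers=4):
    """Workers repeatedly pull (index, task) pairs from one shared queue."""
    pending = queue.Queue()
    for item in enumerate(tasks):
        pending.put(item)
    results = [None] * pending.qsize()

    def worker():
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results[index] = func(task)  # each worker writes a distinct slot

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

It works here only because every task is independent; as soon as two tasks need to update the same data, you are back to locking.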
However, for many problems, the overhead required for locking (including the idling of other threads which need to access the locked memory) is the killer, and in these cases it takes a lot of effort to code a solution that minimises the amount of locking required (i.e. a generic solution won't be as efficient).
Basically, concurrent memory updates are the problem, not concurrent execution of threads. The more concurrent memory updates your problem requires, the less of a benefit you will get from using a parallel algorithm.
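A small Python sketch of the difference, with made-up function names and the input assumed to be a list. Both versions count matching items across four threads; the first updates shared memory on every match, the second keeps a private tally per thread and touches shared memory once. The second value returned is simply how many times the lock was taken.

```python
import threading


def _run(worker, items, workers):
    """Split items into interleaved parts and run one thread per part."""
    parts = [items[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_shared(items, predicate, workers=4):
    """Every match updates one shared counter, so every match takes the lock."""
    lock = threading.Lock()
    state = {"count": 0, "lock_acquisitions": 0}

    def worker(part):
        for item in part:
            if predicate(item):
                with lock:
                    state["count"] += 1
                    state["lock_acquisitions"] += 1

    _run(worker, items, workers)
    return state["count"], state["lock_acquisitions"]


def count_local(items, predicate, workers=4):
    """Each thread tallies privately and takes the lock once, at the end."""
    lock = threading.Lock()
    state = {"count": 0, "lock_acquisitions": 0}

    def worker(part):
        local = sum(1 for item in part if predicate(item))  # thread-private
        with lock:
            state["count"] += local
            state["lock_acquisitions"] += 1

    _run(worker, items, workers)
    return state["count"], state["lock_acquisitions"]
```

Same answer either way, but the number of contended updates drops from one per match to one per thread, and that restructuring is exactly the problem-specific effort a generic framework cannot do for you.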
Grand Central Dispatch (and other similar technologies) is only designed to make it easier to write code which can run in parallel, should you be able to identify suitable sub-problems.
All this talk about how hard parallel processing or multi-tasking software development is, is just a load of balderdash. I've never had any problem writing it and my code has stood the test of time in many client environments. If you can't take the heat, stay out of the kitchen.
The vendors want a paradigm that can be used by the outsourced, low-pay half-wits that the software houses favour. That's what this is all about, but the reality is that with this sort of problem you require a specialised solution for every problem. That is, you need to actually be reasonably smart, and those horrid smart people demand a living wage for their efforts, damn them.