Stanford super runs million-core calculation

Stanford University engineers are claiming a record for the year-old Sequoia supercomputer, after running a calculation that used more than a million of the machine’s cores at once. The work was conducted by the university’s Centre for Turbulence Research, seeking to get a model for supersonic jet noise that’s more …

COMMENTS

This topic is closed for new posts.
  1. John Smith 19 Gold badge
    Thumb Up

    Quite impressive in terms of size but am I alone in wondering.

    Shouldn't this SIMD thing just work by now instead of needing lots of twiddling?

    Thumbs up, as this is a serious problem for aircraft, and if anyone wants to take a scheduled supersonic flight in their lifetime this is going to be needed.

    1. Frumious Bandersnatch
      Boffin

      Re: Quite impressive in terms of size but am I alone in wondering.

      Shouldn't this SIMD thing just work by now instead of needing lots of twiddling?

      It's not just SIMD. Although the article doesn't state it explicitly, each core models a small region of space and has to communicate various outputs to the neighbouring regions. The clue is in the line "The waves propagating throughout the simulation require a carefully orchestrated balance between computation, memory and communication." Amdahl's Law puts a brake on how well any real-world computation like this scales up on a parallel (or SIMD) architecture, because of the need for components to interconnect and transfer data between each other (such as propagating global force/pressure vectors after each local computation per simulation time quantum). In this case, I'm sure a lot of the time spent "ironing out the wrinkles" went on getting those inter-core messaging parts of the simulation humming. But there are other potential bottlenecks that need looking at to prevent stalls and starvation (i.e. the "computation, memory and communication" above). There's definitely no single "point and shoot" solution to parallel programming.
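
      To make the neighbour-exchange idea concrete, here is a minimal sketch of that kind of domain decomposition (purely illustrative, not the Stanford code: a made-up 1-D slab decomposition using mpi4py, where each rank updates its own cells and then swaps "halo" values with its neighbours every time step):

      import numpy as np
      from mpi4py import MPI   # assumes an MPI stack plus mpi4py are available

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Each rank owns a slab of the domain, plus one ghost cell at each end.
      n_local = 1000
      u = np.zeros(n_local + 2)
      u[1:-1] = np.random.rand(n_local)        # made-up initial state

      left  = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      for step in range(100):
          # Communication: swap boundary values with the neighbouring ranks.
          comm.Sendrecv(u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
          comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
          # Computation: a toy diffusion-style stencil update on the local cells.
          u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])

      # Run with e.g.:  mpiexec -n 4 python halo_demo.py

      The point is that every rank has to stop and exchange data every step, which is exactly where the balancing act between computation, memory and communication comes from.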

    2. Anonymous Coward
      Anonymous Coward

      Re: Quite impressive in terms of size but am I alone in wondering.

      "Shouldn't this SIMD thing just work by now instead needing lots of twiddling?"

      That's sort of like asking why there's still bad software in the world--shouldn't compilers just produce good software?

      Making software run on multiple cores (not SIMD, but whatever) is more art than science, no matter how many compilers come along promising to do it for you automatically.

      1. Michael H.F. Wilkinson Silver badge

        Re: Quite impressive in terms of size but am I alone in wondering.

        Dead right. Getting a million cores running effectively is very hard! It is not just Amdahl's law that can get in the way (i.e. the maximum speed-up is limited by whatever section of the code is sequential); it is also communication overhead. We are working on a problem that, if implemented naively, would require O(N log N) communication, with N the number of pixels, which in practice means that the largest data set we have to work on (1.5 Tpixel) requires on the order of 120 TB of data traffic. We are trying to get that down to O(G √N log N), with G the number of grey levels, which boils down to on the order of 240 GB of traffic in our case. Still a lot, but it should bring the algorithm into the realm of the possible.

        I do not see compilers taking over this sort of code redesign automatically any time soon.
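
        For anyone curious where figures like that come from, here is a back-of-envelope sketch (my own illustrative constants, not the actual ones: I have simply guessed 2 bytes per transferred element, a base-2 log and G = 4096 grey levels):

        import math

        N = 1.5e12                  # pixels in the largest data set (1.5 Tpixel)
        BYTES = 2                   # guessed size of each transferred element

        naive = N * math.log2(N) * BYTES                      # O(N log N) scheme
        print(f"naive:    {naive / 1e12:.0f} TB")             # ~120 TB, as above

        G = 4096                    # hypothetical number of grey levels
        improved = G * math.sqrt(N) * math.log2(N) * BYTES    # O(G sqrt(N) log N)
        print(f"improved: {improved / 1e9:.0f} GB")           # a few hundred GB

        The exact constants are guesses; the point is the reduction factor of roughly sqrt(N)/G, a few hundred times less traffic, which is why the improved bound lands in the same order of magnitude as the 240 GB quoted above.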

    3. Captain TickTock
      Boffin

      Re: Quite impressive in terms of size but am I alone in wondering.

      "Shouldn't this SIMD thing just work by now instead needing lots of twiddling?"

      What SIMD thing?

      1. Canecutter
        Boffin

        Re: Quite impressive in terms of size but am I alone in wondering.

        If, by your question, you wish to find out what SIMD is: it is one of the four models in Flynn's taxonomy for organising computations and computers.

        S = Single, I = Instruction (stream), M = Multiple, D = Data (stream).

        The other three (just for completeness' sake) are as follows:

        SISD (Single Instruction, Single Data);

        MISD (Multiple Instruction, Single Data);

        MIMD (Multiple Instruction, Multiple Data).

        The most common ones commercially are SISD (Standard Serial Computing), SIMD, and MIMD.
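
        To make the SIMD idea concrete (my own toy example, nothing to do with Sequoia): it is the same operation applied across a whole stream of data at once, rather than one element per instruction. In Python terms:

        import numpy as np

        a = np.arange(1_000_000, dtype=np.float64)
        b = np.arange(1_000_000, dtype=np.float64)

        # SISD style: one instruction stream working on one data element at a time.
        c_scalar = np.empty_like(a)
        for i in range(len(a)):
            c_scalar[i] = 2.0 * a[i] + b[i]

        # SIMD style: one operation expressed over the whole data stream;
        # NumPy dispatches it to vectorised kernels under the hood.
        c_vector = 2.0 * a + b

        assert np.allclose(c_scalar, c_vector)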

    4. Stuart Castle Silver badge

      Re: Quite impressive in terms of size but am I alone in wondering.

      It's a little simplistic to say that this SIMD thing should just work. A friend of mine used to write code for multi-core systems. She said it's relatively simple to code for 2 or 4 cores. When that code has to scale to thousands (or even a million, in this case), then scheduling and managing communication between the threads running on each core becomes both a massive job and a nightmare.

    5. Rob Carriere

      Re: Quite impressive in terms of size but am I alone in wondering.

      @John Smith 19: The SIMD thing _does_ just work. Out of the box, no prob.

      It's when you insist that you want every last bit of performance from your hardware that things get difficult. Running a computer at a couple of percent efficiency isn't hard, and for many situations that's plenty good enough. Running your home PC at anything approaching full throttle is an interesting engineering task. Running a million-CPU super at high efficiency is a serious challenge.

      So, it's not that there aren't tools. There are and they do a decent job. But a team of specialists piling on the man-months can do better. So it becomes economics: is the cost of the specialists worth the additional performance? This is really no different from programs like Photoshop containing a few hand-coded pieces of assembly. The compiler is good, but sometimes it's worth it to do better.
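
      To put rough numbers on that economics (a back-of-envelope illustration, not figures from the article): Amdahl's law says that if a fraction s of the work is serial, the best possible speed-up on P cores is 1 / (s + (1 - s) / P). At a million cores, even a tiny serial or badly-overlapped fraction is brutal:

      def amdahl_speedup(serial_fraction, cores):
          """Upper bound on speed-up from Amdahl's law."""
          return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

      P = 1_000_000
      for s in (0.01, 0.001, 0.0001):
          speedup = amdahl_speedup(s, P)
          print(f"serial fraction {s:.2%}: speed-up {speedup:,.0f}x "
                f"({speedup / P:.2%} of the machine used)")

      # serial fraction 1.00%: speed-up ~100x    (0.01% of the machine used)
      # serial fraction 0.10%: speed-up ~999x    (0.10% of the machine used)
      # serial fraction 0.01%: speed-up ~9,901x  (0.99% of the machine used)

      Hence the teams of specialists: the difference between a couple of percent efficiency and something respectable is worth real money at this scale.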

    6. Evil Auditor Silver badge

      Re: supersonic flight

      John Smith 19, this simulation is not about supersonic flight but about gas coming out of the engine at supersonic speeds. A very common thing among jet engines...

  2. AceRimmer

    At what point

    is it cheaper and simpler to build a jet engine rather than build a simulation of one?

    1. Anonymous Coward
      Anonymous Coward

      Re: At what point

      not at this point, clearly.

      Even if the simulation takes longer than a physical test would, the physical test can't tell you what is going on inside the engine at every point. And I would think that a virtual design is quicker to build virtually than the whole, or part, of a real, physical jet engine.

    2. Ken Hagan Gold badge

      Re: At what point

      At what point is it cheaper and simpler to build several dozen jet engines, each to a one-off specification, rather than build a simulation of just one and tweak its parameters?

      "There, fixed that for you" as they say.

      But yes, this beastie would be fairly close to that point. I assume it can also be used for other things, and I also assume that its builders have learned a fair bit about architecture and so the next one will be cheaper.

    3. Canecutter
      Boffin

      Re: At what point

      One important advantage of constructing a simulation is that you get to validate models you might have constructed by induction from empirical methods like building (many versions of) jet engines and testing them with instruments.

      The benefit of having a validated, quantitative model is that you are now aware of how the various attributes and parameters of the model constrain each other, thus you are better enabled to do effective engineering. You will be better aware of the various tradeoffs and optimisations you may perform during the specification and design process.

  3. david 12 Silver badge

    The trick is...

    "The trick is getting the models to run quickly enough: and it was the search for speed that led the researchers to get to work getting their code to run across so many cores in parallel."

    Well, that's one way of putting it.

    Or,

    "The trick is getting enough points, and it was the search for higher resolution that led the researchers to get to work getting their code to run across so many cores in parallel."

    Anyone can get a model to run in minutes. The trick is getting a model that has enough resolution to tell you what you want to know.


    OK, you can treat "faster" as equivalent to "more parallel", except you can't exactly; and now that parallel simulations run "fast enough", you aren't trying to make them faster: you're trying to get more points at the same speed.
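
    To see why "more points" is the expensive direction (my own back-of-envelope, not numbers from the article): halving the grid spacing of a 3-D simulation gives 8 times as many cells, and an explicit solver typically has to halve its time step as well (the CFL condition), so the cost per simulated second goes up roughly 16-fold.

    def relative_cost(refinement, dims=3):
        """Rough cost factor for an explicit 3-D solver when the grid
        spacing is divided by `refinement` in every direction."""
        cells = refinement ** dims   # more points
        steps = refinement           # proportionally smaller time step (CFL)
        return cells * steps

    for r in (2, 4, 8):
        print(f"{r}x finer grid -> roughly {relative_cost(r):,}x the work")

    # 2x finer grid -> roughly 16x the work
    # 4x finer grid -> roughly 256x the work
    # 8x finer grid -> roughly 4,096x the work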

  4. Anonymous Coward
    Anonymous Coward

    Why did I read that as "university’s Centre for Tumescence Research"?

    1. Ken Hagan Gold badge

      Perhaps you looked at the picture at the bottom of the article.

    2. Canecutter

      Take.

      Mind.

      Out.

      Of.

      Gutter!

  5. John Smith 19 Gold badge
    Unhappy

    So it's partly the communications overhead and the synchronisation that's the problem.

    I seem to recall a system that decoupled inter-processor comms from the data/instruction bus while incorporating a simple hardware scheduler.

    But that was a long time ago.

    1. Rob Carriere

      Re: So it's partly the communications overhead and the synchronisation that's the problem.

      Sure, you can do that (and I expect they have done just that). But if your calculation cannot continue until you have received the new input values from the neighboring CPU, then it doesn't matter whether the hardware is decoupled: your software will stall.

      So, what you need is not just well-balanced hardware, but well-balanced software that plays precisely to the strengths and weaknesses of that hardware. That's tough.
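
      The standard software answer to that stall is to overlap communication with computation (a sketch of the general technique, not a claim about what the Stanford code actually does): post non-blocking sends and receives for the halo values, update the interior cells that do not depend on them, and only then wait for the neighbours' data before finishing the boundary cells. Using mpi4py as an illustration:

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()
      left  = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      u = np.random.rand(1002)            # local slab, one ghost cell per side
      recv_l, recv_r = np.zeros(1), np.zeros(1)

      # 1. Start the halo exchange, but do not wait for it yet.
      reqs = [comm.Irecv(recv_l, source=left),
              comm.Irecv(recv_r, source=right),
              comm.Isend(u[1:2].copy(),   dest=left),
              comm.Isend(u[-2:-1].copy(), dest=right)]

      # 2. Meanwhile, do the work that needs no neighbour data (the bulk of it);
      #    in real code this would be written back into u after the exchange.
      interior = 0.5 * u[2:-2] + 0.25 * (u[1:-3] + u[3:-1])

      # 3. Only now block until the neighbours' values have actually arrived,
      #    then finish the boundary cells that needed the ghost values.
      MPI.Request.Waitall(reqs)
      u[0], u[-1] = recv_l[0], recv_r[0]

      If the interior work takes longer than the messages spend in flight, the communication is effectively free; when it doesn't, you stall exactly as described above.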

    2. Mark Honman

      Re: So it's partly the communications overhead and the synchronisation that's the problem.

      You're thinking of Transputers. Which were certainly the right idea for CFD, not so much on account of the architecture but because of the good balance between compute and communication speeds, and especially the very low communication latency. Low latency meant that relatively little time was wasted hanging around at the end of an iteration, waiting for data to arrive from neighbouring processors. But they had their problems too - especially absence of any kind of global/broadcast communication.

      I don't know how things have changed since those days, but at that time there was a tension between algorithmic efficiency and parallel processing - the more efficient CFD algorithms coupled cells across the whole domain, which was generally OK to parallelise on a shared-memory system but was a no-go on a distributed-memory architecture, where less efficient algorithms had to be used instead.

      So... real kudos to the guys & gals, both system architects and software developers, who have pulled off the feat of building a system and a real-world application that scale across 10^6 processing elements.

  6. LawLessLessLaw
    Boffin

    This release makes it sound like Stanford engineered it

    The 1,572,864 core computer is built by IBM and runs Linux.

    http://en.wikipedia.org/wiki/IBM_Sequoia

    I don't know what record they are claiming to have achieved with 1mill cores.

    1. James Hughes 1

      Re: This release makes it sound like Stanford engineered it

      I believe, having actually read the article, that this is the first time they have used over a million cores in the same calculation. Usually they are spread over multiple jobs.

  7. Richard 120

    Is it hard to do that?

    Could you not do that just by writing some bad code?

    1. Chemist
      Joke

      Re: Is it hard to do that?

      "Could you not do that just by writing some bad code?"

      You'd better put more paper in the printer if you're making a million copies

  8. Naughtyhorse
    Joke

    no biggie!

    I am reliably informed that the new 128GB iPad can do it faster
