The Defense Advanced Research Projects Agency - the research and development arm of the US military - has issued a challenge to nerds, geeks, techies, and boffins to bid on how they would build a petaflops-scale system of nearly unimaginable energy efficiency and compactness. Oh, and the prospective system also needs to be …
all in one rack huh?
simple... make the rack as wide as a container is long and stuff it full of kit every 19 inches to the roof.
sorted... where's my money.
coat on, leaving
... and it should make tea.
I thought Google already did this? A quadzillion CPUs crammed into a container using almost no power, cooled by the wind and able to switch between continents in a picosecond, programmable using Google Apps.
Tongue out of cheek - it would be interesting to see if the Googles of this world will spin off small companies to take part in this - after all, they are not exactly doing ERP in their data centres - it's just moving blocks of data back and forth. And a small petaflop array will be exactly what they need.
Do no evil - BUWHAHAHAH!!!
PS: Larry could do a bit of scratching in the old Sun toolbox in the garage, pick out a few bits and bobs and come up with a solution, couldn't he?
DARPA Just So...
Might have to interface my doggie to make a computer that meets the needs.
My dog uses about 25 watts! A human uses 150 watts... This is well known, as it is a calculation used in the air conditioning industry.
They never sent me the million bucks for solving the Echelon problem, so I'll just shut up now.
and drink a 211 while I wait for the money.
Ought to be doable
Forgetting the rest of the computer system for a moment, a deep pipeline and a slow-ish clock speed ought to mean a petaflop could be achieved with a million or so arithmetic units. Drop the requirement for IEEE 754 compatibility internally and they could be made quite small, but they would have to work within a 50 mW power budget. Atmel and TI both have current lines of DSPs not too far off the mark, so given a couple of years of process improvements, this is possibly doable.
However, there'd be no room for the machinery to actually run a program as such; this would be more like a giant FPGA - lots of arithmetic units and datapaths and simple switches between them, so there would need to be a more traditional computer to program or rather bootstrap the thing. Programming it would be all about describing the data flow rather than the order of execution. I should imagine that points pretty strongly towards a functional language like Matlab or Haskell instead of an imperative one, which should make picking the parallelism out automatically much less bothersome than with current methods (e.g. C-to-hardware). So the nanny computer may not even need to be a beast either. I guess the trickiest bit would be the compiler/supporting software rather than the hardware, maybe a few hundred man-years of work right there.
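A quick sanity check of the numbers in this comment, as a rough sketch only. It assumes exactly 1,000,000 units at 50 mW each and uses the 57 kW rack budget quoted elsewhere in the thread; the function names are made up for illustration.

```python
# Back-of-envelope check of the "million arithmetic units" idea.
# Assumed figures, all taken from the thread: 1 petaflops target,
# 1,000,000 units, 50 mW per unit, 57 kW for the whole rack.
TARGET_FLOPS = 10**15
RACK_BUDGET_W = 57_000


def per_unit_rate(units: int) -> int:
    """FLOPS each unit must sustain for the array to reach the target."""
    return TARGET_FLOPS // units


def arithmetic_power_w(units: int, milliwatts_per_unit: int) -> float:
    """Total power drawn by the arithmetic units alone, in watts."""
    return units * milliwatts_per_unit / 1000


if __name__ == "__main__":
    units = 1_000_000
    rate = per_unit_rate(units)
    power = arithmetic_power_w(units, 50)
    print(f"each unit must sustain {rate / 1e9:.1f} GFLOPS")
    print(f"arithmetic units draw {power / 1000:.1f} kW")
    print(f"left for everything else: {(RACK_BUDGET_W - power) / 1000:.1f} kW")
```

On those assumptions each unit needs to sustain 1 GFLOPS, and the arithmetic alone eats 50 kW of the 57 kW, so interconnect, memory, cooling and the nanny computer would have to share the remaining 7 kW.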
57 kW for just one DC tile?
Our guys planning DC space would love to have a 57 kW energy budget for each rack, even inclusive of cooling requirements. It's nothing like green, but it would mitigate a lot of limitations being faced today due to continuing exponential growth of storage requirements.
As for that "portable" generator, I guess DARPA's definition of portable must mean "mountable on a truck chassis". A portable genset, in the civilian sense, means "human luggable", so 5 kW is about the maximum size for one of those.
wishful thinking for the $$
Yeah. This spec and a couple of quid will get you a cup of ok coffee most places in the US.
Umm, 4,000 Power-7 chips?
Let's see, 1e15 / 250e9 = 4,000. And this is supposed to be used by imbeciles because in the military talented people are kept away from where they'd do the most good. And it would recognize that it has become a member of a botnet infection and clean house on its own.
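The division above, plus what it would imply per chip, as a small sketch. The 250 GFLOPS per chip is the figure used in the comment; the 57 kW is the rack budget quoted elsewhere in the thread, and spending all of it on the chips is a deliberately generous assumption.

```python
# Chip count for a 1 petaflops rack, and the power each chip could draw
# if the entire rack budget went to the processors (an optimistic assumption).
TARGET_FLOPS = 10**15        # 1 petaflops
CHIP_FLOPS = 250 * 10**9     # per-chip throughput used in the comment
RACK_BUDGET_W = 57_000       # rack power budget quoted in the thread


def chips_needed(target_flops: int, chip_flops: int) -> int:
    """Smallest whole number of chips that reaches the target (ceiling division)."""
    return -(-target_flops // chip_flops)


def watts_per_chip(budget_w: int, chips: int) -> float:
    """Power available per chip if nothing else in the rack used any."""
    return budget_w / chips


if __name__ == "__main__":
    n = chips_needed(TARGET_FLOPS, CHIP_FLOPS)
    print(f"chips needed: {n}")
    print(f"power per chip: {watts_per_chip(RACK_BUDGET_W, n):.2f} W")
```

So even before cooling, memory and interconnect take their share, each of the 4,000 chips would get at most 14.25 W.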
This is about Skynet, isn't it?
No it is for the Bolo Mark1
It is obviously for the first Bolo. An autonomous tank.
It's quite simple actually
Inside the rack is a simple processor that, upon any input, prints out, "1) Iran, 2) North Korea, 3) Pakistan."
Easy : Stack up ~360 ATI 5870 GPUs
You can stack 8 GPUs to a motherboard (in the form of a standard 5890 card), and you can stack 2 motherboards side by side in a 3U unit. The proposed rack would hold 22 3U units.
Then, to make it easy to program, you write an OpenCL wrapper API for the whole cluster and divvy out the work through OpenCL to the individual computers, together with a slice of the dataset to chew on.
It isn't quite a homerun though:
0.816 petaflops @ 66 kW (1500 W per machine)
It would need custom motherboards to fit in the rack, but one could start work today using off the shelf components, and by the time ATI moves one step down from 40nm, the power and size issues will take care of themselves.
Where do I collect my prize?
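The rack arithmetic in the GPU comment above can be reproduced with a short sketch. It treats one motherboard as one 1500 W "machine", which is an assumption that makes the 66 kW figure come out, and it compares against the 1 petaflops and 57 kW targets mentioned in the thread. The per-GPU throughput is derived from the quoted 0.816 petaflops, not taken from a datasheet.

```python
# Reproduce the GPU rack estimate and compare it with the stated targets.
UNITS_PER_RACK = 22        # 3U units in the proposed rack
BOARDS_PER_UNIT = 2        # motherboards side by side in each unit
GPUS_PER_BOARD = 8
WATTS_PER_MACHINE = 1500   # assumption: one motherboard is one "machine"
QUOTED_PFLOPS = 0.816      # throughput quoted in the comment


def gpu_count() -> int:
    return UNITS_PER_RACK * BOARDS_PER_UNIT * GPUS_PER_BOARD


def rack_power_kw() -> float:
    return UNITS_PER_RACK * BOARDS_PER_UNIT * WATTS_PER_MACHINE / 1000


def implied_tflops_per_gpu() -> float:
    """Per-GPU throughput implied by the quoted rack total."""
    return QUOTED_PFLOPS * 1000 / gpu_count()


def gflops_per_watt(pflops: float, kilowatts: float) -> float:
    """Efficiency in GFLOPS/W: (pflops * 1e6 GFLOPS) / (kilowatts * 1e3 W)."""
    return pflops * 1000 / kilowatts


if __name__ == "__main__":
    print(f"GPUs in rack: {gpu_count()}")
    print(f"rack power: {rack_power_kw():.0f} kW")
    print(f"implied per GPU: {implied_tflops_per_gpu():.2f} TFLOPS")
    print(f"proposal: {gflops_per_watt(QUOTED_PFLOPS, rack_power_kw()):.1f} GFLOPS/W")
    print(f"target:   {gflops_per_watt(1.0, 57.0):.1f} GFLOPS/W")
```

That gives 352 GPUs (hence the "~360" in the title) and puts the proposal at roughly 12.4 GFLOPS/W against a target of about 17.5 GFLOPS/W, which is the gap the "not quite a homerun" remark refers to.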
Errr... I can see one
but it's based on NVIDIA chips.
The next gen of these http://www.ge-ip.com/products/3429 is a 6U card containing many (double figure) GPUs and 40G backplanes with 20+ slots. Make it spray-cooled (existing technology) rather than conduction-cooled and Bob's your uncle for your standard building block.
You could get near the petaflops-per-Jeep figure (plus another Jeep with the generator).
Military intelligence may be an oxymoron, but there is cutting-edge intelligence making the stuff that the military might use. But as it comes from the UK, it will be ignored for any US funding!
Erm, hang on a minute you guys....
"The machine has to have a "self-aware" operating system that can learn from how it is used and adapt to the "changing goals, resources, models, operating conditions, attacks, and failures" that might happen in the field and "mitigate the effects of attacks and failures, closing exploited vulnerabilities."
Does this not sound a bit Skynet-y?
[searches to make sure there is not a Miles Dyson on the DARPA payroll...]
Never mind the weight of the 57 kW generator. How heavy is this box, then? It's 1.2 cubic meters including the cooling and the power supplies, so there is hardly any room left for the processors, and it is military rugged, practically solid metal. The all-up weight is well north of 2000 kg.
Helicopter? Hell yes, a bloody big one.
Ok, call me back then and I'll send them a desktop....kerchiiing
A much much better spend would be to come up with more intelligent humans by 2018, multilingual, capable of remembering a lot of stuff, good at formulating really clever plans and with advanced social skills. People are already self-aware, low power, easy to move around the world at short notice. Couple them with DARPA-improved iPhones and send me my money please.
Anyone remember the ICL bit serial array processor?
This is one of those problems that *demands* a full systems approach. It's processor + storage + cooling = 57 kW. With a regular CPU chomping through 100 W at 2 GFLOPS, that is roughly 1 order of magnitude below target, and you haven't started on the storage.
If they're staying with that self-healing/self-optimizing OS, that's going to need some serious research. You could code up clever behavior in a normal language, but responding to the unexpected (in the *fullest* sense) brings in issues from control systems (self tuning) and potentially even how a system *evolves* internal representations of the problem, i.e. how to allow something to find ways to think about problems it's never encountered before.
The 2 obvious paths are standard parts with *very* clever packaging (stripped off casings, flip chips all round, custom boards with custom spacings, heat pipe thermal management) or custom chips. If you're already going custom, why not open up the full spectrum of options? Bit serial (easier to route), clocked versus unclocked (IIRC *half* of all CPU transistors are clock drivers to distribute the now *very* high speed clocks across the chip). Burn some of the transistor budget using an asynchronous design for a *big* drop in power consumption (and hence cooling requirement).
I'd guess what they use internally for math is completely open, but IEEE 754 is an interface standard. Its maths properties are (by now) well understood, so everyone knows what ranges of numbers it can, and just as importantly cannot, accept.
Any way this is sliced a *lot* of simulation is going to be done before anyone starts a PCB layout. I'm quite excited.
I can do it In a Single iRacK Sorry ( * Iraq *)
Now where is my Money ?