Google crafts custom networking CPU with parallel computing links

It appears Google has quietly built an in-house processor with close ties to parallel computing and networking. Evidence of the CPU, destined for internal use only, emerged today in source code patches for the LLVM C/C++ compiler, allowing programmers to produce executables for the hardware. Not that you can get your hands on …

  1. Anonymous Coward
    Paris Hilton

    Sounds like a networking chip - which I suppose would explain the name... and fit with their business. So it's not interesting to me... unless it's some sort of exotic HPC inter-CPU bus type thingy, which would at least be academically interesting.

    Any experts about?

    1. missingegg

      Could be similar to Azul's Vega chips

      Hard to say much without more detail, but Azul Systems very successfully built massively parallel compute appliances last decade. They put 54 CPUs on a die, and 16 fully meshed chips per server, for 864 CPUs in a flat memory space machine. And that was with floating point units and 64 bit support. Most of the transistor budget in such a chip goes into various levels of cache, but I'd expect an integer-only 32-bit chip to be somewhat denser.

      Azul still makes a very nice JVM, but now they're focused on Linux/x86.

      1. Roo
        Windows

        Re: Could be similar to Azul's Vega chips

        "Azul Systems very successfully built massively parallel compute appliances last decade. They put 54 CPUs on a die, and 16 fully meshed chips per server, for 864 CPUs in a flat memory space machine. And that was with floating point units and 64 bit support."

        I was on the receiving end of an Azul sales pitch - they were pitching their boxes to run our pricing models, but they refused to share any kind of floating point benchmarks with us. They also refused to allow us to benchmark the pricing models in a PoC because they would make heavy use of floating point.

        I was left with the impression Azul were looking for customers who wanted to buy a box to look at rather than run software.

    2. Roo
      Windows

      "Any experts about?"

      I looked at the book referenced by the article:

      "Google software engineer Jacques Pienaar said the blueprints for Lanai were derived from the textbook Parallel Computer Architecture: A Hardware / Software Approach"

      That book describes a "Lanai NIC" built for Myricom's networks, at the heart of which is a processor that would fit the description given in the article. The processor in the Myricom NIC is hooked up to SBus and got slotted into UltraSPARCs (pretty similar to the Meiko boxes). The "comms" co-processor architecture has been very common for many decades now - there are tons of such processors. :)

      The benefit of using something like an ancient Myricom processor might be binning all the hardware and software cruft associated with Ethernet & IP - which introduces a ton of overhead and latency to support features that just aren't necessary for tightly coupled networks of processors (e.g. long cable runs, speed negotiation, IP's checksumming, etc.).
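
      To make the "IP's checksumming" point concrete, here is the RFC 1071 Internet checksum in C - a sketch of the kind of per-packet work a general-purpose IP stack does and a tightly coupled fabric can simply skip. Purely illustrative; not actual Myricom or Lanai firmware.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* RFC 1071 Internet checksum: sum the data as 16-bit words with
       * end-around carry, then return the one's complement. */
      uint16_t internet_checksum(const uint8_t *data, size_t len)
      {
          uint32_t sum = 0;

          while (len > 1) {                 /* add 16-bit big-endian words */
              sum += (uint32_t)((data[0] << 8) | data[1]);
              data += 2;
              len  -= 2;
          }
          if (len == 1)                     /* pad a trailing odd byte with zero */
              sum += (uint32_t)(data[0] << 8);

          while (sum >> 16)                 /* fold the carries back in */
              sum = (sum & 0xffff) + (sum >> 16);

          return (uint16_t)~sum;
      }
      ```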

      1. ToddR

        @Roo

        The card being used by Google will not have an SBus interface, which died out more than 10 years ago with the last SPARC workstations and didn't see the light of day on any x86 platform.

        Also Meiko (from memory) was a SIMD massively parallel platform developed in Bristol. UltraSPARC was a general-purpose Unix workstation using a SINGLE SPARC processor.

        1. Roo

          "The card being used by Google will not have an SBus interface, which died out about >10 years ago with the last Sparc workstations and didn't see the light of day on any x86 platform."

          I agree, that seems likely, but you never know, old I/O buses do linger on in dark corners... ;)

          The Meiko boxes I saw (at Meiko, they were within a short walk of INMOS where I worked) were definitely capable of MIMD operation from the hardware perspective, albeit with a SIMD component offered by the vector co-processors. That said, I didn't *use* the Meiko boxes; the dev-tools may well have been geared towards a SIMD programming model for all I know.

        2. /dev/null

          Meiko never built any SIMD machines, AFAIR. Their first generation Computing Surface was based around transputers, later supplemented by SPARC and i860 processors using the transputers as an interprocessor network. They then binned the transputers in their CS-2 architecture, which used SuperSPARC/hyperSPARC processors connected via their home-grown Elite/Elan comms fabric chips instead. When Meiko fizzled out, Elite/Elan was bought by Quadrics and became QsNet. None of this was directly related to Myrinet AFAIK, apart from being a competing technology at around the same time.

    3. ToddR

      It's a low-latency network chip. Myricom were one of the first (along with Dolphin's SCI) to design low-latency NICs for MPI distributed-memory applications, e.g. CFD, computational chemistry, materials science.

      Not similar to Azul's SMP boxes for running Java, I'm afraid.
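
      For a sense of what "low-latency NICs for MPI" get measured on, here is the classic ping-pong microbenchmark, written against the standard MPI API. It's a minimal illustrative sketch, not anything Myricom- or Lanai-specific; compile with mpicc and run with two ranks.

      ```c
      #include <mpi.h>
      #include <stdio.h>

      /* Bounce a single byte between rank 0 and rank 1 many times and
       * report the average one-way latency. */
      int main(int argc, char **argv)
      {
          int rank;
          char byte = 0;
          const int iters = 10000;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              if (rank == 0) {
                  MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          double t1 = MPI_Wtime();

          if (rank == 0)
              printf("average one-way latency: %.2f us\n",
                     (t1 - t0) / iters / 2.0 * 1e6);

          MPI_Finalize();
          return 0;
      }
      ```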

  2. EvaQ

    A bit like a ... 386?

    Is this Lanai a bit like a ... 386? I wonder what Google will do with it.

    1. Kevin McMurtrie Silver badge

      Re: A bit like a ... 386?

      If it's a very simple architecture with no legacy baggage, they're probably tiny enough to be crammed into computers by the thousands. Most well defined tasks with well defined inputs can be implemented within a crude instruction set. Coding becomes difficult, but hardware costs more than people when you're approaching a global scale of data processing. Google also believes (sometimes incorrectly) that they can do anything better than the rest of the world, so it's no surprise that they'd keep building more hardware for themselves.

      1. naive

        Re: A bit like a ... 386?

        Great that someone is actually investing in processor technology again.

        Since it's Google, this will benefit innovation in processor technology too, given that everyone else threw in the towel years ago because CPUs are perceived as a commodity produced by Intel.

        That's taking into account that ARM is not (yet) ready for the datacenter.

      2. ToddR

        Re: A bit like a ... 386?

        It will not be a cheap CPU for massively parallel processing; that would be an ARM of some description.

      3. kventin

        Re: A bit like a ... 386?

        """If it's a very simple architecture with no legacy baggage, they're probably tiny enough to be crammed into computers by the thousands. Most well defined tasks with well defined inputs can be implemented within a crude instruction set."""

        somewhere Chuck Moore suddenly started hiccupping like crazy

        1. Anonymous Coward
          Anonymous Coward

          Re: A bit like a ... 386?

          What ever happened to Chuck Moore's low power parallel processors? I think the company was named GreenArrays. You would think there would be use cases in IoT, such as low-power sensors.

          1. Vic

            Re: A bit like a ... 386?

            What ever happened to Chuck Moore's low power parallel processors?

            Several things happened.

            Chuck uses his own layout tools, which produce very small dice for the amount of compute they provide - but the yield is far smaller than is commercially viable. As a result, although the engineering samples were fairly snappy, they would need a total re-design for mass production - and that idea didn't go down too well.

            There was also a change in the way patent fees were allocated in the US; the parent company was funding the venture on the expectation of other patents deriving cash based on the value of the complete item in which the patent was used. When they suddenly had to confine themselves to a portion of the value of the individual component, that left the cashflow a little bit sticky.

            And then there were certain interpersonal issues. The less said about those, the better.

            Vic.

            1. kventin

              Re: A bit like a ... 386?

              wow. first news of any kind in ... what? something like 2 years?

              i knew about the tools (okad, right?), not about the yield problem. the redesign idea didn't go down well with mr. moore i presume. the patent fees -- wasn't there a lawsuit of some kind? and i/p issues... i won't ask about.

              however what i would ask is: what _is_ happening (if anything) now? if you could shed some light.

              thanks for the info anyhow.

    2. martinusher Silver badge

      Re: A bit like a ... 386?

      This part could describe any number of RISC machines but not an Intel processor. Intel x86 parts are the epitome of CISC -- they're heavily microcoded and manage all sorts of instruction weirdness such as prefix bytes and variable instruction widths. (Something to do with being backwards compatible with the 1970s.)

      Intel instructions typically work between a single register and memory. This, being a RISC, will move data between registers: a result register is computed from an operation on two source registers. Memory access requires an address in a register, with another register or part of the instruction providing an offset. RISC chips have other properties, like really, really disliking accesses that aren't aligned on machine-word boundaries, but all this is invisible to most programmers.

      This is what makes the Intel part such an amazing deal. The x86 might be an architectural nightmare, but it's a highly developed, extremely sophisticated architectural nightmare. To an end user "it just works" -- you get all sorts of crap included with the processor, such as floating point and other dedicated instructions, memory management, virtual memory management, caches, all sorts of things thrown in with the part. In the embedded world (where space and low power are at a premium) all this stuff is extra.
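
      To put the register-to-register point in concrete terms, here's a trivial C function annotated with generic pseudo-assembly showing how a load/store RISC and a register-memory x86 might each handle it. The mnemonics are hypothetical and for illustration only - not actual Lanai or x86 compiler output.

      ```c
      /* One C statement, two lowering styles (illustrative pseudo-assembly). */
      int sum_at(const int *base, int offset)
      {
          return base[offset] + 7;
          /* Typical load/store RISC sequence (three-operand, registers only):
           *   r3 = r1 + (r2 << 2)     ; form the address in a register
           *   r4 = load32 [r3]        ; memory touched only via load/store
           *   r5 = r4 + 7             ; result register from two source operands
           *
           * Typical x86 sequence (ALU operand read straight from memory):
           *   mov  eax, 7
           *   add  eax, dword ptr [rdi + rsi*4]
           */
      }
      ```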

  3. Woza
    Joke

    "There is no floating-point support, so it won't be juggling tasks involving lots of math."

    I guess number theory doesn't qualify as maths, then?

    1. Anonymous Coward
      Anonymous Coward

      "number theory"

      '32-bit modulo' number theory?

      1. kventin

        Re: "number theory"

        4294967296 ought to be enough for almost anybody

        1. Vic

          Re: "number theory"

          4294967296 ought to be enough for almost anybody

          Yeah? Try expressing that in 32 bits...

          Vic.
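
          For anyone counting along: a 32-bit register tops out at 4294967295 (2^32 - 1); 4294967296 itself needs a 33rd bit. A quick C check, for illustration only:

          ```c
          #include <inttypes.h>
          #include <stdio.h>

          int main(void)
          {
              uint32_t max32 = 4294967295u;          /* 2^32 - 1: largest 32-bit value */
              uint64_t two_pow_32 = 4294967296ull;   /* 2^32 itself needs a 33rd bit */

              printf("%" PRIu32 "\n", max32);                 /* 4294967295 */
              printf("%" PRIu32 "\n", (uint32_t)two_pow_32);  /* truncated to 0 */
              return 0;
          }
          ```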

  4. JeffyPoooh
    Pint

    Very RISC... Like heading back in time.

    Eventually they'll announce a new-fangled 32-bit wide NAND gate with some tunable delay lines. The compiler will convert your code into the required tuning of the delay lines.

    Hmmm... must buy some mercury futures.

  5. Anonymous Coward
    Anonymous Coward

    Existing (old) 32-bit processor already supported by GCC and GDB

    Looks like https://www.myricom.com/ also already has a full compiler port https://github.com/myri/lanai-gcc plus a debugger https://github.com/myri/lanai-gdb.

  6. Anonymous Coward
    Joke

    Fixed value registers

    > including: two fixed value registers (one probably being zero)

    Two and five, of course. :-)

    Either that, or one is the binary for 4.4.4.4 and the other is 8.8.8.8, the NSA's slurp address.

  7. FormerMyriNut

    Lanai was the internal processor family name, and also the founder's favorite island near Maui.

    The Lanai core is a packet processing engine, not a generic x86/MIPS/ARM core. It was designed to hump packets around really fast from the wire to the computer bus - originally for Myrinet at 640Mb/s, then 1280Mb/s, 2G and finally 10G. CSPi still sells the 10G single-core version of this chip today in its 8B and 8C series cards.

    By definition a packet processing CPU is designed to do basic packet manipulations, in this case specifically for the HPC market, which today is really just mainstream computing. What was interesting about the Lanai architecture is that it was designed to support multiple 10/40GbE PHYs (these are the PHYsical network links), up to 10 packet processors and a pair of PCIe ports, for a total of 16 devices (think of them in a stack). On one side of this stack there is a shared memory bus linking all these devices via many ports to many banks of on-chip SRAM. On the other side of the stack is a command/control bus with an ultra-low-latency, HPC-like crossbar switch linking these devices. Mind you, this was framed out in 2008 and the design was pretty much locked down in 2010, so we're talking about the implementation Chuck had in mind eight years ago - which even by today's standards is pretty cool. The Lanai details shared above are now another three years older, so they've likely been improved upon further.
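
    For what it's worth, here's a rough C sketch of the device stack as described above. The counts come from the comment; the type names and layout are entirely hypothetical, not an actual Lanai register map.

    ```c
    /* Hypothetical model of the described Lanai stack: up to 16 devices
     * (PHYs, packet processors, PCIe ports), each with one port onto the
     * shared on-chip SRAM bus and one onto the command/control crossbar. */
    #define LANAI_MAX_DEVICES 16

    enum lanai_dev_kind {
        LANAI_DEV_PHY_10_40GE,   /* physical network link */
        LANAI_DEV_PACKET_PROC,   /* one of up to 10 packet processors */
        LANAI_DEV_PCIE_PORT      /* one of a pair of PCIe ports */
    };

    struct lanai_device {
        enum lanai_dev_kind kind;
        int sram_port;           /* port onto the many-banked on-chip SRAM bus */
        int xbar_port;           /* port onto the low-latency crossbar switch */
    };

    struct lanai_stack {
        struct lanai_device devices[LANAI_MAX_DEVICES];
        int ndevices;
    };
    ```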

  8. allthecoolshortnamesweretaken

    So, are they building Skynet or not?
