Google crafts custom networking CPU with parallel computing links

It appears Google has quietly built an in-house processor with close ties to parallel computing and networking. Evidence of the CPU, destined for internal use only, emerged today in source code patches for the LLVM C/C++ compiler, allowing programmers to produce executables for the hardware. Not that you can get your hands on …

  1. Anonymous Coward
    Paris Hilton

    Sounds like a networking chip - which I suppose would explain the name... and fit with their business. So it's not interesting to me... unless it's some sort of exotic HPC inter-CPU bus type thingy, which would at least be academically interesting.

    Any experts about?

    1. missingegg

      Could be similar to Azul's Vega chips

      Hard to say much without more detail, but Azul Systems very successfully built massively parallel compute appliances last decade. They put 54 CPUs on a die, and 16 fully meshed chips per server, for 864 CPUs in a flat memory space machine. And that was with floating point units and 64 bit support. Most of the transistor budget in such a chip goes into various levels of cache, but I'd expect an integer-only 32-bit chip to be somewhat denser.

      Azul still makes a very nice JVM, but now they're focused on Linux/x86.

      1. Roo
        Windows

        Re: Could be similar to Azul's Vega chips

        "Azul Systems very successfully built massively parallel compute appliances last decade. They put 54 CPUs on a die, and 16 fully meshed chips per server, for 864 CPUs in a flat memory space machine. And that was with floating point units and 64 bit support."

        I was on the receiving end of an Azul sales pitch - they were pitching their boxes to run our pricing models, but they refused to share any kind of floating point benchmarks with us. They also refused to allow us to benchmark the pricing models in a PoC because they would make heavy use of floating point.

        I was left with the impression Azul were looking for customers who wanted to buy a box to look at rather than run software.

    2. Roo
      Windows

      "Any experts about?"

      I looked at the book referenced by the article:

      "Google software engineer Jacques Pienaar said the blueprints for Lanai were derived from the textbook Parallel Computer Architecture: A Hardware / Software Approach"

      That book describes a "Lanai NIC" built for Myricom's networks, at the heart of which is a processor that would fit the description given in the article. The processor in the Myricom NIC is hooked up to SBus and got slotted into UltraSPARCs (pretty similar to the Meiko boxes). The "comms" co-processor architecture has been very common for many decades now - there are tons of such processors. :)

      The benefit of using something like an ancient Myricom processor might be binning all the hardware and software cruft associated with Ethernet & IP - which introduces a ton of overhead and latency to support features that just aren't necessary for tightly coupled networks of processors (e.g. long cable runs, speed negotiation, IP's checksumming, etc.).
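
      To make the "IP's checksumming" point concrete, here is the RFC 1071 Internet checksum in C - a sketch of the kind of per-packet work a general-purpose IP stack does and a tightly coupled fabric can simply skip. Purely illustrative; not actual Myricom or Lanai firmware.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* RFC 1071 Internet checksum: sum the data as 16-bit words with
       * end-around carry, then return the one's complement. */
      uint16_t internet_checksum(const uint8_t *data, size_t len)
      {
          uint32_t sum = 0;

          while (len > 1) {                 /* add 16-bit big-endian words */
              sum += (uint32_t)((data[0] << 8) | data[1]);
              data += 2;
              len  -= 2;
          }
          if (len == 1)                     /* pad a trailing odd byte with zero */
              sum += (uint32_t)(data[0] << 8);

          while (sum >> 16)                 /* fold the carries back in */
              sum = (sum & 0xffff) + (sum >> 16);

          return (uint16_t)~sum;
      }
      ```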

      1. ToddR

        @Roo

        The card being used by Google will not have an SBus interface, which died out more than 10 years ago with the last SPARC workstations and didn't see the light of day on any x86 platform.

        Also Meiko (from memory) was a SIMD massively parallel platform developed in Bristol. UltraSPARC was a general-purpose Unix workstation using a SINGLE SPARC processor.

        1. Roo

          "The card being used by Google will not have an SBus interface, which died out about >10 years ago with the last Sparc workstations and didn't see the light of day on any x86 platform."

          I agree, that seems likely, but you never know, old I/O buses do linger on in dark corners... ;)

          The Meiko boxes I saw (at Meiko, they were within a short walk of INMOS where I worked) were definitely capable of MIMD operation from the hardware perspective, albeit with a SIMD component offered by the vector co-processors. That said, I didn't *use* the Meiko boxes; the dev-tools may well have been geared towards a SIMD programming model for all I know.

        2. /dev/null

          Meiko never built any SIMD machines, AFAIR. Their first generation Computing Surface was based around transputers, later supplemented by SPARC and i860 processors using the transputers as an interprocessor network. They then binned the transputers in their CS-2 architecture, which used SuperSPARC/hyperSPARC processors connected via their home-grown Elite/Elan comms fabric chips instead. When Meiko fizzled out, Elite/Elan was bought by Quadrics and became QsNet. None of this was directly related to Myrinet AFAIK, apart from being a competing technology at around the same time.

    3. ToddR

      It's a low-latency network chip. Myricom were one of the first (along with Dolphin's SCI) to design low-latency NICs for MPI distributed-memory applications, e.g. CFD, computational chemistry, materials science.

      Not similar to Azul's SMP boxes for running Java, I'm afraid.
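
      For a sense of what "low-latency NICs for MPI" get measured on, here is the classic ping-pong microbenchmark, written against the standard MPI API. It's a minimal illustrative sketch, not anything Myricom- or Lanai-specific; compile with mpicc and run with two ranks.

      ```c
      #include <mpi.h>
      #include <stdio.h>

      /* Bounce a single byte between rank 0 and rank 1 many times and
       * report the average one-way latency. */
      int main(int argc, char **argv)
      {
          int rank;
          char byte = 0;
          const int iters = 10000;

          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double t0 = MPI_Wtime();
          for (int i = 0; i < iters; i++) {
              if (rank == 0) {
                  MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                  MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              } else if (rank == 1) {
                  MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
              }
          }
          double t1 = MPI_Wtime();

          if (rank == 0)
              printf("average one-way latency: %.2f us\n",
                     (t1 - t0) / iters / 2.0 * 1e6);

          MPI_Finalize();
          return 0;
      }
      ```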

  2. EvaQ

    A bit like a ... 386?

    Is this Lanai a bit like a ... 386? I wonder what Google will do with it.

    1. Kevin McMurtrie Silver badge

      Re: A bit like a ... 386?

      If it's a very simple architecture with no legacy baggage, they're probably tiny enough to be crammed into computers by the thousands. Most well defined tasks with well defined inputs can be implemented within a crude instruction set. Coding becomes difficult, but hardware costs more than people when you're approaching a global scale of data processing. Google also believes (sometimes incorrectly) that they can do anything better than the rest of the world, so it's no surprise that they'd keep building more hardware for themselves.

      1. naive

        Re: A bit like a ... 386?

        Great that someone is actually investing in processor technology again.

        Since it's Google, this will benefit innovation in processor technology too, given that everyone else threw in the towel years ago because CPUs are perceived as a commodity produced by Intel.

        That's taking into account that ARM is not (yet) ready for the datacenter.

      2. ToddR

        Re: A bit like a ... 386?

        It will not be a cheap CPU for massively parallel processing; that would be an ARM of some description.

      3. kventin

        Re: A bit like a ... 386?

        """If it's a very simple architecture with no legacy baggage, they're probably tiny enough to be crammed into computers by the thousands. Most well defined tasks with well defined inputs can be implemented within a crude instruction set."""

        somewhere Chuck Moore suddenly started hiccupping like crazy

        1. Anonymous Coward
          Anonymous Coward

          Re: A bit like a ... 386?

          What ever happened to Chuck Moore's low power parallel processors? I think the company was named GreenArrays. You would think there would be use cases in IoT, such as low-power sensors.

          1. Vic

            Re: A bit like a ... 386?

            What ever happened to Chuck Moore's low power parallel processors?

            Several things happened.

            Chuck uses his own layout tools, which produce very small dice for the amount of compute they provide - but the yield is far smaller than is commercially viable. As a result, although the engineering samples were fairly snappy, they would need a total re-design for mass production - and that idea didn't go down too well.

            There was also a change in the way patent fees were allocated in the US; the parent company was funding the venture on the expectation of other patents deriving cash based on the value of the complete item in which the patent was used. When they suddenly had to confine themselves to a portion of the value of the individual component, that left the cashflow a little bit sticky.

            And then there were certain interpersonal issues. The less said about those, the better.

            Vic.

            1. kventin

              Re: A bit like a ... 386?

              wow. first news of any kind in ... what? something like 2 years?

              i knew about the tools (okad, right?), not about the yield problem. the redesign idea didn't go down well with mr. moore i presume. the patent fees -- wasn't there a lawsuit of some kind? and i/p issues... i won't ask about.

              however what i would ask is: what _is_ happening (if anything) now? if you could shed some light.

              thanks for the info anyhow.

    2. martinusher Silver badge

      Re: A bit like a ... 386?

      This part could describe any number of RISC machines but not an Intel processor. Intel x86 parts are the epitome of CISC -- they're heavily microcoded and manage all sorts of instruction weirdness such as prefix bytes and variable instruction widths. (Something to do with being backwards compatible with the 1970s.)

      Intel instructions typically work between a single register and memory. This, being a RISC, will move data between registers: a result register is computed from an operation on two source registers. Memory access requires an address in a register, with another register or part of the instruction providing an offset. RISC chips have other properties, like really, really disliking accesses that aren't aligned on machine-word boundaries, but all this is invisible to most programmers.

      This is what makes the Intel part such an amazing deal. The x86 might be an architectural nightmare, but it's a highly developed, extremely sophisticated architectural nightmare. To an end user "it just works" -- you get all sorts of crap included with the processor, such as floating point and other dedicated instructions, memory management, virtual memory management, caches, all sorts of things thrown in with the part. In the embedded world (where space and low power are at a premium) all this stuff is extra.
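
      To put the register-to-register point in concrete terms, here's a trivial C function annotated with generic pseudo-assembly showing how a load/store RISC and a register-memory x86 might each handle it. The mnemonics are hypothetical and for illustration only - not actual Lanai or x86 compiler output.

      ```c
      /* One C statement, two lowering styles (illustrative pseudo-assembly). */
      int sum_at(const int *base, int offset)
      {
          return base[offset] + 7;
          /* Typical load/store RISC sequence (three-operand, registers only):
           *   r3 = r1 + (r2 << 2)     ; form the address in a register
           *   r4 = load32 [r3]        ; memory touched only via load/store
           *   r5 = r4 + 7             ; result register from two source operands
           *
           * Typical x86 sequence (ALU operand read straight from memory):
           *   mov  eax, 7
           *   add  eax, dword ptr [rdi + rsi*4]
           */
      }
      ```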

  3. Woza
    Joke

    "There is no floating-point support, so it won't be juggling tasks involving lots of math."

    I guess number theory doesn't qualify as maths, then?

    1. Anonymous Coward
      Anonymous Coward

      "number theory"

      '32-bit modulo' number theory?

      1. kventin

        Re: "number theory"

        4294967296 ought to be enough for almost anybody

        1. Vic

          Re: "number theory"

          4294967296 ought to be enough for almost anybody

          Yeah? Try expressing that in 32 bits...

          Vic.
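
          For anyone counting along: a 32-bit register tops out at 4294967295 (2^32 - 1); 4294967296 itself needs a 33rd bit. A quick C check, for illustration only:

          ```c
          #include <inttypes.h>
          #include <stdio.h>

          int main(void)
          {
              uint32_t max32 = 4294967295u;          /* 2^32 - 1: largest 32-bit value */
              uint64_t two_pow_32 = 4294967296ull;   /* 2^32 itself needs a 33rd bit */

              printf("%" PRIu32 "\n", max32);                 /* 4294967295 */
              printf("%" PRIu32 "\n", (uint32_t)two_pow_32);  /* truncated to 0 */
              return 0;
          }
          ```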

  4. JeffyPoooh
    Pint

    Very RISC... Like heading back in time.

    Eventually they'll announce a new-fangled 32-bit wide NAND gate with some tunable delay lines. The compiler will convert your code into the required tuning of the delay lines.

    Hmmm... must buy some mercury futures.

  5. Anonymous Coward
    Anonymous Coward

    Existing (old) 32-bit processor already supported by GCC and GDB

    Looks like https://www.myricom.com/ also already has a full compiler port https://github.com/myri/lanai-gcc plus a debugger https://github.com/myri/lanai-gdb.

  6. Anonymous Coward
    Joke

    Fixed value registers

    > including: two fixed value registers (one probably being zero)

    Two and five, of course. :-)

    Either that, or one is the binary for 4.4.4.4 and the other is 8.8.8.8, the NSA's slurp address.

  7. FormerMyriNut

    Lanai was the internal processor family name, and also the founder's favorite island near Maui.

    The Lanai core is a packet processing engine, not a generic x86/MIPS/ARM core. It was designed to hump packets around really fast from the wire to the computer bus - originally for Myrinet at 640Mb/s, then 1280Mb/s, 2G and finally 10G. CSPi still sells the 10G single-core version of this chip today in its 8B and 8C series cards.

    By definition a packet processing CPU is designed to do basic packet manipulations, in this case specifically for the HPC market, which today is really just mainstream computing. What was interesting about the Lanai architecture is that it was designed to support multiple 10/40GbE PHYs (these are the PHYsical network links), up to 10 packet processors and a pair of PCIe ports, for a total of 16 devices (think of them in a stack). On one side of this stack there is a shared memory bus linking all these devices via many ports to many banks of on-chip SRAM. On the other side of the stack is a command/control bus with an ultra-low-latency, HPC-like crossbar switch linking these devices. Mind you, this was framed out in 2008 and the design was pretty much locked down in 2010, so we're talking about the implementation Chuck had in mind eight years ago - which even by today's standards is pretty cool. The Lanai details shared above are now another three years older, so they've likely been improved upon further.
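
    For what it's worth, here's a rough C sketch of the device stack as described above. The counts come from the comment; the type names and layout are entirely hypothetical, not an actual Lanai register map.

    ```c
    /* Hypothetical model of the described Lanai stack: up to 16 devices
     * (PHYs, packet processors, PCIe ports), each with one port onto the
     * shared on-chip SRAM bus and one onto the command/control crossbar. */
    #define LANAI_MAX_DEVICES 16

    enum lanai_dev_kind {
        LANAI_DEV_PHY_10_40GE,   /* physical network link */
        LANAI_DEV_PACKET_PROC,   /* one of up to 10 packet processors */
        LANAI_DEV_PCIE_PORT      /* one of a pair of PCIe ports */
    };

    struct lanai_device {
        enum lanai_dev_kind kind;
        int sram_port;           /* port onto the many-banked on-chip SRAM bus */
        int xbar_port;           /* port onto the low-latency crossbar switch */
    };

    struct lanai_stack {
        struct lanai_device devices[LANAI_MAX_DEVICES];
        int ndevices;
    };
    ```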

  8. allthecoolshortnamesweretaken

    So, are they building Skynet or not?
