MIT boffins build 36 core processor with data-traffic smarts

Researchers at MIT say they have successfully built a 36-core processor that uses an internal networking system to get maximum data throughput from all the processing cores.

[Image: MIT's new multicore, multi-bus chip]

The design, unveiled at the International Symposium on Computer Architecture …

COMMENTS

This topic is closed for new posts.
  1. Will Godfrey Silver badge

    I want one!

    See title :)

  2. Anonymous Coward
    Anonymous Coward

    NUMA

    Neat, but seems like "just" a generalization of the sort of point-to-point interconnect systems that AMD started doing with the original Opteron and probably other companies were doing before them.

    I definitely want to own a machine with 36 cores.

    1. Swarthy
      Thumb Up

      Re: NUMA

      I really want a 36 core chip.

      But then I would want a dual- or quad-capable motherboard. (Octo-capable would be, perhaps, a bit much to ask.)

      1. Anonymous Coward
        Anonymous Coward

        Re: NUMA

        8x36 cores / chip, like this?

        http://www.tilera.com/sites/default/files/images/content/TILExtreme-GxDuo-PB045-01_Web.pdf

    2. fajensen

      Re: NUMA

      You can get half way there: http://shop.adapteva.com/ - 16 cores, 119 USD.

  3. Lars Silver badge
    Linux

    Superb logic

    "The blueprints for the new chip design aren't being released as yet,","The team is now adapting a version of Linux to use the new chip.", Superb logic, to hide from the vultures behind closed source you will need open source to succeed. I quite like that, but I wonder if I have ever seen it expressed this openly.

  4. HCL

    HPC Business Head

    Potentially great idea. Particularly for the Xeon Phi and K40 architectures.

  5. Anonymous Coward
    Anonymous Coward

    Wouldn't a fabric interconnect be a better alternative? Then one core would have a path to all other cores and vice versa. Having multiple paths wouldn't be a requirement if every core had a fabric connection to every other core. If there was congestion, then it would be for data already going to that other core. Imagine a 10GT/s fabric where every core has access to all other cores over a 40GT/s connection. Latency would be reduced over the "routed" method MIT has.
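
    A rough back-of-the-envelope sketch (my numbers, assuming a 6x6 grid of tiles; none of this is from the article): a dedicated link between every pair of cores grows as N(N-1)/2, while a nearest-neighbour mesh grows only linearly, which is the wiring bill a routed mesh avoids.

        package main

        import "fmt"

        func main() {
            const cores = 36

            // Dedicated point-to-point fabric: one link per pair of cores.
            fabricLinks := cores * (cores - 1) / 2 // 36*35/2 = 630

            // Hypothetical 6x6 2D mesh: nearest-neighbour links only.
            const rows, cols = 6, 6
            meshLinks := rows*(cols-1) + cols*(rows-1) // 30 + 30 = 60

            fmt.Println("all-to-all fabric links:", fabricLinks)
            fmt.Println("6x6 mesh links:", meshLinks)
        }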

  6. John Smith 19 Gold badge
    Unhappy

    So Amdahl's law alive and well.

    Every time I see X number of cores on a chip I think "So how many buses does this thing have to talk to the outside world?"

    You've got 36:1 bus contention to the outside world. That's a problem a) when all 36 cores' on-chip L1 and L2 caches are filling up, and b) when they miss in cache and have to go off-chip.
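
    To put a rough number on it, a quick Amdahl's-law sketch (the serial/contended fractions below are illustrative guesses on my part, not measurements of this chip):

        package main

        import "fmt"

        // amdahl gives the ideal speedup on n cores when a fraction s of the
        // work is effectively serial (e.g. queued behind a shared off-chip bus).
        func amdahl(s float64, n int) float64 {
            return 1 / (s + (1-s)/float64(n))
        }

        func main() {
            const cores = 36
            for _, s := range []float64{0.01, 0.05, 0.10} {
                fmt.Printf("serial fraction %2.0f%% -> speedup %.1fx (out of %dx)\n",
                    s*100, amdahl(s, cores), cores)
            }
        }

    Even a 5% serialised share drags 36 cores down to around a 13x speedup, which is the point above.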

    Oh look they've re-discovered nearest neighbour routing as well..

    How sweet.

    Conceptually it doesn't look like the world has gotten any better since the Transputer concepts of 30 years ago.

    1. Nick Ryan Silver badge

      Re: So Amdahl's law alive and well.

      The whole story behind the Transputer is quite interesting... as well as the reasons behind its eventual failure. El Reg has a bit of it here: http://www.theregister.co.uk/2011/08/18/heroes_of_tech_david_may/

      1. John Smith 19 Gold badge
        Unhappy

        Re: So Amdahl's law alive and well.

        "The whole story behind the Transputer is quite interesting... as well as the reasons behind its eventual failure. El Reg has a bit of it here: "

        That's actually just part of the story.

        As May says in practically the first paragraph, Americans are obsessed with bus transfers.

        One of the key features of the Transputer architecture was that the hardware channels ran independently of the core processor, so the core did not have to keep checking for I/O. It shoved stuff out the door through DMA and got on with the rest of the program (or programs, via the hardware scheduler).

        IMHO the architecture had 2 major flaws.

        1) Word size always equal to address size. So you could never have an 8-bit word (the smallest unit of code) with a 16-bit address option. (Personally I think they should have gone in with some rock-bottom 16-bitters using a serial-processing 1-bit internal CPU to conserve silicon. Very slow, but good for cheap development systems or budget array processing.) More to the point, once people saw how you could develop on one processor in Occam and roll out to a massive array for full speed, that opens the floodgates.

        2) No MMU. You've got a high-end (for the time) processor and no dynamic memory management? WTF

        Keep in mind the Transputer was basically a stack machine with 16 local registers (which still sounds like a pretty good package to me). Designing an upgraded Transputer with 8/16-bit support and/or a proper MMU (given the 30+ years of desktop Unix out there) with minimal impact on the rest of the design sounds like a challenging but doable end-of-course university project to me.

        1. Anonymous Coward
          Anonymous Coward

          Re: So Amdahl's law alive and well.

          Largely agree re the Transputer. Considering the technology of the time, though, it was a pretty good effort, and if it had achieved take-up in its intended field of GP/MPP/HPC instead of ending up in embedded, I reckon the problems and omissions, like the MMU, would have been addressed - bear in mind that an MMU for a Transputer, being designed primarily around large-scale parallelisation ideas and concepts, would be a bit trickier to design than an MMU for a Von Neumann architecture CPU.

          Interesting that no specific architecture was mentioned in the article - it might give some idea of power draw. I guess a bit of digging to find out whether any of the major chip companies are sponsoring the work might give an answer, if they're licensing one of the common architectures.

        2. Aclassifier

          Re: So Amdahl's law alive and well.

          As you mentioned, the transputer was "an occam machine", and thus really meant to be programmed in occam. Even if we might see it differently these days, I think the rationale for not needing an MMU was the fact that it was an occam machine. And at the time (early 1980s), not many microprocessors had an MMU.

          Occam has parallel-usage rules and synchronous channels (data flows directly from the internals of the sender process to the internals of the receiver process while both are blocked, so it's safe); occam does not allow aliasing, and functions are without side effects. So, in that world, I think the MMU was designed away by a too-many-years-ahead language.
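
          For anyone who never touched occam: Go's channels grew out of the same CSP ideas, so an unbuffered Go channel gives a rough feel for that rendezvous (just an illustrative sketch, nothing to do with the MIT work):

              package main

              import "fmt"

              func main() {
                  // Unbuffered channel: a send blocks until a receiver is ready,
                  // and vice versa - the CSP-style rendezvous occam channels use.
                  ch := make(chan int)

                  go func() {
                      for i := 0; i < 3; i++ {
                          ch <- i // blocks until the receiver below is ready
                      }
                      close(ch)
                  }()

                  for v := range ch { // blocks until the sender hands over a value
                      fmt.Println("received", v)
                  }
              }

          The value is copied from sender to receiver at the moment of the hand-off, so there is no shared mutable memory for an MMU to police - which is roughly the argument above.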

          Øyvind Teig

          Trondheim Norway

          http://www.teigfam.net/oyvind/home/

  7. blofse

    My CPU is...

    ...a neural networking processor, a learning computer.

    In this case, I am sure you could modify the router to choose routes depending on synapse weight; then the decision as to which CPU to use could be made depending on how often that type of command comes through the CPU.

    What the use of that would be I have no idea!

  8. Tom Samplonius

    36-core processor, and the cores are called "tiles"? I've seen this movie already. It's called Tilera, and they are shipping these CPUs. MikroTik uses them in routers. http://www.tilera.com/products/processors/TILE-Gx_Family
