Pass the 'Milk' to make code run four times faster, say MIT boffins

MIT boffins have created a new programming language called “Milk” that they say runs code four times faster than rivals. Professor Saman Amarasinghe says the language's secret is that it changes the way cores collect and cache data. Today, he says, cores will fetch whole blocks of data from memory. That's not efficient when …

  1. tojb
    Happy

    Sounds great call me when its in the OpenMP standard

    This seems to address the big problem with OpenMP, which is that you can easily end up getting a net slowdown due to cores fighting over what is in the cache. If this is at all effective then the extra pragmas will/should get rolled into the standard very quickly.
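
    For anyone wondering what "cores fighting over what is in the cache" can look like in practice, here is a minimal C sketch of one common cause, false sharing, and the usual padding workaround. It is not taken from the Milk work. The 64-byte cache-line size, the thread count, and the names `padded_counter` and `count_even` are all assumptions made for illustration.

```c
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes; check your hardware */
#define NTHREADS   4    /* hypothetical thread count */

/* Each counter is padded out to a full cache line, so threads that update
   neighbouring counters never write to the same line. */
struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
};

/* Count the even values in data[0..n). Slot t handles its own slice of the
   input and writes only to its own padded counter. */
long count_even(const int *data, size_t n)
{
    struct padded_counter partial[NTHREADS] = {{0}};
    long total = 0;

    /* With OpenMP enabled each slot runs on its own thread; without it the
       pragma is ignored and the slots run one after another. */
    #pragma omp parallel for num_threads(NTHREADS)
    for (int t = 0; t < NTHREADS; t++) {
        size_t begin = n * (size_t)t / NTHREADS;
        size_t end   = n * (size_t)(t + 1) / NTHREADS;
        for (size_t i = begin; i < end; i++)
            if (data[i] % 2 == 0)
                partial[t].value++;
    }

    /* Combine the per-slot results serially. */
    for (int t = 0; t < NTHREADS; t++)
        total += partial[t].value;
    return total;
}
```

    If the padding is removed, the four counters sit in one cache line and every increment forces that line to bounce between cores, which is one way to end up with the net slowdown described above.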

    1. Anonymous Coward
      Anonymous Coward

      Re: Sounds great call me when its in the OpenMP standard

      I'll put a note in my dairy.

      1. a pressbutton
        Pint

        Re: Sounds great call me when its in the OpenMP standard

        joke of the day.

        have a beer ac

        1. m0rt

          Re: Sounds great call me when its in the OpenMP standard

          "I'll put a note in my dairy."

          Make it a big note, otherwise it will be past-your-eyes before you see it...

    2. The Man Who Fell To Earth Silver badge
      WTF?

      Re: Sounds great call me when its in the OpenMP standard

      So, as I read the article, it's vaporware at this stage.

  2. leon clarke

    Interesting

    To save others googling for more info, here's the article that the news sites are ripping off http://news.mit.edu/2016/faster-parallel-computing-big-data-0913

    The 'good for big data' seems very significant - it's solving problems that happen with massive data sets so don't hope that this'll be eventually applied to small data to make your PC 4x faster

    1. Version 1.0 Silver badge
      Pint

      Re: Interesting

      From the link: "Milk simply adds a few commands to OpenMP, an extension of languages such as C and Fortran that makes it easier to write code for multicore processors."

      No worries then - I can continue using FORTRAN.

      1. Tchou
        Headmaster

        Re: Interesting

        "Milk simply adds a few commands to OpenMP"

        ...And the latter being written in C, Milk looks more like a new memory management algorithm/method/system rather than a new language.

  3. Wiltshire

    Milk for SQL ?

    1. Anonymous Curd

      SQL is one area you wouldn't need this. Because tables are typed and statements declarative, you know exactly what size your data are and where in the data structures they live. As a result this was solved decades ago with columnar data and vectorised execution. Proper data systems don't need to muck about with cache black magic to skip over boring data - it has been re-organised on ingest to ensure it's never even read off disk.
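
      A rough C sketch of the row-versus-column point, for readers who have not met columnar storage. The `row` struct, its field sizes, and both function names are hypothetical; real column stores add compression, vectorised operators and much more.

```c
#include <stddef.h>

/* Row-oriented layout: every record carries all of its fields, so scanning
   one field drags the others through the cache as well. */
struct row {
    int  id;
    int  price;
    char note[56];   /* hypothetical "boring" payload nobody is querying */
};

long sum_price_rows(const struct row *rows, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].price;   /* one useful int per 64-byte record */
    return sum;
}

/* Column-oriented layout: the prices are stored contiguously, so every
   cache line fetched contains nothing but prices. */
long sum_price_column(const int *price, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += price[i];
    return sum;
}
```

      Both functions return the same total; the difference is only in how much unrelated data has to travel through the memory hierarchy to get there.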

  4. Spacedman
    Joke

    Got Link?

  5. Unep Eurobats
    Paris Hilton

    Must be an acronym

    Can't figure it out though. Multi-implemented local kernel?

    If it's not, then possibly something like Juice would have been more appropriate?

    1. Anonymous Coward
      Anonymous Coward

      Re: Must be an acronym

      “It’s as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton, and put it back in the fridge,” explained Vladimir Kiriansky, a doctoral student in electrical engineering and computer science at MIT.

      1. Anonymous Coward
        Anonymous Coward

        Re: Must be an acronym

        So what you're saying is that if you use this language, your data will get soggy?

        1. Arthur the cat Silver badge
          Unhappy

          Re: Must be an acronym

          "So what you're saying is that if you use this language, your data will get soggy?"

          Lactose intolerant programmers will suffer from code bloat and their output will be a load of shit.

          [Yes, I am lactose intolerant. How did you guess?]

          1. Anonymous Coward
            Anonymous Coward

            Re: Must be an acronym

            A cat that is lactose intolerant?!

            Well I suppose there has to be some down side to having the ability to type.

            1. jake Silver badge

              Re: Must be an acronym

              Most cats ARE lactose intolerant. Hint: Felines aren't built to run on bovine milk. Cow milk is built for baby cows, not your mog.

              1. Alan Brown Silver badge

                Re: Must be an acronym

                "Most cats ARE lactose intolerant"

                Most PEOPLE are too.

                The only groups who (mostly) aren't are Europeans and Mongolians.

      2. Unep Eurobats
        Joke

        Re: Must be an acronym

        "...explained Vladimir Kiriansky..."

        Serves me right for not reading the article properly. I just skimmed it.

  6. Mage Silver badge

    Language feature really?

    Seems like a runtime system support feature. Good languages only have such things via library support?

    I can't see how this can be described as an intrinsic feature of ANY programming language.

    1. Falmor

      Re: Language feature really?

      Didn't sound like a language feature to me either. A compiler that knows what data will be handled at runtime would be very impressive.

      1. Destroy All Monsters Silver badge

        Re: Language feature really?

        It would probably solve the Halting Problem, too.

  7. Anonymous Coward
    Anonymous Coward

    My skateboarding sheep and pirate friend will be happy

  8. Peter Gathercole Silver badge

    Software? Or maybe hardware.

    So instead of just fetching the data, you're going to catch the request (in software?), and defer the code waiting for the data until the data can be more efficiently fetched.

    Just how many more very expensive context switches will this generate? And where are the other threads that can be dispatched once all of them are waiting for an 'efficient data fetch'. And how will that affect the latency of individual threads?

    I'm sure that there are some highly threaded applications with unpredictable data flow where this could be a benefit, but on the brute-force codes that make up most HPC applications, which mostly process data in predictable ways, especially Fortran code where the standard dictates how data is stored in arrays, this is likely to be completely unneeded extra code that can only slow the total throughput.

    I think I'll let the hardware cache pre-fetch hardware provide all the speedup most real 'Big Data' requires.

    1. Brewster's Angle Grinder Silver badge

      Re: Software? Or maybe hardware.

      Researchers: "We've developed and tested a feature that makes these common algorithms four times faster."

      Commentard: "I haven't a clue what they've done but obviously it'll slow things down."

      1. Peter Gathercole Silver badge

        Re: Software? Or maybe hardware. @Brewster

        OK, I'll wait for the full paper to be published, but there's a number of things in the quotes in the article that make little to no sense (although it could be that the journo writing the original article has not fully understood it).

        Let's start with "when a core discovers that it needs a piece of data, it doesn’t request it"

        Um. A core. A physical processor. How is this conditioned by the Milk compiler without wrapping the load instruction with a whole load more code, because that is all a compiler can do!

        Then we've got "adds the data item’s address to a list of locally stored addresses".

        and then what. Puts the thread to sleep? Hello. Expensive context switch. How's that going to improve latency and throughput.

        and "and redistribute them to the cores"

        The compiler can sort this out? It's going to have to know a huge amount about the shape of the system the code is going to run on before it generates the code. And most systems leave the placing of data in memory to the kernel and the hardware memory translation mechanisms. Milk's going to be able to control all of this, simply?

        And anyway, the article talks first about a new language, and then talks about modifications to OpenMP. Which is it? OpenMP is not a language in its own right, and it does not have a compiler; it's more like a pre-processor that expands a number of in-line directives in the code into something from which the following compiler (Fortran or C) can generate linkable code.

        I don't know whether you've ever used OpenMP, but it already does significant data reduction. It sounds like this "Milk" language is merely a modification to OpenMP, and not a language in its own right.

        Ah. I found the original MIT release. One of the things it says is "manage memory more efficiently in programs that deal with scattered data points in large data sets". It also says nothing about "common algorithms". It talks about a rather generalized, sparse data problem that is not actually suited to the type of computing that HPC systems and OpenMP are really suited to.

        I'll look for the full presentation, but I doubt that it will claim to be the type of panacea that the Register article claims.

        1. Geoffrey W

          Re: Software? Or maybe hardware. @Brewster

          Commentard...Begins deepening and widening hole.

          1. Antidisestablishmentarianist

            Re: Software? Or maybe hardware. @Brewster

            I'll have to join him in the hole then. His line of thinking/questioning seems sound to me. Maybe the milk isn't sour, but the article reporting on it certainly has a bit of a whiff to it.

        2. Brewster's Angle Grinder Silver badge

          Re: Software? Or maybe hardware. @Brewster

          > It also says nothing about "common algorithms".

          *cough* Press release, fourth paragraph, first sentence: "In tests on several common algorithms, programs written in the new language were four times as fast as those written in existing languages." *cough*

          You engage in some nit picking about the dumbing down. I have no problem with that, although, to be fair to El Reg, it's right there in the press release. And your final two sentences are fair comment; in fact your penultimate one probably nails what's going on here: they've expanded the range of situations in which OpenMP can sensibly be used. But challenging the veracity of the underlying research went beyond cutting down sensationalist hyperbole: I think we can reasonably expect the researchers to have done what they've said they've done [1], even if they've magnified its significance.

          [1] Okay, there was that whole BICEP2 thing. And those superluminal neutrinos. And......

          1. Peter Gathercole Silver badge

            Re: Software? Or maybe hardware. @Brewster

            Fair enough. I missed the line on "common algorithms" in my own reference. No excuses there, but, again, I will wait for the final paper to see which algorithms these are.

            I've worked with people doing serious work with HPC systems, and provided technical support for those systems. They put a lot of effort into trying to make sure that data is already in an appropriate place before it is needed. Generally this is done by localizing data in chunks where much of the required data is close to other related data, in blocks aligned to common block sizes. They try to make sure that data is used such that it can be fetched in regular ways so that the pre-fetch and cache hardware will have it lined up for when it is needed. They make sure that the minimum amount of data is requested between cores and systems over the interlink.

            When you are iterating over a loop many millions of times, saving a few instructions per iteration, and avoiding cache misses and context switches, can save you a huge amount of resource.

            Whilst I can see that there is the possibility that Milk could make savings for some particular types of problem, all it is really doing is eliminating efficiency problems from non-optimized code. It's no substitute for experienced programmers, but held out as a carrot for organizations wanting to reduce the skill set of their programmers. IMHO, having worked with some very skilled programmers, it will take a long time before this can be realistically achieved.

            It will be very interesting to see whether adding Milk to the Unified Model for weather forecasting will result in faster code.
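
            As a small illustration of the hand-tuning described in this comment (working on data in blocks so the cache and pre-fetch hardware can keep up), here is a hedged C sketch of a tiled matrix transpose. The tile size of 32 and the function name `transpose_blocked` are assumptions; production HPC codes choose block sizes per machine and usually do far more than this.

```c
#include <stddef.h>

#define BLOCK 32   /* assumed tile edge; would be tuned per machine */

/* Transpose an n x n row-major matrix one BLOCK x BLOCK tile at a time, so
   that both the reads from src and the writes to dst stay within a small
   working set instead of striding across the whole matrix. */
void transpose_blocked(const double *src, double *dst, size_t n)
{
    for (size_t ib = 0; ib < n; ib += BLOCK) {
        for (size_t jb = 0; jb < n; jb += BLOCK) {
            /* Clamp the tile at the matrix edge. */
            size_t imax = ib + BLOCK < n ? ib + BLOCK : n;
            size_t jmax = jb + BLOCK < n ? jb + BLOCK : n;
            for (size_t i = ib; i < imax; i++)
                for (size_t j = jb; j < jmax; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

            The result is identical to a naive double loop; only the order in which memory is touched changes.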

            1. Brewster's Angle Grinder Silver badge

              Re: Software? Or maybe hardware. @Brewster

              I started out writing assembly, and it was very easy to write code that was smaller and faster than a compiler. These days, not so much. This sounds like a step down that road for parallel programming. (And people said much the same about experienced programmers then. In fact, I think I said it...) Yes, it does sound like it's automatically doing things an experienced programmer would do. No, the elite HPC guys may not see the performance gain advertised. But HPC aren't the only users of OpenMP and in the world of Big Data I don't imagine they have time to optimise code to that level.

              In time, it will probably match what the HPC guys do. And there are benefits in code maintainability and the time it takes to write this code in having it done automatically.

        3. Frumious Bandersnatch

          OpenMP ... does not have a compiler

          er, mpicc?

          I wish that I could say that IKWYM, but then again, the same comment can be levelled at the author of the article. I didn't know that the Goss brothers got back together. (Oh wait... that was "Bros", not "dross". Carry on).

          1. Peter Gathercole Silver badge

            Re: OpenMP ... does not have a compiler @Frumious

            Mpicc is a wrapper around various C compilers, and is normally used with gcc.

            It's really not a compiler in its own right, more like a pre-processor.

            The MPI component will take a number of inline directives, and generate some C, unroll some loops into parallel threads, do map reduction and some other optimisations, and add some glue code and hooks to library routines to handle passing data between local and remote threads.

            Having done this, it then passes the resultant intermediate source to the backend (real) compiler which generates the linkable code which is then passed to the linker to resolve all the library calls.

            If you remember, the earliest C++ 'compilers' worked in exactly the same way as a pre-processor to a C compiler, but I would suggest that C++ is more of a complete language than OpenMP.
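
            For readers who have not seen the "inline directives" this sub-thread is arguing about, a minimal OpenMP example in C follows. The `parallel for` directive and `reduction` clause are standard OpenMP; the function name `sum_array` is made up for illustration, and nothing here is specific to Milk.

```c
#include <stddef.h>

/* Sum an array. With OpenMP enabled (for example -fopenmp with GCC), the
   directive splits the loop across threads and combines the per-thread
   partial sums; without it the pragma is ignored and the loop runs serially. */
double sum_array(const double *x, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++)
        total += x[i];

    return total;
}
```

            The source stays ordinary C either way, which is why the same file builds with or without OpenMP support switched on.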

  9. pangu

    ...or CAFS ?

  10. Alan Bourke

    Lovely.

    Milky milky.

  11. keith_w

    So this means that Fortran and C are going to go the milky way?

    1. Brewster's Angle Grinder Silver badge

      Mmm, chocolate.

  12. hellwig

    I have no knowledge...

    When it comes to expansive multi processing, I'm less than a novice, but what good does it do for a process to record the address it needs instead of requesting the data? Doesn't it need the data to continue its processing? Now the compiler has to re-order the instructions to request all data first, then retrieve that data, THEN execute the computations on said data?

    At some point, are we just asking too much of the compiler? Remember when Itanium was going to solve all our problems, only that pesky compiler wouldn't play along?

    1. Filippo Silver badge

      Re: I have no knowledge...

      That's all true in the general case, however there are particular cases where you have a lot of threads that spend most of their time waiting on data; think massively parallel algorithms where simple operations are done on large data sets that don't fit in neat blocks. In those cases, if you can engineer a situation where some threads have to wait a bit more, but many other threads have to wait a lot less, you are gaining.
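
      A sketch in C of the general "record the address now, do the access later" idea being discussed, as far as it can be inferred from the press-release quotes in this thread. This is not Milk's implementation: the function names, the `region` parameter and the use of a counting sort are all assumptions chosen to keep the example short.

```c
#include <stdlib.h>
#include <stddef.h>

/* Direct version: each update may land on a different, far-away cache line. */
void scatter_add_direct(long *table, const size_t *idx, const long *val, size_t n)
{
    for (size_t i = 0; i < n; i++)
        table[idx[i]] += val[i];
}

/* Deferred version: first note which region of the table each request
   targets, group the requests by region, then apply them group by group so
   that consecutive updates touch nearby memory.
   region = table entries per group (must be > 0).
   Returns 0 on success, -1 if memory allocation fails. */
int scatter_add_batched(long *table, size_t table_len,
                        const size_t *idx, const long *val, size_t n,
                        size_t region)
{
    size_t nregions = (table_len + region - 1) / region;
    size_t *start = calloc(nregions + 1, sizeof *start);
    size_t *order = malloc((n ? n : 1) * sizeof *order);
    if (!start || !order) { free(start); free(order); return -1; }

    /* Counting sort of request numbers by target region. */
    for (size_t i = 0; i < n; i++)
        start[idx[i] / region + 1]++;
    for (size_t r = 0; r < nregions; r++)
        start[r + 1] += start[r];
    for (size_t i = 0; i < n; i++)
        order[start[idx[i] / region]++] = i;

    /* Apply the requests in region order. */
    for (size_t k = 0; k < n; k++)
        table[idx[order[k]]] += val[order[k]];

    free(start);
    free(order);
    return 0;
}
```

      The batched version does strictly more work per request, which is the objection raised above. Whether it wins depends on whether the saved cache misses outweigh the bookkeeping, and that is only plausible when the table is much larger than the cache and the indices are scattered.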

  13. OhDearHimAgain

    Sounds a lot like SCSI queue re-ordering to me.

    Also, if the program needs the data before it can proceed, then what does it do before the cache has fetched it? Presumably blocks. OR do you have to register all your data requirements from the start? Not commonly easy to do.

    That said, I've often thought that programmers write code in the order that makes sense to them, but often the code can be completely re-ordered and still work the same, but I assumed modern compilers & processors also know this and do it already.

    1. Anonymous Coward
      Anonymous Coward

      "Sounds a lot like SCSI queue re-ordering to me."

      At first you didn't make sense, but if you change to SCI (IEEE standard ccNUMA implementation), yes, it does. Applied at the core level. It has a lot of potential here but I'm the crazy guy that models systems for entertainment.

  14. martinusher Silver badge

    ..and the difference between a language and a library is.....

    A lot of software problems can be blamed on people not knowing the difference between languages, libraries, operating system components and so on.

  15. LaeMing
    Happy

    I'll just leave this here

    https://www.youtube.com/watch?v=5e-re0Oti3A

    (Sorry, the narrator is a bit on the dim side at times).

  16. Anonymous Coward
    Anonymous Coward

    "Milk" language

    So it gets your code almost compiled, then stops right at the last second, waits a bit, then starts compiling again. Over and over and over. Is the purpose to fully compile, or delay the linker as long as possible? Hard to tell, as after a while your makefile is all smushy and disgusting. And when it finally DOES compile, it produces a much bigger binary than normal and you end up symbol-dumping all over the place.
