HAMR time for Google's MapReduce, says not-so-startup

Like the idea of chewing on terabytes of data using Google’s MapReduce but think it's too slow, too hardware-hungry and too complicated? A fledgling big-data analytics venture reckons it’s got the answer - a Hadoop programming framework built using Java it claims is 20 times faster than using ordinary Hadoop and that it claims …

  1. bharq

    Back to the drawing board

    for a new name - or are they planning to introduce this technology on hard discs with Heat Assisted Magnetic Recording technology?

    1. PleebSmash

      Re: Back to the drawing board

      Why would they call it HAMR? That's terrible.

      That said, we can always fall back on TAMR

      1. Thesheep
        Coat

        Re: Back to the drawing board

        STOP! HAMR time!

      2. Ian Michael Gumby
        Boffin

        @PleebSmash Re: Back to the drawing board

        TAMR is already taken.

        One of Stonebraker's companies. This one out of MIT.

  2. Anonymous Coward
    Anonymous Coward

    "Flowlets, a patent-pending API set."

    Great. More "innovative" APIs and algorithms looking to get a patent.

    The longer I work in the software industry, the more I see that patented algorithms and APIs are really just obvious engineering constructs that, a lot of the time, were done by someone somewhere well ahead of the so-called "inventor".

    If you're looking to make the most out of software, it is obvious what you need to do. It is always the same thing: minimise locks and contention, and optimise cache use and architecture, with the aim of reducing the number of machine cycles and the amount of code needed to perform a task. It is an exercise that ANY competent software engineer who knows how the hardware interacts with the software can perform, given the time.

    Patents do nothing but restrict real commercial innovation in software, and only benefit the "first to file". How many times do we have to say it before governments eliminate software patents?

    1. ratfox
      Coffee/keyboard

      Re: "Flowlets, a patent-pending API set."

      Yeah, I threw up in my mouth a little bit.

    2. Anonymous Coward
      Anonymous Coward

      Re: "Flowlets, a patent-pending API set."

      Well said that snivelling Anonymous Coward...

  3. Ian Michael Gumby
    Boffin

    Meh!

    Can you say Spark?

  4. Randy Hudson

    Modeling

    Surely you mean modding it, as in modulo

  5. Andy 73 Silver badge

    Next please

    It's fairly clear that we can see beyond Map/Reduce to more sophisticated distributed processing. There are quite a few contenders for the next generation platform and it looks like this is yet another.

    However, I don't think we're there yet. Most options are about reducing the pain of M/R when you've got iterative jobs and more complex workflows, but a lot of arguably unnecessary pain remains. At some point I'd expect a generic way to describe such workflows to emerge and to become the de facto standard. For now, none of the proposals is so compelling that developers have stopped coming up with proposal n+1.

    1. Anonymous Coward
      Anonymous Coward

      Re: Next please

      The standard already exists. It's Crunch. It will happily target MR2, Tez or Spark as its execution engine with a trivial change in code. No one in the Hadoop world writes plain old MapReduce any more unless they absolutely have to, and the reasons to do that are disappearing. Honestly very few developers in the serious end of the industry even use MapReduce at all now. Spark is the new standard (it's like HAMR but not patent encumbered and, you know, already in use) for everything. Pig announced a few days ago that all of its integration tests now pass for the Pig-On-Spark codebase, and Hive won't be too far behind.

      1. phil dude
        FAIL

        Re: Next please

        I saw the words "proprietary network layer" and switched off.

        Patent pending API set?? What's this, a meta-patent??

        Either you show how it's better or we don't care.

        P.

      2. Andy 73 Silver badge

        Re: Next please

        Crunch - why not Cascading or Pig? The point is, there are a lot of options in this space right now, most of which are moving/already run on the new execution engines. I'm glad you feel you can call the 'winner' on this, but from where I'm sitting we're back in the era of fighting over which text editor to use. It all generates plenty of work for the 'serious' end of the industry (yeah, and our production cluster is bigger than yours), but jumping from framework to framework to keep up with the latest trends is not ultimately productive.

        1. Anonymous Coward
          Anonymous Coward

          Re: Next please

          Why not Pig? Because no one knows Pig Latin. Pretty simple. It's a lovely language, nearly ideal for writing complex data pipelines, *but* it's not a language many people know. Meanwhile everyone knows Java, and it's a lot easier to plug custom code into Crunch (or, yes, Cascading) than it is into Pig, because with Pig you've got to fall back to UDFs. Different tools for different jobs, really; Pig is inherently oriented towards structured data. Which is nice, but in that case it falls foul of the fact that it's a hundred times easier to find a good SQL developer than it is to find someone who knows Pig.

          Why not Cascading? This is more nuanced, but it comes down to two main points. I love Cascading, so this isn't a criticism of the product, but these are the facts. The first is sheer penetration. Most Hadoop users are CDH customers, and Crunch comes packaged with CDH, so it's easier to get going with it. Coupled with that is tight integration with the rest of the stack, including the Kite SDK. Together they're a joy to work with. The second is a matter of data model. Cascading, like Pig, operates on Tuples. Crunch operates on higher-level PTypes or even POJOs. This makes it much more flexible, and in practice it is easier to encapsulate the functions within your data pipeline for unit testing, verification and reuse.
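
The data-model point above (positional tuples versus typed objects) can be roughly sketched in plain Java. This is only an illustration under stated assumptions: it uses no Crunch or Cascading classes, and `DataModelSketch`, `PageView`, `isSlowTuple`, `isSlow` and the 1000 ms threshold are all hypothetical names and values made up for the example.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: plain JDK types, no Crunch or Cascading classes.
class DataModelSketch {

    // Hypothetical typed record (a POJO) standing in for one pipeline element.
    public static final class PageView {
        public final String url;
        public final long millis;

        public PageView(String url, long millis) {
            this.url = url;
            this.millis = millis;
        }
    }

    // Tuple style: the field is found by position and cast at runtime,
    // so a reordered or mistyped field only fails when the job runs.
    public static boolean isSlowTuple(List<Object> tuple) {
        long millis = (Long) tuple.get(1);
        return millis > 1000L; // hypothetical threshold
    }

    // POJO style: the compiler checks the field name and type,
    // and the function can be unit tested without any cluster.
    public static boolean isSlow(PageView view) {
        return view.millis > 1000L; // hypothetical threshold
    }

    public static void main(String[] args) {
        List<Object> tuple = Arrays.<Object>asList("/index.html", 1500L);
        PageView view = new PageView("/index.html", 1500L);
        System.out.println(isSlowTuple(tuple));
        System.out.println(isSlow(view));
    }
}
```

Both functions compute the same thing; the difference is where a mistake shows up. Swap the two fields in the tuple and `isSlowTuple` throws a `ClassCastException` at run time, whereas the equivalent mistake with `PageView` does not compile.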
