Hadoop: When grownups do open source

Hadoop is a library for writing distributed data processing programs using the MapReduce framework. It's got all the makings of a blogosphere hit: cluster computing, large datasets, parallelism, algorithms published by Google, and open source. Every four days or so, a nerd will discover Hadoop, write a “Basic MapReduce Tutorial …

COMMENTS

This topic is closed for new posts.
  1. Ian Ferguson
    Thumb Up

    Examples?

    Any examples of people actually putting Hadoop to use? Not only would it be interesting, but it would also prove your point that Hadoop is actually *useful*; if nobody uses it for anything important, then who cares how well crafted it is? It is just as pointless as the Twitter clusterfuck.

    Otherwise, great article.

  2. DavCrav
    Thumb Up

    Class.

    "Along with the data processing framework, Doug Cutting also included a fault tolerant, replicated, distributed file system with Hadoop just because fuck you."

    If I were drinking coffee right then, you'd owe me a new laptop.

  3. Nano nano

    Journalistic niceties

    Do you get paid extra for writing articles that swear a lot ?

    Or does it just mean your vocabulary is wanting ?

  4. furby_singh
    Go

    I love these articles

    Go anger Geek! GO!

  5. Anonymous Coward
    Anonymous Coward

    _usually_

    Usually I dislike any swearing and insults people use to make a point, since normal words that have *meaning* should be able to do the job. You're excused though, since you make excellent points along the way. A lot of open source is horrible, I can't deny - although that does not mean it has no value.

    Perhaps the issue with "common" open source is that most projects are not created in tight(ly coordinated) cooperation between a few skilled individuals, but by, uh, often lesser men - partly hobbyists - in a more loose effort.

    For the record - I'm little more than a hobbyist coder myself.

  6. Brian
    Thumb Up

    Gotta love...

    Just gotta love this guy's prose style - he says offensively in one sentence what would take three or four polite sentences.

    More, please, from Ted!

  7. Ashlee Vance (Written by Reg staff)

    Re: Journalistic niceties

    I'll use this comment as a jumping off point for a note on Ted's stories.

    Lots of you are griping about his language. I'm rejecting almost all of those comments. Save yourself some time and write something constructive. It might get published.

    Please don't torture yourself by reading the stories if "bad" language offends you. Otherwise, just relax and enjoy the ride.

  8. breakfast Silver badge
    IT Angle

    Nothing like Java to get things done.

    I love getting things done in Java. Aside from XML, which isn't a programming language so doesn't really count, it really sets the benchmark for redundant, meaningless verbosity.

    I guess Ted Sweary has a point though. The big projects are always the best ones. Nothing good ever came off a small project.

  9. Fernando Carolo
    Thumb Up

    Examples

    The best known example of a Hadoop-powered application so far: Yahoo Search Webmap, a.k.a. the index behind their entire search service (http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html).

    Other cases at http://wiki.apache.org/hadoop/PoweredBy.

  10. Phil Standen

    @ Ashlee

    "Jumping off point"?

    Which envelope pushing, whale song enhanced, power breakfast did that come from?

    His language is fine, yours is not.

  11. Daniel B.
    Go

    Oh yes you made my day

    Bashing on Web 2.0 useless projects, the flashy wannabe companies, trashing Ruby on Rails and vindicating Java?

    I feel *great* now. Keep it on! Enlighten the IT masses!!

  12. Steve
    Thumb Up

    @ Re: Journalistic niceties

    Fuck me, an adult! Ever thought of coming over here and running the BBFC for us? As my mum always said, "Sticks and stones will break your bones, but swear at the dinner table and I'll smack the shit out of you...."

    I think I may have that a little mixed up...

  13. Brian Walshe
    Dead Vulture

    Re: Journalistic niceties

    I'm not offended by the swear words. I'm offended by the fact that they were used to pad out a story that was completely contained in the first sentence.

    Seriously though, after telling us that Hadoop is an implementation of MapReduce, what does Ted tell us? Twitter are a bunch of jerks, ruby sucks, etc. He leaves it for someone in the comments section to even give an example of something that even uses the technology.

    Stick to blogging.

  14. Anonymous Coward
    Flame

    If only..

    Ted actually *used* Hadoop...

    "Comparatively, very few people actually use Hadoop in practice, and those who do don't write about it."

  15. The BigYin
    Thumb Down

    @Ashlee

    I have no trouble with bad language but I'd like some advance warning of what is to come when reading a "professional" journal, then I can make an informed decision as to whether or not I want to read the article.

    I respectfully suggest that if you don't like editing/blocking comments complaining about swearing that you ask your authors to refrain from swearing unless it is pertinent to the story. Something that will be very rare in a tech mag.

    As to the story...it's such a bitter rant-fest as to be of near zero value. Yes some open source projects suck, so do some closed source. Wow. Gosh. Who'd have thought? Hey, some hardware sucks too! Can I have a job at El Reg if that is all it takes to write a story for you?

  16. David S

    I'm with DavCrav

    That was nearly two laptops you owed. Okay, making fun of web2.0 is a bit fish-in-a-barrel, but I genuinely liked the article. As for the people crying "pottymouth"... Well, fuck 'em if they can't take a joke.

  17. Anonymous Coward
    Stop

    Hadoop

    It's geordie for "stop".

  18. Steve Loughran

    Examples

    Hadoop gets used by a fair few search startups. As of a few weeks ago, even Microsoft uses it. Anyone who is signed up for the Hadoop UK event next week (it's fully booked) can discuss their uses with others. If you want to see who is using it secretly, go to LinkedIn and search for Hadoop in resumes.

    As someone who works on Hadoop, I think this article was unfairly critical of RubyForge and other projects. Hadoop gets lots of engineering support because it matters to Yahoo!; without that it would be seriously lacking in depth. But go look at that code and there are things in there to scare people. It's big, it's complex, and to test that your filesystem scales to 1000 machines, you either need 1000 boxes or rented CPU time on them. Hadoop runs best on Linux, which, wait for it, contains lots of projects hosted on SourceForge, and git/Mercurial style. There is a lot of open source behind Hadoop, and anyone who contributes to the Linux and Java open source codebases should be proud of their contribution.

  19. amanfromMars Silver badge
    Thumb Up

    Enjoying the Trip.

    These Register comment threads are a possible Hadoop Parallel.... although what Application IT would be Running and/or Feeding is a Ride Stealthily hidden.

    And I very nearly posted that with a Fedding typo.

  20. E

    @Ashlee

    I'm not offended by profanity per se, just gratuitous profanity. That said, this latest article is less gratuitous than the previous ones. So there is hope.

  21. Chintana Wilamuna
    Thumb Up

    Hadoop usage

    @Ian: Derek converting all the publicly available articles of NYTimes (from 1851-1980) to PDFs is a nice example.

    http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/

  22. paulc

    oh noes...

    not another this will revolutionize "programming for the web" as we know it paradigm shift... I think the clue was the sheer volley of buzzwords in the first sentence...

  23. tom

    Inevitable that people will focus on the language, but...

    the language works. What else would you expect when you're halfway between a rant and an opinion piece? It's like if you crossed a Rolling Stone back issue with, well, a normal Reg article. All it needs is a huge cigarette ad.

    Web 2.0 might be fish in a barrel, but IMO Rails needs to be laughed at more often. Rails even *rhymes* with fails.

  24. Rob Menke
    Thumb Up

    Not exactly a shining star, but compared to everything else...

    The reason people don’t write about Hadoop is that good Hadoop programming is really old school Lisp programming. If you are capable of doing data analysis in Lisp without resorting to mutators, Hadoop will seem eerily familiar.

    My biggest complaint about Hadoop is its sucky documentation. It almost *has* to be open source just so you can figure out what the damn map-join classes are supposed to do.

    But what it does do right, it does so wonderfully.

    As for Yahoo, they unfortunately view Hadoop as the panacea for all their computing woes: applying it to all problems even where it should not be applied, then gnashing and wailing when it fails to yield performance benefits. Case in point: the project I am working on is willing to take a 50% performance hit to achieve Hadoopification, just because it’s fucking Hadoop, bitches. Another sad tale of using a power drill for a hammer.

  25. mijj
    Happy

    fun

    I didn't have a clue what the piece was about, but I enjoyed the swearin'.

  26. Anonymous Coward
    Thumb Up

    Ted, just admit it...

    Great article. Haven't the foggiest what some of it is about, but I like the way it was written.

    I like the idea of a Lester Bangs/Hunter S. Thompson for IT.

    If others don't like it, even better

  27. Lucas Carlson
    Thumb Down

    Creativity

    Taking the time to understand something and then building something totally different with that inspiration is called creativity. Taking the time to understand something and copying every aspect of it is called homework. Way to build positive community around contributing code to the public by name calling and teasing.

  28. Einar Vollset
    Thumb Down

    What the fucken fuck?

    I notice that Ted's quit dumping on young, unsuccessful startups. I wonder why that is?

    http://www.techcrunch.com/2008/07/19/pressflip-is-a-belly-flop/

    Hmm?

  29. Anonymous Coward
    Anonymous Coward

    RE: Re: Journalistic niceties LOL OMG FUNNY!1!!! erm, sorry...

    Ashlee, it's all well and good saying "If you're likely to be offended, don't read this stuff", but as it stands right now, the average mouse-wielder's thought processes are going to be along the lines of:

    "Ooh, interesting article title! I got to read me some of that!"

    *click*

    *read*

    *read*

    *encounter the word "fuck"*

    *apoplexy*

    *stiff reprimand on the comments page*

    (except, of course, in the case of those mouse-wielders who are obsessively inspecting the linked URL and shy away at seeing the word dziuba in it, thus saving their time for other acts of self-torture/abuse)

    How about giving Ted a unique icon for his pieces so that everyone knows what they're getting into before they click? I propose a red triangle, as per films on Channel 4 back in the day where it indicated you were guaranteed to get some mucky bits.

  30. Ashlee Vance (Written by Reg staff)

    RE: Re: Journalistic niceties LOL OMG FUNNY!1!!! erm, sorry...

    All of Ted's stories have the Fail and You moniker visible from the homepage.

  31. nin-mofo

    WRRRRYYYYYYY is not WHHHHHYYYYYYY

    "any one of the over nine thousand regurgitations"

    I see what you did there.

    Also, MOAR "Fail and You".

  32. Adam Pisoni

    Skynet - A pure ruby MapReduce framework

    Skynet is a complete Ruby MapReduce framework built for small to large projects. It is currently live at Geni.com, powering their family news application among other things. http://skynet.rubyforge.org

  33. Anonymous Coward
    Paris Hilton

    hadoop kept quiet

    The reason hadoop isn't getting any press is because it has a fucking shit name it's bollockingly boring, cock-splinteringly cuntish and shit-screwingly twat-o-gasmical to nobody but those scandewegian toss-mongerers who think it sounds like their silver-minged grandmother's maiden name. The shit fucking sod arse piss of it is that no jism-sucking tit wanker's going to use some-fucking-thing that sounds like a diarrhetic PH farting. We have to swear on this page, right ? I hope nobody minds that I used a bit of punctuation; I wouldn't want to interrupt the potty-mouth parade.

    Say what you like about the badger-botherers they do know how to pick a catchy name. I'm about to launch my new anti-2.0 hype aggregator and after much drunken consideration have called it cockflaps ... I bet it will get more press than poopdad or whatever it's called.

  34. Anonymous Coward
    Black Helicopters

    skynet runs a family news service

    And from that we are going to get Armageddon?

  35. Anonymous Coward
    Thumb Down

    More bitter Java proponents...

    Look, another bitter Java proponent desperately trying to breathe new life into his overhyped, cross-platform, "God's gift" to programming language that lost steam years ago. Perhaps ripping apart the languages currently on the rise will convince the masses to return to Java. Let it go, dinosaur, let it go.

  36. B

    Not quite...

    I've actually been involved in projects using Hadoop (HDFS came first actually), and it's still not fault tolerant, just look up what happens when the name node dies. Nasty. It's a good project, and I'm glad it exists, but it's nowhere near the nirvana it's made out to be here. I've also been involved and personally know of projects using both Starling (not so great... mostly agree with the author) and Starfish (it's awesome for its target use, and it works well.)

    -b-

  37. Bob
    Flame

    nice one

    Every so often, an open source software commentary comes along designed to shatter the safe tight-lipped approach to industry commentary with an amazing flame-bait expletive dropping article. Whoa that's a new one.

    The Rails community practically coined this awesome use of Web 2.0 sass by dropping F-bombs in almost every lecture (see every conference where DHH has ever spoken). So original.

    So yeah, I'm not a Rails church member, but I occasionally bust out an app in Rails when I need something done... fast.

    You are forgetting one of the most important factors that leads to shitty code, as well as one of the most important things that matters to VCs... time to market. Yeah, Twitter might suck when it comes to certain things, but you are completely forgetting why Twitter got to be as big as it is. It was there first...

    There are plenty of reasons why open source software sucks. Twitter's attempt at open source should be applauded even if their project lacks support. The fact of the matter is, the web is 80% read, 20% write (probably less than that). I myself have worked on a couple of open source projects. Although the project might lack in committers from outside sources, that certainly does not mean that the code is not being looked at or even used in some capacity. I can only imagine that Twitter's code is being used in some way.

    To second what I think was a point in your article... Doug Cutting, Lucene, Hadoop, HDFS, and the people who have contributed to these projects exemplify engineering genius. We are using all of these (minus the Doug Cutting and Hadoop folks, although we would love to have them) where I work. We are also using Rails.

    One strength of an architect is determining what tools are available in front of him/her and determining where they best serve their purpose. Rails is great for what it does, so is Hadoop. Just because Hadoop is the bees-knees and there are some crappy attempts at Map-Reduce in other languages which are ill suited for the task, does not mean that other open source projects suck ass because they aren't ultra-scalable or in use at Yahoo.

    If you think that writing an article lambasting Twitter's messaging protocol in favor of Hadoop's ultra-scalable distributed computing approach to large computational problems is entertaining, then you might also enjoy comparing apples to oranges. Sure Twitter has problems, so does anything that grows fast. Maybe they will use Hadoop in some way to fix some of their infrastructure problems, but damning Twitter to IT hell is kind of overboard.

    Congratulations on the page views, and consider me a fish taking your flame bait hook, line, and sinker.

  38. Anonymous Coward
    Anonymous Coward

    The call of distributed data processing

    I don't know if everyone wants to work on it - but oh what joy they must have had cracking it out in Java.

    distcc is an open source project, it works quite well as a distributed compiler system, perhaps they could have extended that, though Java is not supported by distcc so maybe they could have added that as well, whilst they were at it.

    Erlang could have been interesting to do this in, and Haskell works quite well for these type of tasks. Sure Ruby, is an odd one, but hey probably has its place, the hubbub has died down over it, as of late.

  39. Léon
    Thumb Up

    HDFS

    We're currently setting up an HDFS cluster on top of which we're going to store large image files. Currently somebody is trying to get our application to talk to the Hadoop API so things actually can be stored, because the FUSE 'plugin' (I know it could be a better description, sorry) is still in development and lacks several functions we need, such as setting permissions. It's projects like these that make me wish I could understand C(++) so I could actively help out. Funny thing is that we were looking for a DFS and later learnt that Hadoop is more than that.

  40. James Anderson

    Java verbose?

    I had a strange experience last year. Part of a project I was working on involved a fancy parser for strange and obsolete texts. As it was an add on to an existing php system the whole lot was written in php 5.0 and it all worked very nicely - another part of the organisation wanted to use this but insisted it must be written in Java -- so off I went and rattled the same piece of code out in picky verbose Java -- the surprise was that I ended up with about 2000 lines of code vs. the php implementation of 1800.

    My conclusion is that in real life Java is not more verbose, just a bit more painful to read. Also very revealing was just how well php 5 was suited to "hard" problems outside of its web page niche.

    PS I thought "hadoop" was Geordie for "arrested and charged".

  41. Francis Fish

    It's a real shame that Twitter is used to bash Rails

    Even in the Rails community Twitter have put their foot in it several times and, IMHO, haven't got huge credibility. They moaned about not being able to share database connections and then a very able guru known as Dr Nic (he is a Ph D. - and a lovely barking mad person) showed them they didn't know what they were talking about in about 10 lines of code. Hell, even I knew what they were saying didn't sound right and I had only been using rails for a couple of months then, but I have been database programming for the best part of 20 years.

    I've also used another of their messaging tools, called Beanstalk, and it sucks. We are going to throw it away. It comes with a Ruby gem to allow you to talk to it and rails plugin that adds stuff to the Active Record class - the plugin breaks active record beyond repair - useless. I wrote a 100 line add-in that allowed you to make asynchronous calls using Beanstalk. I looked at their plugin and the code was nasty.

    We're replacing Beanstalk with a very simple daemon that comes in a hundred lines of code or so, is traceable, and works. Very sceptical about Twitter stuff after this experience, and it's a shame they're held up as an example.

    We aren't a web 2.0 company. We just use Rails to get stuff done really quickly. If there was a faster development framework we'd use it. But I never want to go back to Java, it slows you down and gets in the way.

    http://francis.blog-city.com/java_is_bad_for_your_brain_talk_at_barcamp_manchester.htm

  42. Anonymous Coward
    Alert

    Pressflip 2.0

    I find it extremely amusing to see all this bile vomited over Web 2.0 companies from someone from Pressflip.com a classic example of a failed Web 2.0 startup if ever there was one!

  43. Anonymous Coward
    Go

    @Ashlee

    I'm not offended. Just distracted. Snark is good. Cussin to make a point or a joke is good. Gratuitous cussin just detracts from the content and humor of the piece.

    Ted is obviously a talented writer. He's clearly aiming for Gonzo (journalism, not muppet), so I'm not sure why he'd settle for Sophomoric. And I'm not sure why you'd let him.

  44. Paul McConkey
    Paris Hilton

    @Anon

    Nearly lost another laptop to a mouthful of coffee, funniest el Reg comment I've read to date!

    Paris - obviously.

  45. Anonymous Coward
    Anonymous Coward

    I don't really understand the article,

    But as for the swearing, it seems...

    1) Most of the detractors are using the word "cussin", a very American word. Swearing is part of the British sense of humour, but Americans do seem to be far more sensitive to it than the British.

    2) Swearing does not show a lack of language skills, quite the opposite if used correctly.

    3) very very funny article.

  46. Peter Kay

    Fuck me, a java library that's not for the punks out there..

    I'm sure it toasts bread in its spare time or similar, but a few notes :

    1) Blogosphere. Anyone who mentions this tends to make me think it's yet another trendy library, rather than something that actually just gets work done

    2) Who is actually using it? Links?

    3) What's wrong with it? Nothing is perfect

    4) A link to hadoop would be useful, even if it's an instant google to find it..

  47. Timothy Slade
    Happy

    Language

    Just my tuppence: there is nothing wrong with the language in this article. Maybe I think this because I grew up in the inner city. Maybe it is because of my age. Maybe it is because my parents let me swear in the house. Or maybe it is because they are just words. You validate them, by reacting to them.

  48. Jax

    I only realised there was swearing

    after reading the comments.

    _Really_ enjoyed reading the article. More like this please! :)

    As per the swearing, Stephen Fry knows more than _all_ of us about the English language and he thinks swearing is great. So yea!.... that.

  49. Anonymous Coward
    Anonymous Coward

    He doesn't say anything..

    Try swapping in random words and you see it makes no difference...

    "Why don't you see kebabs coming out of places like EasyJet? Because

    that shit is hard. It could also be that companies like EasyJet are blissfully

    unaware of any meat other than chicken."

    See, no difference. Exactly the same amount of information.

    Why not a quick run down of what it actually does?

    Why would anyone want a file system written in Java (or any virtual machine environment for that matter)?

    I'm going to have to look it up now to find out what it is, the whole article might as well been "Have a look at Hadoop on wikipedia, it might be fucking interesting."

  50. Charlie Clark Silver badge
    Thumb Up

    Keep 'em coming Ted

    Shit on the Twatters of this world all you can but try and cut down on the gratuitous "my language is better than yours" remarks. I'm personally not a fan of Ruby (the syntax does my head in) and while I'm not that keen on the Rails approach I can appreciate that it, too, has a part to play. If "all the customer wants" is something that can be implemented using a Rails-like framework then it is reasonable to provide the solution using that. With all the usual provisos, of course. The main one being that the customer didn't really know what they wanted and six months later something completely different is being written from scratch and possibly using a different framework... Nonetheless my experience is that the code produced on the back of such frameworks is generally better than that which would have been written in the absence of such.

    A detailed discussion of the pros and cons of these frameworks versus a properly engineered one would be welcome. In the meantime maybe you can join Ashlee on his audio show and give that Web2.0 devotee Mr. Rosenberg the tongue-lashing he so thoroughly deserves!

  51. W
    Pirate

    Sweary

    No match for Mr Agreeable. One time legend of Melody Maker. Now to be found on The Quietus (.com). Which is, incidentally, where a certain Moderatrix has been known to lurk...

    Sort it out Reg. Get some proper swearing on 'ere. Maybe you could recruit master signwriter himself: Mr Tourette.

  52. Jamie Kitson

    Funny...

    The title says "adult" and yet it's obviously written by a child.

  53. Francis Fish
    Happy

    I shouldn't have been quite so rude about beanstalkd

    It's ok for what it does, fire and forget processing into an in-memory list of things. But we couldn't trace it and see if it had died. It didn't suit our needs.

    Horses for courses.

    And I don't care about the swearing, honest, just makes you look like someone who would be *very* difficult to work with in a team situation because of your arrogance. But as that'll never happen it isn't a problem, is it?

  54. The Other Steve
    Thumb Up

    mod +1 to swearies

    And fuck the haters.

  55. Ryan Barrett

    Apples and Oranges

    Ruby, particularly Rails, is great for developing small/medium scale bespoke solutions.

    For example administrative systems, or invoicing systems for firms with 5-500 employees. Lots of code reuse, short development cycles, and it'll all run on a pair of servers.

    All for a whole lot less cash and far less hassle than letting SAP and co at it.

    Java's far more geared towards projects which need to scale and must be multiplatform. Big, but not infrastructure.

    Which leaves C/C++ (and a pattering of assembly) to cover the infrastructural stuff - where the Big Boys play.

    Dissing Ruby because Twitter sucks is like dissing a hacksaw because it does such a poor job at chopping trees down. You should be dissing the frikkin' idiots who're trying to use a hacksaw for a job which requires a chainsaw....

  56. Ron Eve
    Thumb Up

    @hadoop kept quiet

    You cunt you owe me a new laptop, mine's now drowning in tea...

    Two questions:

    Do you read Viz?

    or

    Do you come from Newcastle?

    Neither did I have any clue whatsoever what the article is about (slow day here..) but the comments were superb. Gotta love the potty-mouths.

  57. Anonymous Coward
    Pirate

    The Real Failure

    is that Hadoop is written in Java.

    Java is fine for things that I don't care about.

    Were it written in C, I would take it seriously.

    RoR/Java/Web2.0/Linux are all in the same failboat

  58. andy rock
    Happy

    swearing, what swearing??

    on a serious point, though, i would have liked some reference points about who is using hadoop, successfully. i know Ted tells us that people who use it "don't write about it" but reference would give the article substance and move it further away from 'just an opinion'.

    i like the swearing, though.

  59. Mike Powers
    Pirate

    It's like Hitchhiker's Guide to Open Source

    Anyone who's been up to the higher dimensions knows that they're a pretty nasty lot up there, who should all just be smashed and done in, if we could only work out a way of firing missiles at right angles to reality.

  60. William Towle
    Thumb Down

    Re: Gotta love...

    Brian posted> "Just gotta love this guy's prose style - he says offensively in one sentence what would take three or four polite sentences."

    Or could have left the sentence at "just because" and said the same in two fewer words.

    It's unnecessary, and probably gets the page blocked by public libraries (e.g. their filters didn't like the Freshmeat summary pages when one of its project descriptions used a phrase like "hardcore [number crunching]" a few years back). Fortunately we don't have a page filter here (we've got one on the outgoing email though, and somebody replying to their partner's kiss-times-ten got told to rephrase and couldn't work out why, har har).

    Not big, not clever, not considerate. Not impressed.
