Hadoop: When grownups do open source

Hadoop is a library for writing distributed data processing programs using the MapReduce framework. It's got all the makings of a blogosphere hit: cluster computing, large datasets, parallelism, algorithms published by Google, and open source. Every four days or so, a nerd will discover Hadoop, write a “Basic MapReduce Tutorial …

COMMENTS

This topic is closed for new posts.

Thumb Up

Examples?

Any examples of people actually putting Hadoop to use? Not only would it be interesting, but it would also prove your point that Hadoop is actually *useful*; if nobody uses it for anything important, then who cares how well crafted it is? It is just as pointless as the Twitter clusterfuck.

Otherwise, great article.

0
0
Silver badge
Thumb Up

Class.

"Along with the data processing framework, Doug Cutting also included a fault tolerant, replicated, distributed file system with Hadoop just because fuck you."

If I were drinking coffee right then, you'd owe me a new laptop.

0
0

Journalistic niceties

Do you get paid extra for writing articles that swear a lot?

Or does it just mean your vocabulary is wanting?

0
0
Go

I love these articles

Go anger Geek! GO!

0
0
Anonymous Coward

_usually_

Usually I dislike any swearing and insults people use to make a point, since normal words that have *meaning* should be able to do the job. You're excused though, since you make excellent points along the way. A lot of open source is horrible, I can't deny - although that does not mean it has no value.

Perhaps the issue with "common" open source is that most projects are not created in tight(ly coordinated) cooperation between a few skilled individuals, but by, uh, often lesser men - partly hobbyists - in a more loose effort.

For the record - I'm little more than a hobbyist coder myself.

0
0
Thumb Up

Gotta love...

Just gotta love this guy's prose style - he says offensively in one sentence what would take three or four polite sentences.

More, please, from Ted!

0
0
(Written by Reg staff)

Re: Journalistic niceties

I'll use this comment as a jumping off point for a note on Ted's stories.

Lots of you are griping about his language. I'm rejecting almost all of those comments. Save yourself some time and write something constructive. It might get published.

Please don't torture yourself by reading the stories if "bad" language offends you. Otherwise, just relax and enjoy the ride.

0
1
Bronze badge
IT Angle

Nothing like Java to get things done.

I love getting things done in Java. Aside from XML, which isn't a programming language so doesn't really count, it really sets the benchmark for redundant, meaningless verbosity.

I guess Ted Sweary has a point though. The big projects are always the best ones. Nothing good ever came of a small project.

0
0
Thumb Up

Examples

The best known example of a Hadoop-powered application so far: Yahoo Search Webmap, a.k.a. the index behind their entire search service (http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html).

Other cases at http://wiki.apache.org/hadoop/PoweredBy.

0
0

@ Ashlee

"Jumping off point"?

Which envelope pushing, whale song enhanced, power breakfast did that come from?

His language is fine, yours is not.

0
0
Silver badge
Go

Oh yes you made my day

Bashing on Web 2.0 useless projects, the flashy wannabe companies, trashing Ruby on Rails and vindicating Java?

I feel *great* now. Keep it on! Enlighten the IT masses!!

0
0
Thumb Up

@ Re: Journalistic niceties

Fuck me, an adult! Ever thought of coming over here and running the BBFC for us? As my mum always said, "Sticks and stones will break your bones, but swear at the dinner table and I'll smack the shit out of you...."

I think I may have that a little mixed up...

0
0
Dead Vulture

Re: Journalistic niceties

I'm not offended by the swear words. I'm offended by the fact that they were used to pad out a story that was completely contained in the first sentence.

Seriously though, after telling us that Hadoop is an implementation of MapReduce, what does Ted tell us? Twitter are a bunch of jerks, Ruby sucks, etc. He leaves it to someone in the comments section to give even one example of something that uses the technology.

Stick to blogging.

0
0
Flame

If only..

Ted actually *used* Hadoop...

"Comparatively, very few people actually use Hadoop in practice, and those who do don't write about it."

0
0
Silver badge
Thumb Down

@Ashlee

I have no trouble with bad language but I'd like some advance warning of what is to come when reading a "professional" journal, then I can make an informed decision as to whether or not I want to read the article.

I respectfully suggest that if you don't like editing/blocking comments complaining about swearing that you ask your authors to refrain from swearing unless it is pertinent to the story. Something that will be very rare in a tech mag.

As to the story...it's such a bitter rant-fest as to be of near zero value. Yes, some open source projects suck; so do some closed source ones. Wow. Gosh. Who'd have thought? Hey, some hardware sucks too! Can I have a job at El Reg if that is all it takes to write a story for you?

0
0

I'm with DavCrav

That was nearly two laptops you owed. Okay, making fun of web2.0 is a bit fish-in-a-barrel, but I genuinely liked the article. As for the people crying "pottymouth"... Well, fuck 'em if they can't take a joke.

0
0
Stop

Hadoop

It's geordie for "stop".

0
0

Examples

Hadoop gets used by a fair few search startups. As of a few weeks ago, even Microsoft uses it. Anyone who is signed up for the Hadoop UK event next week (it's fully booked) can discuss their uses with others. If you want to see who is using it secretly, go to LinkedIn and search for Hadoop in resumes.

As someone who works on Hadoop, I think this article was unfairly critical of RubyForge and other projects. Hadoop gets lots of engineering support because it matters to Yahoo!; without that it would be seriously lacking in depth. But go look at that code and there are things in there to scare people. It's big, it's complex, and to test that your filesystem scales to 1000 machines, you either need 1000 boxes or rented CPU time on them. Hadoop runs best on Linux, which, wait for it, contains lots of projects hosted on SourceForge, and git/Mercurial style. There is a lot of open source behind Hadoop, and anyone who contributes to the Linux and Java open source codebases should be proud of their contribution.

0
0
Silver badge
Thumb Up

Enjoying the Trip.

These Register comment threads are a possible Hadoop Parallel.... although what Application IT would be Running and/or Feeding is a Ride Stealthily hidden.

And I very nearly posted that with a Fedding typo.

0
0
E

@Ashlee

I'm not offended by profanity per se, just gratuitous profanity. That said, this latest article is less gratuitous than the previous ones. So there is hope.

0
0
Thumb Up

Hadoop usage

@Ian: Derek converting all the publicly available articles of NYTimes (from 1851-1980) to PDFs is a nice example.

http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/

0
0

oh noes...

not another this will revolutionize "programming for the web" as we know it paradigm shift... I think the clue was the sheer volley of buzzwords in the first sentence...

0
0
tom

Inevitable that people will focus on the language, but...

the language works. What else would you expect when you're halfway between a rant and an opinion piece? It's like if you crossed a Rolling Stone back issue with, well, a normal Reg article. All it needs is a huge cigarette ad.

Web 2.0 might be fish in a barrel, but IMO Rails needs to be laughed at more often. Rails even *rhymes* with fails.

0
0
Thumb Up

Not exactly a shining star, but compared to everything else...

The reason people don’t write about Hadoop is that good Hadoop programming is really old school Lisp programming. If you are capable of doing data analysis in Lisp without resorting to mutators, Hadoop will seem eerily familiar.
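To illustrate that mutator-free, Lisp-flavoured style, here is a minimal sketch of a word count. It is plain Python, not Hadoop's API: the names map_phase, shuffle, reduce_phase and word_count are made up for this example, and the shuffle step only imitates the grouping a MapReduce framework would do between the two phases.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (hypothetical stand-in for the framework's shuffle)."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [count for _, count in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def reduce_phase(key, counts):
    """Fold the list of counts for one key into a single total."""
    return (key, reduce(lambda a, b: a + b, counts, 0))


def word_count(lines):
    """Run map, shuffle and reduce in sequence without mutating any input."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, counts) for key, counts in shuffle(pairs))
```

Each function takes values in and hands new values back, which is why the style feels familiar to anyone used to writing map and fold over lists.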

My biggest complaint about Hadoop is its sucky documentation. It almost *has* to be open source just so you can figure out what the damn map-join classes are supposed to do.

But what it does do right, it does so wonderfully.

As for Yahoo, they unfortunately view Hadoop as the panacea for all their computing woes: applying it to all problems even where it should not be applied, then gnashing and wailing when it fails to yield performance benefits. Case in point: the project I am working on is willing to take a 50% performance hit to achieve Hadoopification, just because it’s fucking Hadoop, bitches. Another sad tale of using a power drill for a hammer.

0
0
Happy

fun

I didn't have a clue what the piece was about, but I enjoyed the swearin'.

0
0
Thumb Up

Ted, just admit it...

Great article. Haven't the foggiest what some of it is about, but I like the way it was written.

I like the idea of a Lester Bangs/Hunter S. Thompson for IT.

If others don't like it, even better

0
0
Thumb Down

Creativity

Taking the time to understand something and then building something totally different with that inspiration is called creativity. Taking the time to understand something and copying every aspect of it is called homework. Way to build positive community around contributing code to the public by name calling and teasing.

0
0
Thumb Down

What the fucken fuck?

I notice that Ted's quit dumping on young, unsuccessful startups. I wonder why that is?

http://www.techcrunch.com/2008/07/19/pressflip-is-a-belly-flop/

Hmm?

0
0
Anonymous Coward

RE: Re: Journalistic niceties LOL OMG FUNNY!1!!! erm, sorry...

Ashlee, it's all well and good saying "If you're likely to be offended, don't read this stuff", but as it stands right now, the average mouse-wielder's thought processes are going to be along the lines of:

"Ooh, interesting article title! I got to read me some of that!"

*click*

*read*

*read*

*encounter the word "fuck"*

*apoplexy*

*stiff reprimand on the comments page*

(except, of course, in the case of those mouse-wielders who are obsessively inspecting the linked URL and shy away at seeing the word dziuba in it, thus saving their time for other acts of self-torture/abuse)

How about giving Ted a unique icon for his pieces so that everyone knows what they're getting into before they click? I propose a red triangle, as per films on Channel 4 back in the day where it indicated you were guaranteed to get some mucky bits.

0
0
(Written by Reg staff)

RE: Re: Journalistic niceties LOL OMG FUNNY!1!!! erm, sorry...

All of Ted's stories have the Fail and You moniker visible from the homepage.

0
0

WRRRRYYYYYYY is not WHHHHHYYYYYYY

"any one of the over nine thousand regurgitations"

I see what you did there.

Also, MOAR "Fail and You".

0
0

Skynet - A pure ruby MapReduce framework

Skynet is a complete Ruby MapReduce framework built for small to large projects. It is currently live at Geni.com, powering their family news application among other things. http://skynet.rubyforge.org

0
0
Paris Hilton

hadoop kept quiet

The reason hadoop isn't getting any press is because it has a fucking shit name: it's bollockingly boring, cock-splinteringly cuntish and shit-screwingly twat-o-gasmical to nobody but those scandewegian toss-mongerers who think it sounds like their silver-minged grandmother's maiden name. The shit fucking sod arse piss of it is that no jism-sucking tit wanker's going to use some-fucking-thing that sounds like a diarrhetic PH farting. We have to swear on this page, right? I hope nobody minds that I used a bit of punctuation; I wouldn't want to interrupt the potty-mouth parade.

Say what you like about the badger-botherers they do know how to pick a catchy name. I'm about to launch my new anti-2.0 hype aggregator and after much drunken consideration have called it cockflaps ... I bet it will get more press than poopdad or whatever it's called.

0
0
Black Helicopters

skynet runs a family news service

And from that we are going to get Armageddon?

0
0
Thumb Down

More bitter Java proponents...

Look, another bitter Java proponent desperately trying to breathe new life into his overhyped, cross-platform, "God's gift" to programming language that lost steam years ago. Perhaps ripping apart the languages currently on the rise will convince the masses to return to Java. Let it go, dinosaur, let it go.

0
0
B

Not quite...

I've actually been involved in projects using Hadoop (HDFS came first, actually), and it's still not fault tolerant: just look up what happens when the name node dies. Nasty. It's a good project, and I'm glad it exists, but it's nowhere near the nirvana it's made out to be here. I've also been involved in, and personally know of, projects using both Starling (not so great... mostly agree with the author) and Starfish (it's awesome for its target use, and it works well).

-b-

0
0
Bob
Flame

nice one

Every so often, an open source software commentary comes along designed to shatter the safe tight-lipped approach to industry commentary with an amazing flame-bait expletive dropping article. Whoa that's a new one.

The Rails community practically coined this awesome use of Web 2.0 sass by dropping F-bombs in almost every lecture (see every conference where DHH has ever spoken). So original.

So yeah, I'm not a Rails church member, but I occasionally bust out an app in Rails when I need something done... fast.

You are forgetting one of the most important factors that leads to shitty code, as well as one of the most important things that matters to VCs... time to market. Yeah, Twitter might suck when it comes to certain things, but you are completely forgetting why Twitter got to be as big as it is. It was there first...

There are plenty of reasons why open source software sucks. Twitter's attempt at open source should be applauded even if their project lacks support. The fact of the matter is, the web is 80% read, 20% write (probably less than that). I myself have worked on a couple of open source projects. Although a project might lack committers from outside sources, that certainly does not mean that the code is not being looked at or even used in some capacity. I can only imagine that Twitter's code is being used in some way.

To second what I think was a point in your article... Doug Cutting, Lucene, Hadoop, HDFS, and the people who have contributed to these projects exemplify engineering genius. We are using all of these (minus the Doug Cutting and Hadoop folks, although we would love to have them) where I work. We are also using Rails.

One strength of an architect is determining what tools are available in front of him/her and determining where they best serve their purpose. Rails is great for what it does, so is Hadoop. Just because Hadoop is the bees-knees and there are some crappy attempts at Map-Reduce in other languages which are ill suited for the task, does not mean that other open source projects suck ass because they aren't ultra-scalable or in use at Yahoo.

If you think that writing an article lambasting Twitter's messaging protocol in favor of Hadoop's ultra-scalable distributed computing approach to large computational problems is entertaining, then you might also enjoy comparing apples to oranges. Sure, Twitter has problems; so does anything that grows fast. Maybe they will use Hadoop in some way to fix some of their infrastructure problems, but damning Twitter to IT hell is kind of overboard.

Congratulations on the page views, and consider me a fish taking your flame bait hook, line, and sinker.

0
0
Anonymous Coward

The call of distributed data processing

I don't know if everyone wants to work on it - but oh what joy they must have had cracking it out in Java.

distcc is an open source project, it works quite well as a distributed compiler system, perhaps they could have extended that, though Java is not supported by distcc so maybe they could have added that as well, whilst they were at it.

Erlang could have been interesting to do this in, and Haskell works quite well for these types of tasks. Sure, Ruby is an odd one, but hey, it probably has its place; the hubbub over it has died down of late.

0
0
Thumb Up

HDFS

We're currently setting up an HDFS cluster on top of which we're going to store large image files. Currently somebody is trying to get our application to talk to the Hadoop API so things actually can be stored, because the FUSE 'plugin' (I know there could be a better description, sorry) is still in development and lacks several functions we need, such as setting permissions. It's projects like these that make me wish I could understand C(++) so I could actively help out. Funny thing is that we were looking for a DFS and later learnt that Hadoop is more than that.

0
0

Java verbose?

I had a strange experience last year. Part of a project I was working on involved a fancy parser for strange and obsolete texts. As it was an add-on to an existing PHP system, the whole lot was written in PHP 5.0 and it all worked very nicely. Another part of the organisation wanted to use this but insisted it must be written in Java, so off I went and rattled the same piece of code out in picky, verbose Java. The surprise was that I ended up with about 2000 lines of code vs. the PHP implementation's 1800.

My conclusion is that in real life Java is not more verbose, just a bit more painful to read. Also very revealing was just how well PHP 5 was suited to "hard" problems outside of its web page niche.

PS I thought "hadoop" was Geordie for "arrested and charged".

0
0

It's a real shame that Twitter is used to bash Rails

Even in the Rails community Twitter have put their foot in it several times and, IMHO, haven't got huge credibility. They moaned about not being able to share database connections and then a very able guru known as Dr Nic (he is a Ph D. - and a lovely barking mad person) showed them they didn't know what they were talking about in about 10 lines of code. Hell, even I knew what they were saying didn't sound right and I had only been using rails for a couple of months then, but I have been database programming for the best part of 20 years.

I've also used another of their messaging tools, called Beanstalk, and it sucks. We are going to throw it away. It comes with a Ruby gem to allow you to talk to it and a Rails plugin that adds stuff to the Active Record class - the plugin breaks Active Record beyond repair - useless. I wrote a 100 line add-in that allowed you to make asynchronous calls using Beanstalk. I looked at their plugin and the code was nasty.

We're replacing Beanstalk with a very simple daemon that comes in a hundred lines of code or so, is traceable, and works. Very sceptical about Twitter stuff after this experience, and it's a shame they're held up as an example.

We aren't a web 2.0 company. We just use Rails to get stuff done really quickly. If there was a faster development framework we'd use it. But I never want to go back to Java, it slows you down and gets in the way.

http://francis.blog-city.com/java_is_bad_for_your_brain_talk_at_barcamp_manchester.htm

0
0
Alert

Pressflip 2.0

I find it extremely amusing to see all this bile vomited over Web 2.0 companies by someone from Pressflip.com, a classic example of a failed Web 2.0 startup if ever there was one!

0
0
Go

@Ashlee

I'm not offended. Just distracted. Snark is good. Cussin to make a point or a joke is good. Gratuitous cussin just detracts from the content and humor of the piece.

Ted is obviously a talented writer. He's clearly aiming for Gonzo (journalism, not muppet), so I'm not sure why he'd settle for Sophomoric. And I'm not sure why you'd let him.

0
0
Paris Hilton

@Anon

Nearly lost another laptop to a mouthful of coffee, funniest el Reg comment I've read to date!

Paris - obviously.

0
0
Anonymous Coward

I don't really understand the article,

But as for the swearing, it seems...

1) Most of the detractors are using the word "cussin", a very American word. Swearing is part of the British sense of humour, but Americans do seem to be far more sensitive to it than the British.

2) Swearing does not show a lack of language skills, quite the opposite if used correctly.

3) very very funny article.

0
0

Fuck me, a java library that's not for the punks out there..

I'm sure it toasts bread in its spare time or similar, but a few notes :

1) Blogosphere. Anyone who mentions this tends to make me think it's yet another trendy library, rather than something that actually just gets work done

2) Who is actually using it? Links?

3) What's wrong with it? Nothing is perfect

4) A link to hadoop would be useful, even if it's an instant google to find it..

0
0
Happy

Language

Just my tuppence: there is nothing wrong with the language in this article. Maybe I think this because I grew up in the inner city. Maybe it is because of my age. Maybe it is because my parents let me swear in the house. Or maybe it is because they are just words. You validate them, by reacting to them.

0
0
Jax

I only realised there was swearing

after reading the comments.

_Really_ enjoyed reading the article. More like this please! :)

As per the swearing, Stephen Fry knows more than _all_ of us about the English language and he thinks swearing is great. So yea!.... that.

0
0
Anonymous Coward

He doesn't say anything..

Try swapping in random words and you see it makes no difference...

"Why don't you see kebabs coming out of places like EasyJet? Because that shit is hard. It could also be that companies like EasyJet are blissfully unaware of any meat other than chicken."

See, no difference. Exactly the same amount of information.

Why not a quick run down of what it actually does?

Why would anyone want a file system written in Java (or any virtual machine environment for that matter)?

I'm going to have to look it up now to find out what it is; the whole article might as well have been "Have a look at Hadoop on Wikipedia, it might be fucking interesting."

0
0
Silver badge
Thumb Up

Keep 'em coming Ted

Shit on the Twatters of this world all you can but try and cut down on the gratuitous "my language is better than yours" remarks. I'm personally not a fan of Ruby (the syntax does my head in) and while I'm not that keen on the Rails approach I can appreciate that it, too, has a part to play. If "all the customer wants" is something that can be implemented using a Rails-like framework then it is reasonable to provide the solution using that. With all the usual provisos, of course. The main one being that the customer didn't really know what they wanted and six months later something completely different is being written from scratch and possibly using a different framework... Nonetheless my experience is that the code produced on the back of such frameworks is generally better than that which would have been written in the absence of such.

A detailed discussion of the pros and cons of these frameworks versus a properly engineered one would be welcome. In the meantime maybe you can join Ashlee on his audio show and give that Web2.0 devotee Mr. Rosenberg the tongue-lashing he so thoroughly deserves!

0
0
