The Register® — Biting the hand that feeds IT

Yahoo! cuddles Google's bastard grid-child

Tom Chiverton

Speed bump 

"With Webmap, we can do this 33 per cent faster on the same hardware."

Never mind, Microsoft will port it to .NET when they buy Yahoo! and make it slower again.

Sebastian

good 


I would like to see another great search engine like Google. I used Yahoo! for years until Google became faster and more accurate.

There's one question that remains unclear to me after reading this article:

Does speeding up the indexing of pages also speed up the search results? I think not. And does using this grid produce better search results? It only seems to be a matter of more efficient CPU use.

Simon Greenwood

re: good 

Broadly speaking, GFS, Nutch, Hadoop et al. are all designed to improve processing rather than results. They're all about distributed systems on commodity hardware, so they scale horizontally rather than vertically and improve the way processors are used for those queries. JAQL does sound like the complementary query language for the system, but I suppose the implication is that it has to remember which queries 'work' by scoring and storing them, making them part of the information being processed in the first place...
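To make "improve processing rather than results" concrete: Hadoop, like Google's MapReduce, splits a job into a map step that runs independently on each chunk of input and a reduce step that combines the grouped intermediate results. Below is a minimal single-machine sketch of that pattern in plain Python. It does not use Hadoop's actual API, and the function names are hypothetical. On a real cluster each `map_phase` call would run on a different commodity node, which is what scaling horizontally means in practice.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input split
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values seen for one key
    return key, sum(values)


def map_reduce(documents):
    # Each document stands in for one input split that could live on its own node
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["the web is big", "the index is bigger"]
    print(map_reduce(splits))
```

Note that the answer is the same whether one machine or a thousand run the map step; only the time taken changes, which is the distinction the comment above is drawing.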

tim

re: re: good 

Presumably, while speeding up indexing doesn't necessarily bring direct benefits to the end user, the time, processor power and money freed up can be used for other things, like optimising the search.

Robert Hill

CPU efficiency can benefit both...


Getting more CPU work out of a given set of hardware certainly reduces costs and electrical power usage. But increasing the total CPU work available can also enable new levels of data analysis, and algorithms that were unfeasible without the horsepower for them. A decade ago I helped a major airline upgrade from an SGI server to a cluster of IBM SP2 supercomputers and massive storage arrays, and they began using data an order of magnitude larger and more complete in their load forecasting calculations. They did rewrite the software (they had to parallelize it explicitly as they moved from SMP to MPP, and change the calculations to use more granular data), but the notion of what they were doing didn't change: just the data, and the ability to sift through more of it. Apparently they got considerably better results once they tuned the new forecasting system's parameters.

Anonymous Coward

where! have! the! exclamation! marks! gone!?! 


Okay, so, I've not actually read the article (I might do in a minute) but I thought that there was a rule that stated all items with "Yahoo!" in the title had to have an exclamation mark after every word in the headline.....

no?

oh well.

coat, retrieved, donned, and left the building.

Sid

@Robert Hill 

Eh?

"had to parallelize it explicitly as they moved from SMP to MPP, and change the calculations to use more granular data"

WTF are you on about? I'm sorry, but after twenty years in the computer industry and a degree in computer science, I still have no idea what you mean there. Could you rephrase it so us mere mortals have a clue?

Peter Gathercole

@Sid re. SMP vs. MPP

SMP = Symmetric Multi-Processing

MPP = Massively Parallel Processing

With an SMP box, there is a single OS image that schedules applications across the processors. If you write threaded code, most SMP implementations will schedule threads on separate processors without you having to write code that explicitly takes into account the fact that there are multiple processors.
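As a rough illustration of the SMP model, here is a hedged Python sketch in which the program only creates threads and leaves the choice of processor entirely to the operating system. The function names are hypothetical and the workload is a stand-in. One caveat: in the standard CPython build the global interpreter lock stops pure-Python threads from executing CPU-bound code truly in parallel, so this shows the programming model rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    # Stand-in workload: sum of squares over one slice of the data
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    # Split the data into roughly equal slices, one task per slice
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # The code never says which CPU runs which thread; the single OS image decides
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(score_chunk, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

All threads share the same memory, so no data has to be shipped anywhere; that shared address space is what makes the SMP case comparatively easy.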

With MPP, there are multiple OS images in the cluster, and you have to write to an API that allows different units of work to be placed on different systems. This means the application has to be much more aware of the shape of the cluster. It also means that, if the code is not written carefully, adding nodes to the cluster may not improve performance.

Unfortunately, too many IBM SP/2 implementations were not really parallel processing clusters; they were more like LAN-in-a-can systems (goodness, where did I dredge that term up from?).

But what Google does is a quantum leap beyond what SP/2s were capable of, and is much more like MareNostrum and Blue Gene/L.
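The contrast with MPP can be sketched by making data placement explicit. In this hypothetical Python sketch each "node" owns a private inbox and only sees what the coordinator sends it, loosely mimicking message passing between separate OS images (real MPP codes typically use a message-passing library such as MPI across physical machines). Threads are used here purely so the example stays self-contained on one machine; the point is that the application, not the OS, decides which node receives which slice of the data.

```python
import queue
import threading


def node_main(rank, inbox, outbox):
    # A simulated node works only on the message it receives, never on shared arrays
    chunk = inbox.get()
    outbox.put((rank, sum(x * x for x in chunk)))


def run_cluster(data, num_nodes=3):
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    results = queue.Queue()
    nodes = [
        threading.Thread(target=node_main, args=(rank, inboxes[rank], results))
        for rank in range(num_nodes)
    ]
    for node in nodes:
        node.start()
    # Explicit decomposition: round-robin slices, one message per node
    for rank in range(num_nodes):
        inboxes[rank].put(data[rank::num_nodes])
    for node in nodes:
        node.join()
    # Gather one partial result per node and combine them
    partials = [results.get() for _ in range(num_nodes)]
    return sum(value for _, value in partials)


if __name__ == "__main__":
    print(run_cluster(list(range(10))))
```

Adding nodes only helps if the slices are balanced and each node's work outweighs the cost of sending it messages, which is why an MPP application has to be written with the cluster's shape in mind.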

Anonymous Coward

@Sid 

Twenty years in the computer industry obviously hasn't taught you how to use a search engine.

A swift google revealed:

http://www.webopedia.com/TERM/M/MPP.htm

HTH

Sid

@Peter Gathercole and AC 

Thanks for the explanation. I'd got the SMP and MPP bit; I've just never heard of granulising data. It left me totally baffled (as you probably guessed).

AC: You'll be pleased to know I've since googled it, and it's just a posh name for creating subsets or something like that... I think? ...Possibly?

Nurse, Nurse! Where's my Spectrum?

Peter Gathercole

@Sid 

Sorry, I thought that "granulising data" was reasonably obvious and didn't need explaining, so I assumed it was the other terms that needed explaining. My bad.

Sid

@Peter Gathercole 

I'm afraid my adventures in concurrent processors stopped at the Transputer and Occam, something I never got the hang of because of my habit of using tabs instead of spaces. Boy, would that confuse things :)

So your explanation was gratefully received.

Mind you, I still use the command line to compile stuff. Old dog, new tricks and all that.

Dalen

Link farms and ads 

I wonder how those would affect mapping?