The Register® — Biting the hand that feeds IT

Google Oompa Loompas cloaking user agents?

Gavin McMenemy

So why is Net Applications seeing what it's seeing? 

Stop

What they want to see?

J

Hm... 

Joke

They are using Windoze but feel ashamed of that (or afraid of being fired?), that's why...

filey

because... 

Thumb Down

it sounds more menacing and scary to say they are cloaking

total non-story

Simon

Catching the bad guys 

Black Helicopters

There might be a non-sinister explanation. A lot of the work Google does is related to quality, so it could be anyone in the anti-spam team, AdWords quality, fraud detection, search quality, etc. At http://www.justlanded.com we have seen Google servers pretend to be different OSes and user agents from the same IP at the same time. Yahoo do the same stuff, and from time to time we have to unblock their IPs when they use things like Wget to pull down thousands of pages (typical email harvesting or scraper behaviour).

People will see black helicopters everywhere... that's not to say they won't be launching a distro or some other MicrosoftWhack though...
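
For anyone who wants to look for the "many user agents from one IP" pattern in their own logs, here is a minimal sketch. It assumes the log has already been reduced to (IP, user agent) pairs; the function name, the threshold and the sample entries are all made up for illustration, and nothing here reflects how any particular site actually does it.

```python
from collections import defaultdict


def ips_with_multiple_agents(entries, min_agents=2):
    """Return {ip: sorted user agents} for IPs seen with at least min_agents distinct agents.

    entries is an iterable of (ip, user_agent) pairs, e.g. pulled from an access log.
    """
    agents_by_ip = defaultdict(set)
    for ip, user_agent in entries:
        agents_by_ip[ip].add(user_agent)
    return {
        ip: sorted(agents)
        for ip, agents in agents_by_ip.items()
        if len(agents) >= min_agents
    }


# Hypothetical entries: documentation-range IPs and illustrative agent strings.
sample = [
    ("192.0.2.10", "Mozilla/5.0 (Windows; U; Windows NT 5.1)"),
    ("192.0.2.10", "Wget/1.11"),
    ("192.0.2.10", ""),  # blank user agent
    ("198.51.100.7", "Mozilla/5.0 (X11; U; Linux i686)"),
]
flagged = ips_with_multiple_agents(sample)
```

A busy NAT gateway or corporate proxy will trip this just as easily as a scraper, so a hit is a reason to look closer rather than a verdict.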

Charlie van Becelaere

Perhaps 

the user agents are really there, it's just a bit too cloudy to see them properly?

<getting coat>

Henry Wertz

Just someone who is "paranoid"? 

I wonder if it's just someone that is "paranoid". You know, flash off, no java, no javascript, clear the cookies and browser history all the time. Some people at google might just be "Ha! I'm going to clear the user-agent string too!" Or testing anonymization tools. Something like that. I just don't see google hiding a new OS by clearing it all out entirely. Maybe it's even just an internal test build of Chrome that had a bug making it leave the user-agent blank 8-).

joe_bruin

Rampant speculation 

Go

The most likely reason for blank user agents: the Googlers have decided that they want to encourage websites to be standards compliant instead of detecting the browser type and building a page for that one. This sounds pretty consistent with a company that has just released a minority browser platform.

James

An innocent proxy? 

Gates Horns

As I recall, blanking or replacing the user-agent string is a standard feature of Squid (and presumably other proxy servers as well). OpenBSD's pf firewall has a "modulate state" option, which does something similar at the TCP/IP level (randomising TCP initial sequence numbers, which makes it harder to identify the OS generating the traffic).

If I wanted to hide my secret OS/browser, I'd have it report itself as something like a Subversion build of Firefox/Gecko running on WinXP - looking normal in logs, while having an obvious explanation for any odd behaviour server admins might notice (it's a work-in-progress version of an open source browser, of course it's not acting in exactly the same way as the last released version!).

Blanking the user-agent, on the other hand, would make sense in two ways: first, as a paranoid sysadmin wanting as little information getting out as possible (so you blank user-agent and probably have a firewall randomising parameters too) - second, to help catch sites which are running spider-traps which serve up pages of link-spam to anything other than IE. (In fact, the comment about these 'appearing to be real people not spider activity' could be exactly the point: comparing the pages seen by real people - and their proxy - to the pages served up to Googlebot.)

Or option 3: they don't want the world knowing that for all the hype about using 'Goobuntu' and having their own web browser, 90% of their staff are still using IE 7 on XP!

Simon Painter

what a total load of fud 

Dead Vulture

There is no way that Google don't proxy outbound traffic, and knowing them, they are not buying stuff off the shelf when they can make it themselves. It's eminently possible that they have their own proxy that doesn't put in a user agent.

Where's the 'slow news day' icon?

Anonymous Coward

when are they going to grow some 

Alert

and come up with a "windoze" compatible op and put MS to the sword?

Anonymous Coward

Surely they're just trying to catch cheaters? 

Stop

Some cheaters send one set of content if they see Googlebot in the UA string, and another if they see any other agent. So it makes a lot of sense to me if google double-checks the googlebot results by fetching the same URL again with a different UA string and seeing if they get the same results returned.
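
The double-check described above could be sketched roughly like this. It is only an illustration under stated assumptions: `fetch_body` and `looks_cloaked` are hypothetical names, the agent strings are shortened stand-ins, and comparing hashes of whole pages is the crudest possible test. Nobody outside Google knows what they actually run.

```python
import hashlib
import urllib.request


def fetch_body(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent header and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def looks_cloaked(url, agents, fetch=fetch_body):
    """Return True if the URL serves different bodies to different user agents."""
    digests = {hashlib.sha256(fetch(url, agent)).hexdigest() for agent in agents}
    return len(digests) > 1


# Offline demonstration with a stand-in fetcher, so no network access is needed.
fake_pages = {"Googlebot/2.1": b"keyword spam", "Mozilla/5.0": b"real page"}
result = looks_cloaked(
    "http://example.invalid/",
    list(fake_pages),
    fetch=lambda url, agent: fake_pages[agent],
)
```

Exact comparison will flag any page with rotating ads or timestamps, so a real check would need some fuzzier notion of "the same results".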

Anonymous Coward

Mobile optimisation? 

I've seen quite a few of these from Google, and reached the conclusion it was requests from mobile users passing through Google, where the pages are optimised for display on a phone. The name of the system escapes me this early in the morning, but there was a lot of discussion of it some time ago on webmasterworld etc.

Ken Hagan

Re: Rampant speculation 

Happy

"The most likely reason for blank user agents: the Googlers have decided that they want to encourage websites to be standards compliant instead of detecting the browser type and building a page for that one."

My thoughts entirely. Perhaps the world would be a better place if *everyone* did that and the broken sites discovered that they weren't getting customers anymore.

Jon

ng ng ng... 

Thumb Down

They're using a proxy you idiot

ryan

@James 

Black Helicopters

or that even Google aren't using Chrome...

Anonymous Coward

Let the people browse 

I see Google non-spiders come and look at some of my sites. I sort of expect people at Google to use the web themselves. And yes, the header is fairly stripped and there are some blank ones, but we can all do that if we like.

The reason is the same as for the no_exist-google87048704 requests: they are seeing what the site returns to a browser with no user agent. Or some of them simply don't wish to mention the user agent; there is no law requiring it :)

Anonymous Coward

use of User-Agent 

Boffin

"A browser user agent not only identifies the browser a machine is using, but also its operating system."

Not necessarily. RFC 2616 only specifies that "User agents SHOULD include this field with requests. The field can contain multiple product tokens and comments identifying the agent and any subproducts which form a significant part of the user agent." (§14.43) Operating system is not a significant part of the browser.

As several people have mentioned, Google may be testing the behavior of sites when they are not given a known User-Agent header. Sending an empty string instead of forging it (e.g. "fake user-agent") could be an attempt to make it less noticeable in logs. They're clearly up to something, and trying not to leak any information about it.
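
To make the "product tokens and comments" point concrete, here is a rough sketch that splits a User-Agent value into those two kinds of parts. The regex is a simplification of the RFC grammar (nested comments are not handled), the function name is hypothetical, and the sample string is just written in the style of an early Chrome build on XP.

```python
import re

# product = token ["/" version]; comment = "(" text ")". Nested comments are not handled.
_PART = re.compile(r"\(([^()]*)\)|([^\s()/]+)(?:/([^\s()]+))?")


def split_user_agent(value):
    """Split a User-Agent value into (products, comments).

    products is a list of (name, version) tuples; version is None when absent.
    comments is a list of the text found inside parentheses.
    """
    products, comments = [], []
    for match in _PART.finditer(value):
        comment, name, version = match.groups()
        if name is not None:
            products.append((name, version))
        else:
            comments.append(comment)
    return products, comments


# Illustrative string only, not taken from any log.
sample = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
    "AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13"
)
products, comments = split_user_agent(sample)
```

In this sample the operating system only shows up inside a comment, which the browser includes by convention; a blank header simply yields two empty lists.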

charles uchu

maybe these oompas are their new "consensus engine" 

So, mysterious blank user agents

Coinciding with some human ranking component to google search results

Makes for a nice little no-useragent widget of googoompa-loompa desktops that has them clicking happily all day, voting for search results, and improving their chocolate...i mean search results.

Hmmmm...