The techniques used by unloveable rogues who automate search engine manipulation attacks themed around breaking news to sling scareware have been unpicked by new research from Sophos. A research paper published on Wednesday by Sophos researchers Fraser Howard and Onur Komili lifts the lid on the search engine optimisation …
The solution is for the search engines to harvest from IP addresses that cannot be associated with the search engines. That way, the content cannot be customised depending on whether it is Google or some poor user fetching it.
It would also solve some of the problems with search engine poisoning for other things, like Google searches for products.
I'm sure that someone like Google could easily arrange to "borrow" IP addresses from large ISPs on a random basis.
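To make the idea concrete: if the same URL is fetched once from a known crawler address and once from a "borrowed" address, the two bodies can be compared. The following is only a rough sketch; the function name `looks_cloaked` and the 0.8 threshold are assumptions for illustration, and legitimately dynamic pages will also differ between fetches.

```python
import difflib


def looks_cloaked(crawler_html: str, visitor_html: str, threshold: float = 0.8) -> bool:
    """Return True when the two page bodies are less similar than `threshold`.

    `threshold` is an arbitrary illustrative value, not a tuned one.
    """
    # ratio() is 1.0 for identical strings and 0.0 when nothing matches.
    ratio = difflib.SequenceMatcher(None, crawler_html, visitor_html).ratio()
    return ratio < threshold
```

A real check would have to ignore adverts, timestamps and other content that changes on every request.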
..... not quite.
Search engines don't just use a predictable pool of IP addresses; they also use a predictable user-agent string.
Do you really think it could be that easy?
Hell, if that was the solution, I'd no doubt be using the 'BristolBachelorBot' search engine today, wouldn't I? The reason there's only one serious contender, and one wannabe, in this market is that it's hard.
Even if those factors didn't alert you that a search engine was visiting, the very fact that it reads your robots.txt file is a bit of a giveaway. I'm sure you wouldn't advocate that search engines stop reading robots.txt?
Google regularly and deliberately haze the behaviour of their search engines to throw these people off, but it's a constantly moving battle. I really don't think people outside of search realise the enormity of the problem of automatically gathering realistic data on the Web these days. We only notice it when it fails.
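For anyone wondering what "reading robots.txt" involves, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt text, the `ExampleBot` name and the URLs are made up for illustration. A well-behaved crawler fetches this file before anything else, which is exactly the request a cloaking server can watch for.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```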
Here's one to add to the IP/hosts blocklist
Here's the IP of one nasty malware virus checker (worm) that seems to crash my browsers every time it gets referred to by Google.
IP to block 184.108.40.206
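If you keep such a blocklist in a script rather than a hosts file, a check could look roughly like this. It uses the standard `ipaddress` module; the `BLOCKLIST` and `is_blocked` names are hypothetical, and the only entry is the address quoted above.

```python
import ipaddress

# Single-address network taken from the comment above; add more as needed.
BLOCKLIST = [ipaddress.ip_network("184.108.40.206/32")]


def is_blocked(addr: str) -> bool:
    """Return True if `addr` falls inside any blocklisted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKLIST)
```

Bear in mind that attack sites move between addresses quickly, so a static list like this goes stale fast.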
the real answer
Is to spoof your user agent as Googlebot and hide your referer.
or just use trusted news sites for your breaking news stories maybe.
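As a sketch of the user-agent suggestion, this builds (but does not send) a request with Python's `urllib.request` that presents Googlebot's published user-agent string and carries no Referer header. The `build_request` helper is a made-up name. As pointed out earlier in the thread, a site that also checks the source IP address will not be fooled by this alone.

```python
import urllib.request

# Googlebot's well-known desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that claims to be Googlebot and has no Referer."""
    # Only User-Agent is set; since Referer is never added, none is sent.
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
```

Passing the result to `urllib.request.urlopen` would actually perform the fetch.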