Cybercrooks have developed a new technique for manipulating search engine results in order to promote the crud they sell, such as scareware packages. Hackers first place benign PDF files on web pages they are seeking to promote, before replacing these documents with booby-trapped Flash files once a new site has been indexed. …
Just because the pretty picture on the box you got at Christmas says it contains a Forever Friends bear doesn't mean it WILL contain one...
... very surprised no-one has used this technique before... or, more likely, it has gone under the radar.
Isn't there an easier way?
It sounds like this relies on an assumption by Google that PDF docs won't change, or won't change frequently, like HTML pages do. If so, that's a good trick, but I've sometimes wondered what's to stop somebody from simply serving a different file when Google asks for it than what the rest of us get.
"but I've sometimes wondered what's to stop somebody from simply serving a different file when Google asks for it than what the rest of us get."
This is done plenty of times. There is lots of content on news websites that is indexed by Google and requires you to "login" or "purchase" to view.
Change your user-agent to Googlebot and you see a different world.
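For anyone who wants to try the user-agent switch without touching browser settings, here is a minimal sketch using only the Python standard library. The browser string and the helper names (`fetch`, `differs_for_crawler`) are made up for illustration; the Googlebot string is the commonly published one. It only catches sites that key off the User-Agent header, so a site that checks the requester's IP address will not be fooled, and a page that changes on every request will show up as "different" regardless.

```python
import urllib.request

# Assumed User-Agent strings, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def fetch(url, user_agent, timeout=10):
    """Return the response body for url, requested with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def differs_for_crawler(url, fetcher=fetch):
    """Return True if the body served to a crawler UA differs from the browser one.

    fetcher is injectable so the comparison can be exercised without a network.
    """
    return fetcher(url, BROWSER_UA) != fetcher(url, GOOGLEBOT_UA)
```

Calling `differs_for_crawler("http://example.com/")` makes two real requests; passing your own `fetcher` lets you check the logic offline.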
Differing behaviour for search engines...
"what's to stop somebody from simply serving a different file when Google asks for it than what the rest of us get"
I do this on my b.log. If you do not supply a specific date to read, it will show the last entry. Obviously this is liable to change, so I don't want search engines to record this entry. Therefore, upon detection of a search engine, it puts out a generic message (but the date links all work for correct indexing).
I agree. It's much easier and more reliable to present a different document to Google's robots than to the rest of the world.
You don't even have to go by IP address, as the Googlebot helpfully announces itself in the User-Agent header.
This trick is almost as old as the search engine web sites. Give the web crawlers content stolen from other sites. Give real people advertisements and/or malware. It's a simple matter of checking the User-Agent header.
There are similar tricks for malware hosting. A malware server or its name servers will show system administrators a special page that appears to be operating normally, or an error page saying that the site has been deactivated. Everybody else gets the malware, a fake storefront, or a proxy for various uses.
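To show how little is involved in the User-Agent check, here is a rough sketch of the benign version described earlier in the thread: a blog that hands crawlers a generic message instead of its ever-changing latest entry. The token list and the function names are hypothetical, not taken from anyone's actual site, and since the header is self-declared it can be spoofed by anyone.

```python
# Hypothetical list of crawler tokens; real crawlers use many more.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")


def claims_to_be_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler token.

    This only reflects what the client says about itself.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_page(user_agent, latest_entry, generic_message):
    """Pick the generic message for self-declared crawlers, else the latest entry."""
    if claims_to_be_crawler(user_agent):
        return generic_message
    return latest_entry
```

A request handler would call `choose_page` with the incoming header value, so search engines index a stable placeholder while the dated links still lead to the real entries.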
@ kevin mcmurtie
How does it determine who the sys admins are?
Most hosting companies that are plagued with malware allow it, so I don't see why they would go to the bother of setting up a proxy on the off chance the admin uses the same box IP every time?
How does it distinguish between a sys admin and everyone else? :S