Monetising kids?
Their behaviour is sickening, and the failure of the law to recognise this is just as bad.
Worrying times.
A US court has dismissed Google from a lawsuit accusing the advertising giant of illegally tracking small children online. The US Third Circuit Court of Appeals said the Chocolate Factory will not have to face allegations it used information about youngsters to serve them customized ads without their parents' knowledge …
"Their behaviour is sickening, and the failure of the law to recognise this is just as bad."
Why, because no-one else has ever made money from children before? There are entire industries dedicated solely to making money off children - toys, games, sweets, various food and drink, TV, films, and so on. And all of them rely very heavily on advertising in places that children will see in ways that will appeal to them. Not only are Google far from the worst offender in this respect, they're not even doing it deliberately - they just treat any visitor the same way and try to match adverts with what they've previously shown an interest in. If this is such sickening behaviour, there are a hell of a lot of other companies you should be demonising before Google even merits a mention.
That is surprising. From what I can tell, Googlebot usually respects robots.txt. Consider testing your config on Google's validator: https://www.google.com/webmasters/tools/robots-testing-tool
Googlebot crawls the site anyway but takes the entry in robots.txt to mean "don't show it in the search results". This is different from the "don't crawl this site" that most people think robots.txt means and expect to happen.
No, Google really does not crawl the parts of the site protected by robots.txt.
However, if there is a link from the unprotected part to the protected part, then any information that can be gathered from the link (like the text of the link) will be indexed.
So if you link to the files with the words "pictures of my children", Google will index that, without actually crawling the URL.
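To see what a given robots.txt actually tells a well-behaved crawler, you can check it locally. This is only a rough sketch using Python's standard urllib.robotparser; the robots.txt content, the /private/ path and the example.com URLs are made up for illustration, and it says nothing about how Googlebot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /private/ path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot", "https://example.com/index.html"))
    print(allowed("Googlebot", "https://example.com/private/kids.jpg"))
```

Note this only answers "may I fetch this URL?". It cannot stop the link text on an unprotected page from being indexed, which is the point made above.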
You can also use meta tags in the head of your pages, like <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE">, to tell Google not to index the page, follow its links or store a cached copy, but it may still crawl your page. Also, if your pages are linked from other pages outside your site, Google will still reach them. Unluckily, most crawlers believe whatever is "public" can be crawled, and their interpretation of robots.txt "directives" is always the one most advantageous to them. The only way to protect your "private" contents is to password protect them.
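If you want to check that a page really carries the tag you think it does, something like the following would do. It is a minimal sketch with Python's standard html.parser; the class and function names are my own, and it only reads the tag, it does not prove any crawler will honour it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            values = {
                part.strip().lower()
                for part in attr.get("content", "").split(",")
                if part.strip()
            }
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html: str) -> dict:
    """Return a mapping such as {"googlebot": {"noindex", ...}} for the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW, NOARCHIVE"></head>'
    print(robots_directives(page))
```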
> The only way to protect your "private" contents is to password protect them.
Not so. I once implemented a system which did reverse DNS lookups to white-list certain bots.
The site extensively used client-side templating, so although rogue bots which didn't declare reverse DNS could in theory snarf data, it wasn't in a crawler friendly format.
Of course this system was designed from the ground up to behave this way, and it would be basically impossible to bolt this on to an existing CMS based site.
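For anyone curious what the reverse DNS check looks like, here is a rough sketch in Python. It is not the code from that system: the function name, the injectable resolver parameters and the hostname suffixes are my own assumptions (the suffixes are the ones commonly associated with Googlebot, so check the crawler's own documentation before relying on them). The idea is a forward-confirmed lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and make sure it returns the same IP.

```python
import socket

# Assumed suffixes for hostnames of a trusted crawler; adjust for the bots you trust.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def is_whitelisted_bot(ip, allowed_suffixes=ALLOWED_SUFFIXES, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a client IP (IPv4 only here).

    `reverse` and `forward` are optional resolver functions, mainly so the
    logic can be exercised without touching the network.
    """
    if reverse is None:
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward is None:
        forward = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse(ip).lower().rstrip(".")
        # The leading dot in each suffix stops "evilgooglebot.com" from matching.
        if not host.endswith(tuple(allowed_suffixes)):
            return False
        # Anyone can set a PTR record, so confirm the name maps back to the IP.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror: no record means not trusted.
        return False
```

A bot with no reverse DNS, or one whose PTR record claims a trusted name that does not resolve back to its address, simply falls through to False and gets treated like any other anonymous client.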
"The only way to protect your "private" contents is to password protect them."
Indeed that is what I ended up doing.
For any complex site, robots.txt is useless for the indexing too. There will almost always be pathways that you didn't consider that allow the buggers to index. It's an unwinnable fight with regexps to keep them out. Best to use the meta tags. I have name='googlebot' content='noimageindex' on every page.
Yeah, robots schmobotz!!! What chance do you have when the website states "HEY GROWN-UPS: We don't collect ANY personal information about your kids. Which means we couldn't share it even if we wanted to!"
But then they do... and the law does nothing about it?
Shameful.
You need to go back and RTFM. So far the courts have said:
a) You can't sue Google because they acted in good faith.
b) You can't sue Nic.com over the cookies as they were sufficiently anonymised.
c) You can sue Nic.com over their claim of "HEY GROWN-UPS: We don't collect ANY personal information about your kids. Which means we couldn't share it even if we wanted to!".
So your point was?