Sometimes plaintiffs just don't know when to quit. After losing a trademark infringement suit against a competitor, Healthcare Advocates - a patient advocacy organization based out of Philadelphia - sued the intellectual property law firm that represented the defendant in the trademark action, alleging that the firm had "hacked …
If you want to keep information away from people, isn't it better to remove it from a public-facing server?
Which offence is worse under the law: withholding evidence or breaking the DMCA? Surely the prosecution could just have got a warrant anyway?
Robots.txt a legal requirement?
Is there any legal requirement for anyone to obey robots.txt? I thought the only way someone could "make" you obey robots.txt was by a licensing agreement / contract, which would stipulate that anyone accessing the data on their site implicitly agrees to obey robots.txt - but I'm not sure this kind of contract has been found enforceable in court yet.
Further, does the DMCA really allow a "technical measure" that is based on an unsigned contract?
The proper use of robots.txt
This whole argument seems to be wrong-headed. robots.txt is not designed to keep WEB BROWSERS out of any web site. It does not do this and has never been intended to do so. It is only intended to control access by search robots. It is not mandatory: I quote from "A Standard for Robot Exclusion":
"It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community to protect WWW server against unwanted accesses by their robots."
The best that can be said, according to Internet Archive's FAQ, is that the Internet Archive's copying robot should obey the robots.txt convention.
Assuming that the intent of "A Standard for Robot Exclusion" hasn't been overturned by a previous case, it appears that the healthcare firm brought the case against the IA on specious grounds and that the judge didn't do his homework.
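To make the "not mandatory" point concrete, here is a minimal sketch of what honouring robots.txt looks like from the robot's side, using Python's standard urllib.robotparser. The robots.txt content, the bot name "ExampleBot" and the example.com URLs are all made up for illustration. The thing to notice is that the check only happens because the crawler's author chose to call it; a browser, or a robot that skips the call, is never stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved robot would consider itself allowed to fetch url.

    Nothing here blocks access. It only reports what the site owner asked for,
    and it is up to the caller to respect the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/private/page.html",
                "http://example.com/public/index.html"):
        allowed = polite_crawler_may_fetch(ROBOTS_TXT, "ExampleBot", url)
        print(url, "allowed" if allowed else "disallowed (by request only)")
```

A robot that never calls polite_crawler_may_fetch, or ignores its result, can still request the /private/ page and the server will serve it as usual.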
You don't want cops looking in your bedroom window.
You hang a curtain over it.
The wind blows the curtain aside.
You are seen with your current fetish through the window by a passing cop.
Acting under the authority of the "plain view" doctrine ... they bust you.
Does simply hanging the curtain afford you protection, even if it is occasionally rendered ineffective by outside forces? Nope. At that moment, when the wind blew and the cop was looking your way, there was no curtain.
IF the wind had *not* blown the curtain aside, the cop could *not* have moved it aside themselves without violating search warrant requirements. However since the curtain *was* moved by the wind, it is as if it were not there at all, and the cop is relieved of his obligation to the warrant rules.
IF the robots.txt *had* been acted on by IA in the manner in which HA intended, and blocked the infringing content from view so that the law firm needed to take *extra* measures to capture the evidence they needed, then a "lock" would have been in place, and they might have violated the DMCA to get at the content.
But a gust of wind blew ... and HA's dalliance with the devil was exposed.
James, read the article again
Your metaphor makes no sense. The people looking at the website had no idea that there was supposed to be a level of protection between them and what they were looking at. If there's no barrier there, and you don't know there's supposed to be a barrier there, how is it your fault if you see whatever the barrier's supposed to be protecting?
It's tricky ...
As others have posted, robots.txt is not actually an access control system.
But the fact that it is designed to keep out robots rather than web browsers is actually not relevant here. I guess the argument was that WayBack had the past versions from /their/ robot, which shouldn't have accessed that content had it followed the robot exclusion standard as it normally does.
The fact is that nobody violated a protection measure, and the 'rule' that the web bot violated was an informal recommendation rather than a real rule, and those accused of accessing the information that shouldn't have been there certainly did nothing wrong in any sense as far as I can see.
[ Your metaphor makes no sense. The people looking at the website had no idea that there was supposed to be a level of protection between them and what they were looking at. If there's no barrier there, and you don't know there's supposed to be a barrier there, how is it your fault if you see whatever the barrier's supposed to be protecting? ]
robots.txt = curtain
The law firm looking for evidence = cop passerby, ignorant of curtain's existence
HA = flagrant diddler caught when the flimsy curtain (robots.txt) they thought would protect them from prying eyes ... didn't
What doesn't make sense? I agree with the court.
robots.txt is a voluntary pseudo-standard used by website owners who wish to communicate their suggestions/preferences for automated crawling of their server-available content.
IA had some kind of technical issue that voided the robots.txt suggestions/preferences. IA did not break any laws because adherence to the suggestions contained in robots.txt is not codified into any law.
The law firm did NOT know the HA content they viewed was *supposed* to be protected, and they did nothing extraordinary to gain access to that content.
Ergo, they did not violate the DMCA by viewing it in the Wayback Machine.
It doesn't matter that HA *thought* robots.txt would protect them. Adherence to its suggestions/preferences is strictly voluntary and clearly cannot be counted on in a pinch. HA was foolish to rely on its voluntarily-applied protections.
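Purely as an illustration of how "some kind of technical issue" can void a voluntary check, here is a hypothetical sketch. It does not describe how the Wayback Machine actually works; serve_snapshot, healthy_lookup, overloaded_lookup and the fail_open flag are all invented names. The assumption is an archive that looks up the owner's exclusion preference before showing a stored page, and that shows the page anyway if the lookup itself fails.

```python
from typing import Callable


def serve_snapshot(url: str,
                   is_excluded: Callable[[str], bool],
                   fail_open: bool = True) -> bool:
    """Return True if the archived page would be shown to the visitor.

    is_excluded stands in for whatever consults the owner's robots.txt
    preference (hypothetical). If that lookup times out, fail_open decides
    whether the page is shown (True) or withheld (False).
    """
    try:
        return not is_excluded(url)
    except TimeoutError:
        # The preference could not be checked at all; policy takes over.
        return fail_open


def healthy_lookup(url: str) -> bool:
    # Hypothetical rule: everything under /private/ is excluded.
    return url.startswith("http://example.com/private/")


def overloaded_lookup(url: str) -> bool:
    # Simulates the exclusion check being unavailable.
    raise TimeoutError("exclusion lookup did not answer")


if __name__ == "__main__":
    page = "http://example.com/private/old-page.html"
    print(serve_snapshot(page, healthy_lookup))            # preference honoured
    print(serve_snapshot(page, overloaded_lookup))         # lookup failed, fail-open
    print(serve_snapshot(page, overloaded_lookup, False))  # lookup failed, fail-closed
```

Under these assumptions the visitor does nothing unusual in either case: the same ordinary request returns the page or not depending only on whether the archive's own check happened to work.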
Plenty of "less than scrupulous sites"
There are plenty of sites out there where you can get the information. robots.txt is kind of like putting a sign in my front yard that says "don't look in my front yard", but without putting up a view block, there's nothing to say for it.