Google and Belgian newspaper group Copiepresse have come to an agreement on a minor part of their dispute over copyright, but have not agreed on the major point of difference between them. Copiepresse members will use an automated system to keep Google from saving its content in its cache, but that is a technical fix that has …
It has never failed to amaze me that most people think it's perfectly acceptable for Google to copy anything it wants, in its entirety. Whether it's web sites, newspaper articles, or entire books, people think it's OK simply because "It's Google" and "Well you should have used the robots.txt file". Yet if another organization tried the same thing, these people would be up in arms. I'm not going to start a war here, nor will I offer an opinion on whether or not I think Google has the right to do what it does. My point is simply that brand-recognition ("It's OK because it's Google") should not be a factor in whether or not you support the organization's actions.
My secondary point is that copyright protection is automatic by law. It is not opt-in or opt-out. I cannot go to the library and use a photocopier (or scanner) to copy/scan all of their books, only stopping when an author or publisher issues a request for me to not copy/scan an individual book. I don't have that right. And neither does Google or anyone else. Unless a work is in the public domain, or the author has explicitly given permission for such copying, it is illegal, no matter who does it (except under certain circumstances). Similarly, it should not be up to website designers to use the robots.txt file. I know it's there for a purpose, but there are a lot of people out there (many of them with websites) who don't know the specifics of a robots.txt file or how a search engine indexes sites. The failure of a person or company to use technical means to stop a search engine (or anyone else) from copying their content does not remove the protections granted to them under the copyright act(s).
Chris - Bad analogy
Google does not (on the whole - let's ignore the fact that you can access the full cached page they grabbed) reproduce whole web pages verbatim. Neither Google Search nor Google News does this.
A more representative analogy than yours (copying an entire book from a library) is that they merely produce a summary of a book and point you in the direction of the book if you want to know more. That's no different to a "What's in the newspapers" section on a website, TV or radio program, a review of newly published books, or any similar scenario.
Chris has a point (unknowingly?)
I think Chris actually has made a good point - as an "owner" of several websites I personally do not mind Google indexing (or even archiving) them. But, like advert e-mails, it should be an opt-in... in other words, the robots.txt file should be there to ALLOW the indexing of the file, not prevent it. That way, the people who actually WANT their file indexed would quickly learn how to do it, and the people who *don't* do not have to worry about each and every search engine's peculiarities.
Unless there's a damn good reason, any public bulk mechanism should be opt-in. Period.
Peter misunderstood my point
Peter, I think you misunderstood my point. Whether or not Google reproduces any content is irrelevant. My point is that the very act of Google copying the content is infringement (whether or not they share that reproduction with anyone). That's why I made that analogy. If I copy a book from the library, I'm not showing it to anybody else, it's only for my personal enjoyment; it's still illegal. So the act of Google doing the same is illegal, yes? Publicly showing (and reproducing) that reproduction is another matter entirely.
Aubry is spot-on. If a website owner voluntarily decided to tell search engines to index/archive the site (opt-in), then that is the permission needed to avoid copyright infringement. And that's the way it should have been from the beginning.
By setting the permissions on your webserver to public you have already opted in. It's the technical equivalent of putting your creative work on a 200ft billboard. If you don't want your articles transmitted (copied) to everybody then set both your permissions and robots.txt accordingly.
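For anyone following the robots.txt argument without having seen one, here is a minimal sketch of the two positions in this thread, checked with Python's standard `urllib.robotparser`. The rule texts, the `example.com` URL and the `OtherBot` name are illustrative assumptions, not taken from any real site, and a well-behaved crawler honouring these rules is itself an assumption: robots.txt is a convention, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets, written out only to illustrate the thread's argument.
NO_RULES = ""  # no robots.txt content at all: the status quo default

OPT_OUT = """\
User-agent: *
Disallow: /
"""  # the site owner tells every crawler to stay away

OPT_IN_STYLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""  # everyone is refused except a crawler the owner names explicitly


def allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/article.html"  # placeholder URL
    for name, rules in [("no rules", NO_RULES),
                        ("opt-out", OPT_OUT),
                        ("opt-in style", OPT_IN_STYLE)]:
        print(name,
              "| Googlebot:", allowed(rules, "Googlebot", page),
              "| OtherBot:", allowed(rules, "OtherBot", page))
```

With no rules at all, the parser treats everything as fetchable, which is exactly the default being argued over here. The "opt-in style" file only approximates a true opt-in, since the owner still has to publish the file for the refusal to apply.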
Christine hit the nail on the head there
It's not just an argument over Google using its pervasive brand value to take what it wants - there is also an element of the public to the web.
Once upon a time, the web left the military and went to universities around the world. Then it went public, and got famous. Now it's 70% public, 30% in the boardroom - and that's where this whole argument comes from. While I disagree with a lot of Google's policies on working within repressive countries, at least they still see the web as a public entity, whereas most companies are trying to move it from something that carries corporate value within, to an inherently corporate entity.
Yes, but some owners of 200ft billboards don't have the time or the inclination to look up all the technical details on whether the billboard is accessible to all. They just want you to stop looking at it when they find out!
What's the problem ...
... with the default being to allow Google to index ?
To post a website you need certain skills. If you don't have those skills and don't consult someone who does, then you have no excuse.
If you buy a house you look at the locks, and if you don't know about locks, ask someone who does. If you never look and see that there are no locks at all, then don't be surprised if someone walks in.
If you don't want your site indexed, then flag it as such. If you can't be arsed doing the job (making a website) properly then you've only yourself to blame !