Is Google Book Search the last library? Geoff Nunberg, one of America's leading linguistics researchers, laid this rather ominous tag on Google's controversial book-scanning project amidst an amusingly-heated debate this afternoon on the campus of the University of California, Berkeley. "This is likely to be The Last Library," …
"Nobody is very likely to scan these books again." Why not? Google did nothing magical.
If you're too lazy to get off your butt and do it, don't criticize the people that actually get work done. These are the same people that would say evolving a backbone is too expensive.
Stop "Monday-morning quarterbacking" and put in your own original effort. Nunberg says he can do it better, but he sure as hell isn't.
Where's the problem here?
It'd have been nice if the article had actually explained why the current practice is a problem, rather than just ranting or quoting the (quite possibly justified) rants of others as is fashionable in Google-related articles on El Reg.
For example, misindexing can be fixed without needing to "re scan the original". Obviously someone has to *notice* it's misindexed, but...
Similarly, a poor quality OCR with errors can probably be fixed without needing to "re scan the original".
The problems I see are mostly legal/commercial, not really related to going back to "re scan the original", assuming the original images haven't been discarded. Google discarding *anything* sounds unlikely.
El Reg for sale
I'm taking offers for licenses to use El Reg content, webservers and writers. Don't worry that I don't own these, it no longer matters! A judge has ruled that third parties can license the work of people they don't legally represent, and so I offer El Reg in the same way!
Google are being portrayed as the bad guy here, but this is a nasty stitch up between the US Authors Guild / Association of American Publishers & Google, to grab rights to something NEITHER HAS THE RIGHT TO! Those orphan works are orphans and it's fraud to pretend that they have the right to license them or represent the author in any way.
The publishers claimed to have the right to these orphan books. Google challenged it, because they do not have the right to those books, then flipped its position, and a judge agrees the license.
This does not mean that the Association of American Publishers has the rights to license those books, it means their claim to a license is unchallenged and hence assumed to be valid.
But it's still fraud and it still should be prosecuted.
Isn't that what internships were invented for?
6 months until you can download this on torrent!
Google will not have control over these scanned books long.
It won't take long for the entire Digital Library to be ripped to an offline database and uploaded as a torrent!
Fool of a Took! Google has just made copyright material easier to steal!
@Where's the problem here?
"A poor quality OCR with errors can probably be fixed": sure, with manual input - if Google will pay for that.
But one of the problems with Google Books is that very often the *scanning quality* itself is dreadful: a fact well known to anyone who has tried to use the free public domain offerings for any serious purpose. Nothing is going to fix that problem apart from rescanning: never mind all that flannel about 'Nobody is ... likely to scan these books again.'
But you are right that there are huge legal problems, both over the original scanning, and still more, in my view, over the settlement agreement. The commercial problems stem basically from the fact that Google's attempts to circumvent the law potentially place it at an advantage compared to its more law-abiding competitors: which is unacceptable, and damages the whole framework within which we all do business.
@El Reg for sale
"A judge has ruled that third parties can license the work of people they don't legally represent"
He hasn't made his ruling yet.
"Those orphan works are orphans"
Except when they aren't. The 'orphan books' tag is being widely used as propaganda for the settlement: it is implied that it will only affect a lot of old rights whose ownership is uncertain.
The truth is that a lot of the authors, and authors' heirs, who haven't claimed their books yet, or opted out of the settlement, are perfectly easily traced: it is just that Google has declined to go looking for them. (And the settlement notice system is a bad joke: even in the US, let alone outside it.)
These authors of what Google also calls 'unclaimed books' believe that their rights are protected under international copyright law (and let's hope they are). They haven't yet got the message that there are people out there with big plans to help themselves to their work.
@6 months until you can download this on torrent!
"It wont take long for the entire Digital Library to be ripped to an offline Database and uploaded as a Torrent!
Fool of a Took Google has just made copyright material easier to steal!"
I think you are probably right, unfortunately, but I doubt whether it would do much damage to Google. I think the sure source of income for Google in this is what it will make from ads posted on search results and preview pages. I don't think piracy will do much, if anything, to reduce that.
The people who will lose out most from piracy are the authors, who will suffer damage to the value of their copyrights. And the authors who will lose out most of all will be the ones who don't even realise this is happening, so they won't know to pull their books from display. Most of them will be outside the US.
FFS. If you want to make another library, all you need to do is go and scan the books. Then you need to deal with the legal hassles and the predictable whines (like this article) that have become so tedious.
I used to wonder if the Google Book Settlement was the right way to do things. But after reading all the bullshit that's been spouted against it I'm certain that it's absolutely essential. Only the Orphan Works legislation comes anywhere near close in the Internet garbage-blogger stakes (as evidenced by the ignorant AC comment above).
Old wine in new bottles
My understanding about copyright and patent, is that these are limited monopolies granted by the state in return for something happening that might not always happen.
Eventually they both expire.
Once a book goes out of copyright, e.g., an old classic, suddenly it's republished by someone. That's been going on for years and no-one cared.
The BFI are desperate to republish old films with no monetary value but are hampered by the complexity of orphaned copyrights and the belief by some that there's a goldmine once BFI get involved.
Chandos was a small record label that used to record obscure out-of-copyright stuff for aficionados but got stiffed (IIRC) by someone claiming copyright because they'd rewritten the score for the label, and they won royalties. Net result: Chandos made a loss on a recording that would not otherwise have happened. Net outcome: new recordings of such work become financially unviable.
I'm sure Google do all the things they do for sound financial reasons, there is a possibility of some philanthropy too, no doubt (how many billions does one person need? Google Summer of Code?)
They're paying hard cash to do something that wasn't happening, and I get to use it with whatever technology I've got.
If I want to watch those Feynman lectures now, I have to use and pay for Microsoft technology.
The coalition up against Google don't seem to have a similar track record of openness and living and dying by the market.
I don't need to worry about the morality of any party, but as soon as Google do something, the fact that it wouldn't have happened otherwise is simply passed over by those who can see a deep pocket. That's how it goes.
From over here, on balance it feels like Google are acting directly or indirectly for the wider benefit.
But in the realm of cui bono, let's not invoke Robin Hood, neither discuss the long forgotten author of the important but specialist text nor cite the pension needs of the person that played the triangle on the once soon to be out of copyright sixties rock anthem.
This thing against Google isn't about defending poor people or wider society it's about vested interest, monopoly and control.
Who actually reads digital books? I cannot see anyone replacing the good ol' hard/paperback with a portable digital reader. If you're going to do that, you would be far better off listening to someone read the book to you.
That way you can lie on the beach and "read" with your eyes closed. Or fall asleep in a bus etc etc.
I've read the news many times on an iPhone/iPod/smartphone, but read a book? I think not.
It is quite annoying after one page...
Google closed the door and sealed it shut.
Actually, no, others probably won't get a crack at scanning those books, thanks mainly to Google. The authors, and the agreement with Google that was forced on them, would stop anyone from scanning a large portion of the books that Google scanned. Books that were scanned illegally, because yes, it was illegal for them to do so at the time. Google got away with it because they are very, very rich, and they managed to force an agreement that only covers them.
Anyone else doing it would either need to negotiate with each and every author separately, for a cost of billions of dollars, or would also need to do it illegally. They'd have to do it illegally anyway for those works whose copyright owners cannot be found, or those who refuse permission - something Google didn't have to bother with. But they'd get stopped a lot earlier next time, because of the agreement that Google has. Rest assured that Google will spend its money to protect that agreement, and thus stop anyone else from doing what they did.
So we're hooped. Google has managed to be the last one that will probably get away with scanning all those books. Which means that yes, their version is probably the last version, and we're stuck with Google having "the last library".
Don't forget Copyright EXPIRES!
Copyright protection does not last forever. In Europe it lasts for 70 years after the author's death. So when the copyrighted books start to expire, anyone will be able to freely re-use the data Google has so expensively paid to collect and licence. In this sense, Google are doing us a favour: they are compiling tomorrow's copyright-free digital library. A library that anyone will be able to tap into and that they will not be able to control once copyright law protection expires.
In a sense, the copyright law that they so blatantly violated strikes back at them!
Those copyright owners who haven't come forward will be free to do so under the settlement. Google has agreed to establish a registry for them to file claims with. Not only that, Google has agreed to *pay* them for each work they claim.
There might be a number of things in there that don't represent real orphaned works, but under the terms of the settlement, that won't last.
Likely to scan again?
From a computer nerd perspective, it is tempting to consider all scanned information equal (byte is byte, right?). But in the real world there is a whole hierarchy of information - important versus unimportant, fundamental versus deduced, ...
So it's rubbish to claim nobody will ever look at the original books again. If it turns out to be important, the books will be rescanned and reclassified as necessary by whoever is interested. Welcome to the world of digital information!
I see that Google are setting themselves up to be the next library of Alexandria. They've even had practice setting fire to the building, as reported by El Reg recently.
In Britain/England we used to have
something called 'common land'. We do have a little bit left. Landlords and others felt it was an inefficient holdover from serfdom and ought to be dispensed with, so they started the 'enclosure' process.
Is it not the case that Google are executing a similar or analogous process with literature...?
Call it what it is, please. Theft, plain and simple; but there is no enforced law against it.
copyright expires--- NOT
Does anyone think Hollywood and other thieves and plagiarists are going to allow copyright to expire? Give me a break Jackson! Just like every time before, when the time draws nigh, the law will be changed, just as it has been in the past, from 14 years to the lifetime of the author to life-of-the-author-plus-70-years, etc. Notice that it never gets any shorter? Someone has to take back the literature from the monopolists!
(I want to use two icons, how do I do that?)
Re: scanning/OCR quality
A coupla people earlier on mentioned OCR quality, especially with old books. The problem being that if the OCR software can't figure out what a word is, you need a person to look at it. Luckily, this has already been solved - ReCaptcha (look it up) uses words from book scanning projects in its anti-spam captchas for web forms. Very nice system. Simple and efficient.
Secondly, as I've mentioned before - if Google don't do this, who will? There's not many who can afford it, and I don't see any of the companies who can, stepping up to the plate. Printed books don't last forever, so the sooner we get them digitised, the better.
Remember, once they're digital, there's no monopoly. The hackers will get them one way or another, and then we'll all have access via bittorrent or whatever. Regardless of what you think of the morality / legality of this, I'm just happy that the information / knowledge will be saved, rather than disintegrating with the pages of a forgotten book.
The Last Library?
Is that like The Final Encyclopaedia by Gordon R. Dickson?
Freetard, it might come as a surprise, but a lot of library use is not exactly aimed at improving a particular beach-experience.
The Library Problem
What distinguishes a library from other sources of books is the catalogue. Consider, for example, "The Three Musketeers". Who did the translation, and when? For a library catalogue, and for anyone with an academic interest, that matters a lot. And the translation can be protected by copyright even when the work itself is not. I also understand that there's one really good translation of "The Three Musketeers", but which is it?
Google have a system plagued by errors, not just publication date. Some of those errors can be traced to data from third parties. A lot of libraries have been scanning old books, clearly out-of-copyright, and supplied scans and catalogue entries. In some cases, Google's import of the catalogue data seems to have been less than competent.
Heck, I was checking on a film, last night. Olivier's version of Henry V, trying to discover the premiere date in England. I wanted to know if a character in a story I'm writing could have seen it. Now, there is a date in Wikipedia, and I suppose the character won't have the chance to see it. Is it accurate? Well, I've not found an earlier date. But over the web as a whole that film is variously dated to 1944, 1945, and 1946.
But what hurts about this Google mess is that they expect a fix to just happen. They don't even try to look as if they care.
And you walk around everywhere with your eyes closed.
Google copyright can never run out...
The way I understand US copyright law, it's up to 70 years after the death of the author, or 120 years after the creation or 95 years after publication of corporate works. Who wants to bet that sometime soonish Google will try to show that these poor, unloved, destitute orphaned works were 'created' by them? At least the digital version was created and published by them, so they may have a leg to stand on. /cue the DOOM song/
I've had a look at their OCR'd books and most of them are missing whole pages, with entire sections being pretty much illegible, and that's only for books that have been printed. I would shudder to see what they've done to handwritten books or books with annotations. I doubt ReCaptcha will be able to fix the missing pages problem.
If you want to do digitisation correctly it takes a lot of effort. Someone has to scan the books, someone needs to peer review the scan for errors. We also have the scans transcribed so that they can be transformed into html/xml code that we can then serve. Then comes the fun part of providing context for that book: whose copy is it, what do the annotations down the side of this specific book mean, what was the political/scientific/cultural atmosphere surrounding the initial publication and the annotations, what were the repercussions of the book's publication and reprints, who are the people the author has mentioned, what have they done in relation to this book? Without the context how can you make a detailed study of the book?
/can we get a google is evil symbol?/
Digital Reading No Good?
I am curious exactly why reading a book on a computer, smartphone, or ebook reader is so inferior to the real thing? In many ways, particularly when doing research, fully digital versions (OCR copies as opposed to mere TIFF or JPG page copies) are vastly superior in terms of searching, synthesizing and organizing information. Of course, in the "olden days" of CRT screens and low quality display adapters, flicker from low refresh rates could rapidly make reading uncomfortable. This simply isn't the case anymore with hi-res, flicker-free screens, especially the new OLED displays on smartphones and ebook readers.
In addition, many of us simply don't have the square footage necessary to accommodate large libraries. I have over 10,000 books, several thousand journal volumes, and thousands of reprints and pamphlets in my personal library. Of these, I have less than a thousand in hard copy form which already tightly pack the walls of my smallish study. The rendering of most modern publications in pdf format has been a godsend for me.
Missing the point
As so many have incorrectly pointed out, once the copyright expires there's no monopoly over the works. This may technically be true, but going to Google for all your library needs will generate them a hefty fortune in ad revenue, and it'll result in you having to turn over yet more prized identity information to a company who can be trusted with it by virtue of the fact they won't elaborate on why they have it.
As one commenter pointed out, the copy of the original can be copyrighted, which means once the original copyright ends Google will have the rights to all the expired works they scanned, at which point you'll be able to hear the chants of "All your book are belong to us!" all the way from the Chocolate Factory.