Publishing ANYTHING on .uk? From now, Big Library gets copies

On the same day that thousands of public sector bods will go on strike in a row over pay, pensions and working conditions, new regulations will come into force at midnight tonight allowing the British Library to begin scraping content from UK websites. Under the rules - known as legal deposit - the country's biggest collector …

COMMENTS

This topic is closed for new posts.


  1. Vimes

    If we're talking about just another bot like GoogleBot going out there to look at what it can find, then couldn't they be blocked via something like the .htaccess file?

    1. Vimes

      Also: I hope they don't plan on using cloud-based services, such as those offered by the likes of Amazon, to do the scraping, and that they will rely on their own servers instead.

      I've already got Amazon on the naughty step thanks to hacking attempts coming from them, and the only other thing that would distinguish the legitimate scraper from the rest in the access log would be the user agent string - and this can be easily forged.

    2. itzman

      Just find their IP address and block it; that's all you have to do.
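A minimal Python sketch of the IP-blocking approach suggested here, as opposed to matching the user agent string, which Vimes points out can be forged. It uses only the standard `ipaddress` module. The ranges in `BLOCKED_NETWORKS` are placeholders taken from the RFC 5737 documentation blocks, not the British Library crawler's real addresses (nobody in this thread gives them). `is_blocked` is a hypothetical helper that you would call from whatever sits in front of your site.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation blocks), NOT the crawler's real
# addresses: substitute whatever you actually find in your access logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network.

    The source address of a completed TCP connection is much harder to fake
    than a User-Agent header, which the client simply types in.
    An IPv6 address never matches an IPv4 network (the check returns False).
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

Vimes's caveat still applies: if the crawler ran from shared cloud address space, such as Amazon's, blocking that range would also block everyone else using it.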

  2. g e

    Errrr, presumably these are excluded

    google.co.uk

    ebay.co.uk

    amazon.co.uk

    Or they're going to need more than that £3M in hard disks.

    1. Vimes

      Re: Errrr, presumably these are excluded

      Somebody has been paying too much attention to certain ideas put out as April Fools' jokes...

      http://gizmodo.com/5777429/the-entire-internet-on-a-floppy-disk

      Also found this when looking for a link like the one above:

      http://www.w3schools.com/downloadwww.htm

      :)

  3. Z-Eden

    A crawl of every .uk domain? Hmm, wonder what seedy or ancient sites that will dig up...

    1. Cucumber C Face

      Research is no defence...

      One assumes they'll catalogue any terrorist manuals (and worse) out there on .uk.

      I assume the legal framework must give them specific immunity from the default legal position that copying to local storage counts as 'making'.

  4. Anonymous Coward

    What about our copyrights?

    "Under the rules - known as legal deposit - the country's biggest collector of publications produced in the UK and Ireland will start harvesting what it described as 'ephemeral materials like websites' to ensure that the content is 'preserved forever'."

    Yet what if I publish something (put it online) which I don't want to be preserved (yet)?

    Leaving aside the obvious "I own copyright on my work" issue, what about situations where I pre-publish stuff to appeal to visitors while I'm still working on it? I'm doing that a lot with several tutorials I write (I'm passionate about sound synthesis & design and maintain my own hobby website), and as long as a version hasn't reached v1.0 status I wouldn't want to see it included in some big collection of stuff, simply because some things could easily change, sometimes quite drastically.

    Another issue: although it's very easy to point at Google, many people forget that, contrary to popular belief, something which gets slurped by Google can be removed again. And it's quite easy too, the keywords being webmaster tools. As others above have already pointed out, you can even prevent Google from indexing your site (or parts of it).

    So what do these guys provide? Or are we now down to "We're the government, we decide, the end justifies the means, it's all for the common good, stop whining"?

    And some people still wonder why so many are losing faith rapidly when it comes to governments in combination with IT.

    1. Pen-y-gors

      Re: What about our copyrights?

      Copyright isn't an issue; that's the whole point of copyright deposit. Publishers of books are required to give a copy to each of the six copyright deposit libraries (if they want one), and it doesn't affect the authors' copyright. Ditto with online material: the copyright is unchanged, but the libraries are now allowed to make a copy for archival purposes whether you want them to or not. If you want to keep it secret, don't publish it on the web for everyone to see and download.

      This greatly simplifies things. The National Library of Wales started to archive a number of 'important' Welsh websites several years ago, but had to contact the owners of each site and get their permission in advance, and, if I remember rightly, the copies are not accessible outside the Library network. It's an archive for long-term preservation, not a mirror site.

      1. Anonymous Coward

        Re: What about our copyrights?

        Copyright isn't an issue

        But are they just slurping text and ebook files? Or are they taking music, movies, images and everything?

        If you have a website where you've purchased the rights to display photographs (e.g. a celebrity fansite), the license only extends to your own site. Is the British Library purchasing a blanket license from the likes of Getty Images?

        What if you've put up an MP3 of your favourite music on your site? It's not worth the music industry targeting you for your single infringement, but after this exercise the British Library could be liable for millions of copyright infringements for non-book material.

        1. David Dawson

          Re: What about our copyrights?

          Copyright is a legally granted monopoly given to the creator of a work.

          It's not something that naturally exists; it's a collection of laws passed by HM Government.

          So, if the Government of the day chooses to alter how copyright is assigned to allow the British Library to scrape the UK portion of the internet, it is perfectly legal for it to do that, as it created the entire concept of copyright in UK law in the first place.

          1. Vimes

            Re: What about our copyrights?

            But whose laws take priority when, more often than not, sites cross national boundaries?

            Take for example:

            http://amazon.co.uk.ipaddress.com/

            Hostname: amazon.co.uk, IP Address: 176.32.108.186, Organization: PROD DUB, ISP: Amazon Data Services Ireland Ltd, City: -, Country: Netherlands

            As for the registrant's address for Amazon.co.uk:

            65 boulevard G-D. Charlotte, Luxembourg City, Luxembourg, LU-1311, Luxembourg

            1. Ken Hagan Gold badge

              Re: What about our copyrights?

              I expect the UK government's attitude would be that (since the .uk namespace belongs to them, even if they do delegate its management to Nominet) if you publish under a .uk address, you are putting the material (and perhaps yourself) under UK law. If you are re-publishing stuff which you don't have the right to put under UK law, that would be a matter between you and the owner of the stuff you are re-publishing.

          2. Anonymous Coward

            Re: What about our copyrights?

            Off-topic, @David Dawson: what kind of answer is that? It's OK for the government to take things away since they created them?

            If one day the UK were to be hit by a meteorite, and the UK government decided to suspend all telecommunications, air and cross-channel traffic to prevent panic and to allow only the "privileged" to escape the country safely, then according to your reasoning that's OK, since they created much of what modern society is made up of.

            I didn't realise we're still a bunch of serfs under the feudal system.

            1. Anonymous Coward

              Re: What about our copyrights?

              "I didn't realise we're still a bunch of serfs under the feudal system".

              Most people don't realise that. Congratulations on waking up and noticing reality. (Matrix, anyone?)

              Consider. As Walter Bagehot said, Parliament can do anything except change a man into a woman. (And with modern technology I'm not sure that restriction applies any more). Parliament is an assembly of our elected representatives, which therefore expresses our collective will - right? Wrong. Parliament is an assembly of self-seeking jobsworths a majority of whom do exactly what the Prime Minister tells them to - if they want to go on enjoying their cushy lifestyle.

              So far, we have established that David Cameron can do anything he wants, with the possible exception of sex changes. There is no essential difference between his power and that of a medieval king such as, perhaps, William the Conqueror or John. So yes, actually, we are serfs - except that serfs had more concrete and enforceable rights than we do. And didn't have to pay as much tax. (See, for example, https://sites.google.com/site/stevenburgauer/essay03).

              1. itzman

                Re: What about our copyrights?

                I'll believe in this relationship when Cameron dies from a surfeit of peaches and cider and Miliband is found drowned in the Butt of Ramsey. Or wherever he has been brownnosing this week.

                1. Anonymous Coward

                  Re: What about our copyrights?

                  "I'll believe in this relationship when Cameron dies from a surfeit of peaches and cider and Miliband is found drowned in the Butt of Ramsey. Or wherever he has been brownnosing this week".

                  Somewhere, Messrs Sellar and Yeatman are laughing heartily. They must have come up with some jokes the publishers wouldn't print.

                  But there is something in it. After all, wasn't King Gordon faced down by the banking barons?

          3. Anonymous Coward

            Re: What about our copyrights?

            "It's not something that naturally exists; it's a collection of laws passed by HM Government".

            Er, and other governments. The UK government can pass laws overriding the copyright it grants, but not that granted by the USA, France, Germany, China...

            1. David Dawson

              Re: What about our copyrights?

              "Off-topic, @David Dawson: what kind of answer is that? It's OK for the government to take things away since they created them?

              If one day the UK were to be hit by a meteorite, and the UK government decided to suspend all telecommunications, air and cross-channel traffic to prevent panic and to allow only the "privileged" to escape the country safely, then according to your reasoning that's OK, since they created much of what modern society is made up of.

              I didn't realise we're still a bunch of serfs under the feudal system."

              -----------

              In this country Parliament is sovereign, so yes, if the government chose to do that, it would be legal, which is a different thing from 'OK'. Legal and moral/ethical are separate concepts, I'm afraid.

              Sorry you had to find out this way. I wish they would teach this kind of thing in school.

              "Er, and other governments. The UK government can pass laws overriding the copyright it grants, but not that granted by the USA, France, Germany, China..."

              --------------

              Only so far as the law in this country respects those other countries' laws. That is what sovereign means, and it's an important distinction! The UK has signed up to copyright treaties, so I imagine they would be respected...

            2. Anonymous Coward

              Re: What about our copyrights?

              "The UK government can pass laws overriding the copyright it grants, but not that granted by the USA, France, Germany, China..."

              Yes it can, for the same reason that countries like China get to ignore OUR copyright laws. We may end up violating some treaty, but at the end of the day, so what? Trade sanctions from the USA? No one cares about their entertainment being banned anyway.

    2. Gav

      Re: What about our copyrights?

      It's not rocket science people. If you do not want people to take a copy of your website content then do not put it on a publicly accessible website. It's how browsers work. They have caches, they take copies.

      Copyright has nothing to do with it, as nowhere is it said that the British Library will be re-publishing your website. It has a copy. It will let others see that copy, just as it already does for millions of books.

    3. Anonymous Coward

      Re: What about our copyrights?

      Agree entirely. What cheek!

      I have no objection to services like Archive.org, which keep a record of valuable sites, and which any of us can access. But I object to this one. Why? Because the "archive" is purely for the benefit of staff at the British Library (and, yes, the relative handful of people who can physically walk there, if the staff decide to let them use it too). I don't put stuff on the web for the benefit of British Library staff; I do so for everyone.

      As ever with the British Library, "one for us, and none for you".

    4. itzman

      Re: What about our copyrights?

      "Or are we now down to 'We're the government, we decide, the end justifies the means, it's all for the common good, stop whining'?"

      Essentially, yes.

  5. taxman

    Legal deposit arrangements remain vitally important

    Does that apply to Assange also?

    Just asking.

  6. The Axe

    Wayback machine

    Typical. A private service already exists, but the government thinks it can do better and copies it, using taxpayers' money in the process. What a waste.

    1. Pen-y-gors

      Re: Wayback machine

      Yep, an existing private service that can be switched off tomorrow at the whim of the owner. That's not what I call secure long-term archiving and preservation. It's important for people in 200 years to have access to the day-to-day publications of the 21st century. Will the Wayback Machine still be online in 5 years, 10 years, 20 years, 50 years?

      1. Anonymous Coward

        Re: Wayback machine

        "It's important for people in 200 years to have access to the day-to-day publications of the 21st century..."

        If you believe for a single moment that Web sites scraped by copyright libraries and stored with today's technology will be legible in 200 years, I have $1 trillion worth of hybrid HD/SSDs to sell you.

  7. Captain Scarlet Silver badge

    Waybackmachine

    I thought the Wayback Machine had this pretty much covered anyway; apart from maybe ebooks, it sounds the same to me.

  8. Tom7

    How far does this go?

    I'm curious, though not curious enough to go look it up. Does this allow them to scrape your content, or does it force you to allow them to scrape it? What I'm getting at is, what if I detect the British Library robot and send it off to some obscure error page to prevent them archiving my site? Has that just become illegal? Or does it just indemnify the libraries from copyright claims if they happen to get to my content?

    1. itzman

      Re: How far does this go?

      Oh just invoke the principle of plausible deniability and claim it as a fat finger by a sysadmin.

  9. Kubla Cant

    Beano

    Although these libraries are entitled to receive and keep a copy of all copyright publications, I don't think they necessarily do so. I seem to recall that the Bodleian failed to produce back numbers of the Beano to help while away the long hours that should have been spent writing essays.

    In the case of web content, the fact that a large and increasing proportion is produced dynamically must make things difficult. For many sites there's no such thing as a definitive copy, so there's nothing to keep.

    And can anybody explain why an alien university such as Trinity College Dublin should benefit from the free handout of books?

    1. Anonymous Coward

      Re: Beano

      Although these libraries are entitled to receive and keep a copy of all copyright publications, I don't think they necessarily do so.

      However, many years ago I seem to recall reading an amusing report about a school somewhere in England that had discovered a dusty old tome in its library and after a bit of research came to the conclusion they had the only copy of this book in existence. They made a big deal of it with press releases etc ... and were then surprised when they got a letter from one of the copyright libraries saying that as they were entitled to a copy of the book and as this seemed to be the only copy available then they'd be sending someone to collect it!

      1. Anonymous Coward

        Re: Beano

        "...they got a letter from one of the copyright libraries saying that as they were entitled to a copy of the book and as this seemed to be the only copy available then they'd be sending someone to collect it!"

        Think of all the fun they could have had by informing the other copyright libraries of the situation, and then watching them fight it out.

    2. Ken Hagan Gold badge

      Re: Beano

      "And can anybody explain why an alien university such as Trinity College Dublin should benefit from the free handout of books?"

      Oh, that's easy. The Republic of Ireland was created by a British government that expected them to come back one day.

      1. Yet Another Anonymous coward Silver badge

        Re: Beano

        Just think of Eire as an offsite backup of the UK

    3. jonathanb Silver badge

      Re: Beano

      Because our copyright libraries benefit from free handouts of Irish books under their copyright law.

  10. heyrick Silver badge

    Thoughts

    My stuff is released under a sort of licence. Essentially it is reminding you of copyright, but it also expressly forbids the content being served by a third party system while my website is still "live" (when I'm gone, it's no longer my problem). Secondly it prohibits in any case the modification of content for any purpose other than translation (especially the practice of detecting keywords and linking them to adverts). Those are the terms of distribution, Accept them or piss off, basically.

    Secondly, given that a person was recently found guilty of libel for retweeting a lie, I presume that if somebody libels someone on their site and this turns up in the copy, the British Library will be equally liable.

    Thirdly, any terms and conditions imposed by the library will be groundless; they want to come get our content and copy it, so good luck making a disclaimer stick...

    Fourthly, I assume it will obey robots.txt; if not it'll get blocked by IP on principle (or maybe I'll redirect them to their own website?).

    Did they think this through?
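The robots.txt assumption can be illustrated with Python's standard `urllib.robotparser`. This is a sketch of how a crawler that chooses to honour robots.txt would decide what to fetch. `ExampleArchiveBot` is a made-up user agent name, not the British Library's real one, and nothing in this thread establishes whether the library's crawler honours robots.txt at all. The `/drafts/` rule is likewise hypothetical, echoing the earlier wish to keep unfinished tutorials out of archives.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleArchiveBot" is a placeholder crawler name,
# not the British Library's actual user agent string.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("ExampleArchiveBot", "https://www.example.co.uk/tutorials/"))
    print(allowed("SomeOtherBot", "https://www.example.co.uk/tutorials/"))
    print(allowed("SomeOtherBot", "https://www.example.co.uk/drafts/v0.9"))
```

robots.txt is purely advisory: a crawler that ignores it is unaffected, which is why IP blocking remains the fallback.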

    1. Anonymous Coward

      Re: Thoughts

      " Secondly it prohibits in any case the modification of content for any purpose other than translation (especially the practice of detecting keywords and linking them to adverts). Those are the terms of distribution, Accept them or piss off, basically."

      You don't understand copyright. You have been *given* the right over copies on the condition that the BL is allowed to store the material regardless of what you want/like. If you don't like it, your choices are to take it off line or release it as public domain.

      This is not new; it's just a clarification of existing law.

      1. dssf

        Re: Thoughts THEREin lies the problem

        Well, for some people. OK, so I'm trying to "deconstruct" this to understand it...

        Does a government think that it "created the concept of copyright"? Why cannot this be just a mere recognition that inventiveness deserves some protection?

        If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if the government wants a copy, does it get in line and PAY for a copy just like anyone else buying a legit copy? (Doesn't the author have to pay just to obtain recognition of the copyright?) Otherwise, TAKING a copy could be seen as tantamount to theft. (If I just said "arrestable words", then kindly remind me not to dare set foot in the UK, or on soil from where I can be extradited to the UK...)

        I could see huge problems in the future if the UK library law were allowed to perpetuate on distant human colonies. Colonists might stage an insurrection unless it is the PEOPLE who collectively say that it is an okay thing. I am not saying we should *hide* or deny the preservation of published materials of worth or note, but that each published work preserved by a library should first be preserved with the permission of the author or rightful rights holder. Government doesn't publish fiction, cooking guides, comics, porn, or love stories, so it doesn't deserve to "own" the copyright in those works.

        Fortunately, it seems, things are different in the States. Well, to an extent. Here, it's getting to the point where the public may end up paying to access court documents and "public records".

        Would it be unreasonable to hear someone say, "The real fact behind such a proclamation is that it allows powerful men to shut down and jail/imprison those they deem a threat"? If government "grants" rather than "recognizes" copyright in another's works, then government can shut down a voice it doesn't want heard. Copyright may be a "human construct", but no damned government should think it has the right to just take and shut down works.

        Yes, I get it, the story is not about copyright and government profiting financially.

        BTW, do I understand correctly that in the UK, if a person publishes a work and makes only ONE physical copy, and the government library system wants a copy, it can *demand* one? What if the author says, "You must pay me for time, materials, and labor", and marks it up above street price? Would that be legal? Can the English/UK library system demand that the author provide a free copy? I am assuming that an author, publisher or copyright owner must pay at the government toll gate to get that initial piece of paper stating "you're the proud owner of this government-issued/authorized/revocable copyright"...

        1. Steve Knox

          Re: Thoughts THEREin lies the problem

          Does a government think that it "created the concept of copyright"?

          Yes. See http://en.wikipedia.org/wiki/Copyright#History.

        2. Anonymous Coward

          Re: Thoughts THEREin lies the problem

          "If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if government wants a copy, does it then get in line and PAY for a copy just like anyone else paying for a legit copy?"

          For the same reason that, if a government wants £100 billion to give to corrupt banksters or to render some remote country uninhabitable, it doesn't tell its ministers to roll up their sleeves and do some honest work to earn the money. It just passes a law compelling the rest of us to give it the money.

          I'm always amused by people who maintain that "violence never solves anything" or that we live in an essentially peaceful society. Everything government does is founded solidly on an indispensable foundation of almost unlimited violence. If I don't wish to pay taxes, they will take them out of my bank account. If I withdraw the money and hide it under my bed, they will send policemen to take it from there. If I resist, the policemen will arrest me. If I won't let them, they will threaten me with weapons. If I defend myself with a weapon (using an appropriate level of force) they will, at some point, kill me - or wound me severely enough to stop me resisting. Then (if I survive) they will send me to prison and keep me there by more violence.

          That is how government works. But don't take my word for it. Would you believe the first president of the freest, most democratic, most wonderful nation the world has ever seen?

          "Government is not reason. It is not eloquence. It is a force, like fire: a dangerous servant and a terrible master".

          - George Washington

        3. Anonymous Coward

          Re: Thoughts THEREin lies the problem

          "If an author writes a story, publishes it, and people pay for it (assuming it is that good), then, if government wants a copy, does it then get in line and PAY for a copy just like anyone else paying for a legit copy? (Doesn't the author have to pay just to obtain recognition of the copyright?)"

          The government does pay, in the form of the legal protection. And, no, the author doesn't have to pay for copyright in the UK.

      2. heyrick Silver badge

        Re: Thoughts

        "You don't understand copyright. You have been *given* the right over copies on the condition that the BL is allowed to store the material regardless of what you want/like."

        In other words, "let's tweak a law so we can record a copy of everything without landing in trouble".

        Consider this: "The acts restricted by copyright in a work.
        (1) The owner of the copyright in a work has, in accordance with the following provisions of this Chapter, the exclusive right to do the following acts in the United Kingdom:
          (a) to copy the work (see section 17);
          (b) to issue copies of the work to the public (see section 18);
          (ba) to rent or lend the work to the public (see section 18A);
          (c) to perform, show or play the work in public (see section 19);
          (d) to communicate the work to the public (see section 20);
          (e) to make an adaptation of the work or do any of the above in relation to an adaptation (see section 21);
        and those acts are referred to in this Part as the “acts restricted by the copyright”.

        (2) Copyright in a work is infringed by a person who without the licence of the copyright owner does, or authorises another to do, any of the acts restricted by the copyright."

        In other words, I as the author of something have the right to provide it, or not. On the terms of my choosing. This is part of national and international agreements that just can't be arbitrarily modified. So, no, I do not believe that I have been given the "right" on the condition that the BL copies everything regardless. [further complication: my material originates in France and is uploaded to a .co.uk domain hosted in the United States... <grin>]

        Actually, I don't care if they copy, it is the (re)serving that I don't appreciate.

        The above, by the way, is section 16 of the Copyright, Designs and Patents Act 1988 (Part I, Chapter II).


        Now, you may believe that this is a big storm in a very small teacup (yes, it is); however, it sets an interesting precedent. Either a government institution can copy and then publicly reproduce copyrighted content without giving a damn about said copyrights, or the government is quite willing to modify copyright laws to allow that to happen. Funny how, when a citizen copies something they shouldn't (a copyright infringement), it's a whole different story...

      3. itzman

        Re: Thoughts

        I wonder how long it will be before they're allowed to translate them into politically correct Newspeak using a coprocessor running a 64-bit port of Doublethink.

  11. Anonymous Coward

    Dynamic content

    Many web pages these days are interactive. Are they just going to scrape a static snapshot of each page, or are they expecting to get all the code as well? Presenting that interactive content intelligently after 10 years is going to require archiving a lot of browsers as well.

  12. Anonymous Coward

    Frequency

    How often are they going to trawl a site? Presumably they will want each update as a separate archive, whereas Google presumably just updates to the latest copy of a page.

    It would be interesting to know what intelligent limits they are going to place on this. Keeping "everything" is not an option.
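One way to keep "each update as a separate archive" without storing identical copies over and over is to hash each crawl and keep a new version only when the hash changes. The sketch below is a hypothetical in-memory `SnapshotStore` class using SHA-256 from Python's `hashlib`; it illustrates the idea only and is not how the British Library actually stores anything.

```python
import hashlib
from datetime import datetime, timezone


class SnapshotStore:
    """Hypothetical archive that keeps a page version only when its content changes."""

    def __init__(self) -> None:
        # url -> list of (crawl time, SHA-256 hex digest, raw body), oldest first
        self.versions: dict[str, list[tuple[datetime, str, bytes]]] = {}

    def record(self, url: str, body: bytes) -> bool:
        """Store body as a new version of url; return False if unchanged since the last crawl."""
        digest = hashlib.sha256(body).hexdigest()
        history = self.versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # identical to the previous snapshot: nothing new to keep
        history.append((datetime.now(timezone.utc), digest, body))
        return True


if __name__ == "__main__":
    store = SnapshotStore()
    url = "https://www.example.co.uk/synth-tutorial"
    print(store.record(url, b"<p>v0.9 draft</p>"))  # True: first snapshot
    print(store.record(url, b"<p>v0.9 draft</p>"))  # False: unchanged
    print(store.record(url, b"<p>v1.0</p>"))        # True: page was updated
    print(len(store.versions[url]))                 # 2
```

A byte-for-byte hash treats any change, even a rotating advert or a timestamp in the footer, as a new version. That is the dynamic-content problem raised above, and a real archive would need a smarter notion of "changed".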

  13. Alex Gollner

    UK-created websites in the non-.uk domain

    Should UK people/organisations with non-.uk domain addresses be allowed to submit their sites for archiving?

    1. dssf

      Re: UK-created websites in the non-.uk domain, or blogspot.uk?

      If your audience is in a certain country and you use Blogger/Blogspot, then once people in that country start accessing your content, even if it starts out as .com (say, in the USA), your Blogger/Blogspot pages will be served to that audience under that country's domain. Google says this is to speed up access to the content. I don't completely buy it. If the site has very little dynamic content, and most of it is text and small images, then JavaScript crap and conflicting browser settings will probably slow the page down more than its being 6,780 miles from the reader.

      However, IIRC, you can ask Google/Blogger/Blogspot to disallow per-country domain appending, or whatever it is they call it.

      Also, you can set the page to be readable only by invited or white-listed people or email addresses. So, even if you ARE in the UK, if you publish content under T&Cs stating that the readers are special, private, invited, non-public guests, then that might legally be enough to stop a grab-bagging, copyright-vacuuming library system from scarfing up content its author intends to keep in limited, private circulation. Of course, it would get nasty if a T&C-violating subscriber/invitee just screen-scraped the content and then republished it on a legit .uk page that is harveste-- umm, archived before a takedown notice could be issued.

      Also, if an author laces his or her content with ungainly wording, making it offensive to the public and unsuitable for schools, would the government redact/black out such words if the remaining content were somehow worthy of archiving and presentation to the public? How nonsensical would an author need to become to virtually guarantee that the UK copyright czars back off? (IIRC, UK parody, libel, and freedom of speech concepts are different from those in the USA and some other countries...)

  14. Sandpit

    @heyrick

    "Did they think this through?"

    Did you think this through? This is enabled by an act of parliament. It is protected by law. Look up legal deposit legislation: it's been around for a long time, and this new provision extends it to non-print material, something that was granted in 2003 but has only just gone through today, 10 years later.

    Yes, this has been thought through, a lot!

    And why is this being done at all? It's for YOU, to make everything that is published (in whatever form) available to the public, forever.

    1. ForMe

      "And why is this being done at all? It's for YOU, to make everything that is published (in whatever form) available to the public and forever."

      In what sense is it being done for YOU when YOU didn't ask for it and don't want it, or want money to be spent on it?

      Just trying to understand.

