Publishing ANYTHING on .uk? From now, Big Library gets copies

On the same day that thousands of public sector bods will go on strike in a row over pay, pensions and working conditions, new regulations will come into force at midnight tonight allowing the British Library to begin scraping content from UK websites. Under the rules - known as legal deposit - the country's biggest collector …

COMMENTS

This topic is closed for new posts.

In what sense is it being done for YOU when YOU didn't ask for it, don't want it and don't want money spent on it?

Government agencies don't exist to do what you want. They exist to do what you need.

There is valid space for discussion on what actually is necessary, but personal whim is irrelevant to this discussion.

Anonymous Coward

"Government agencies don't exist to do what you want. They exist to do what you need."

I need them to not spend a single penny on breaching website T&Cs. If the website T&Cs say "thou shalt not copy my content", then they can just move on and go collect the next website.

The law is about as clear as some very mixed-up mud (the way lawyers like it) on where a work is "published". But if this only applies to .uk domains, then surely the Leveson-inspired idiocy of websites being covered by press regulation only applies to .uk domains too.

Oh, no, wait: Parliament decided that its "press" regulation applies to anything, anywhere on the Net, if "aimed" at UK users (without defining "aimed"), and then set some restrictions on the overarching authority it had decided to grant itself.

"If the website T&Cs say "thou shalt not copy my content", then they can just move on and go collect the next website."

Does your browser do that? Mine doesn't. Mine ignores any and all such non-executable requests and instead makes a copy of the content for my perusal.

If you want such wishes to be executed, make them executable by not serving up the pages to everyone who passes by. *Many* websites do exactly that, and only dish content to paying customers. I imagine that *those* will not be appearing on the archive.

As another commenter explained, this is a natural extension of existing (and very long-standing) copyright law to a new medium.

Trollface

I need them to not spend a single penny on breaching website T&Cs.

No, you want them not to spend a single penny on breaching website T&Cs. This is demonstrable by the fact that they already have on at least one occasion, and yet you survived to post ex post facto. Had this point actually been a requirement, you would have expired upon the breach.

FAIL

"Government agencies don't exist to do what you want. They exist to do what you need."

On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?

A government agency only exists as a servant of the 'yous'? Unless it exists by divine providence.

Boffin

On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?

Too long to go into here. Since this is the UK we're talking about, try looking up "constitutional monarchy" in Wikipedia.

A government agency only exists as a servant of the 'yous'? Unless it exists by divine providence.

I do believe you've just profoundly illustrated the dichotomy of a constitutional monarchy...

Facepalm

"On whose whim does a government agency decide what 'you' 'need'? Or which 'yous' to take into account? Or what constitutes a 'need'?"

"Too long to go into here. Since this is the UK we're talking about, try looking up "constitutional monarchy" in Wikipedia."

Let me help: on the government agency's whim, with or without reference to the 'yous', whose servant it is.

(Not restricted to the UK, constitutional monarchies, or any other particular bunch of government agents. Try looking up Animal Farm in Wikipedia, or a reliable reference source of your own choosing, and extend the "some are more equal than others" part to any bunch of self-important rule-makers who lord it over others.)

See? Not at all too long, really.

Anonymous Coward

@Sandpit

"Did they think this through?... This is enabled by an act of parliment [sic]..."

OK, so they DIDN'T think it through. Silly us for asking.

It is being done so that future historians don't call this the second dark ages.

It may be hard to believe, but historians in 500 years' time will be extremely interested in your tweets about what you had for breakfast and so on.

Pint

Copyright

This could actually help. Having a datestamped copy lodged with the BL will aid in proving ownership.

Beer, because it's almost lunchtime.

This post has been deleted by its author

IP Ranges

Anyone have their range details so I can block them from my sites?

Anonymous Coward

Re: IP Ranges

Only managed to find this so far...

http://www.webmasterworld.com/search_engine_spiders/4484756.htm

Anonymous Coward

Re: IP Ranges

194.66.224.0 to 194.66.239.255 (194.66.224.0/20)
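The start/end form and the /20 form above describe the same block. For anyone who wants to check that, or test whether a given visitor address falls inside it, Python's standard `ipaddress` module can summarise a first/last address pair into CIDR networks. A minimal sketch: the range is taken from this thread and not independently verified, and the sample addresses are arbitrary.

```python
import ipaddress

# Range as quoted in this thread for the British Library (not independently verified)
FIRST = ipaddress.IPv4Address("194.66.224.0")
LAST = ipaddress.IPv4Address("194.66.239.255")


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR networks covering first..last inclusive."""
    return list(ipaddress.summarize_address_range(first, last))


networks = range_to_cidrs(FIRST, LAST)
print(networks)                   # [IPv4Network('194.66.224.0/20')]
print(networks[0].num_addresses)  # 4096

# Membership test for two arbitrary sample addresses
for sample in ("194.66.230.17", "194.66.240.1"):
    inside = any(ipaddress.ip_address(sample) in net for net in networks)
    print(sample, inside)
# prints "194.66.230.17 True", then "194.66.240.1 False"
```

A /20 fixes the top four bits of the third octet, which is why the block runs from .224 to .239 there.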

Facepalm

Minutiae

^ What does it mean?

practicalities

This is a Good Thing.

However, this is not really the same as the BL getting a copy of every book that is published, because before a book is published it is put in a form that the author and publisher regard as "finished".

Websites are never "finished". (Once upon a time it was de rigueur to have an "under construction" logo on one's site.)

So it would be nice to know more about the practicalities of this, so that the owners of websites who would like to regard their content as permanent can contribute in the most effective way to the national archive.

I couldn't find anything about this in my quick perusal of the relevant webpage.

http://pressandpolicy.bl.uk/Press-Releases/Click-to-save-the-nation-s-digital-memory-61b.aspx

Re: practicalities

This is not always a Good Thing, for the reasons you have given. They should not be copying websites as of today and claiming that's the final version of the website; website owners should be allowed to require them to update content on demand and to remove all older versions of works that were scraped in an older form. Oh, and if they are copying content for which you charge for access (it appears they can demand password access) and allowing other people to read it, then unless they are required to give you the full name and other details of each reader so you can collect the money you are owed, that's just stupid.

There's more info at

http://www.bl.uk/aboutus/legaldeposit/introduction/index.html

Follow the links from there and you find this little gem:

"A publisher must also deliver a copy of any computer programs, tools, manuals and information—such as metadata, login details, and a means of removing individual DRM technical protection measures—that are necessary for using and preserving the publication."

So, if you build your website on top of an Oracle database, you have to give them a copy of Oracle? If you did graphics with Adobe Creative Suite you have to give them that?

Only a complete idiot could have approved these rules, Mr Vaizey.

Flame

Re: practicalities

Sorry for replying to myself, but digging deeper into their sh- sorry, site, this may be of interest to some:

http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html

Couple of key bits:

----

We use Heritrix and the crawler’s User Agent should identify itself as ‘bl.uk_lddc_bot’.

--

(b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom.”

---

So if you work in the UK and build content for someone in the US, they can claim access to password-protected content hosted in the US just because you do the work in the UK.

What bunch of idiots wrote these regulations? Names should be attached to this sort of garbage so they can be sued for stupidity.

Re: practicalities... Smackticalities

Such a demand could spark a mild insurrection. If my .com blogs, which thanks to Google already show up under other countries' domain extensions, end up with .uk domains, and the GL/GL demands access to my screenplays, databases, scripts, manuscripts, drawings, doodlings and more, it can KISS my ASS. This is the same rights-violating bullshit foisted on desperate authors by shysters/crooks who claimed to be verifying the true lineage of works presented for publication in the early 2000s. Such companies demanded this info so they could "protect themselves and be indemnified" from lawsuits in the event the material was in fact stolen or plagiarized.

But, for a government to demand the same, that deserves a smackdown. Thieves are EVERYWHERE, and once a government employee misappropriates content that is then re-misappropriated to his or her business buddies or creditors, you can bet your ass that the employee's government, YOUR government, would whip out a clause or codicil saying it is indemnified from the damages you suffer/suffered.

No, the government should only demand that the author sign or attest that he/she is the true, correct creator, owner or authorized distributor.

I suppose I'd have a HELLUVA frackin' hard time living in the UK if I started encountering laws, precepts and proclamations that laid non-worked-for claim over *my* inventions or ideas. How long before people say, "Theft of my work by ANYONE or ANY AGENCY will result in unpredictable behavior on my part"?

OK, maybe I'm overreacting. But it sure is nice to be under the current copyright system where I am -- wait -- that is changing, too... Ohhh nohsssss...

WTF?

Re: practicalities

"(b) it is made available to the public by a person and any of that person’s activities relating to the creation or the publication of the work take place within the United Kingdom.”"

Interesting. It doesn't explicitly state that it will archive .uk domains, rather domains in the UK (which they refer to as "UK published material"). This should in theory exclude .uk domains hosted elsewhere, but may bring in .com and .org and .net and anything else "within the UK".

My password protection is for development/test software, screenplays I'm letting selected friends test-drive, and other stuff that I want accessible to specific people, but not public. They are not getting copies. Just because it exists on a computer doesn't mean it has been "published"; some of the things have been read by three people on the entire planet (and I'm one of those people). If they are going to take that approach, then everything that exists with words in it (even private stuff, doctors' reports and such) can be considered a "publication" and will need to be collected and archived. Do you still think this is sane?

Continue to downvote if you want, but this whole concept IS ill-conceived and retarded.

Re: practicalities

"Continue to downvote if you want, but this whole concept IS ill-conceived and retarded."

Some points.

1. The Patriot Act. The USA thinks that gives it the right to access any and all data held anywhere inside the USA, or anywhere outside the USA if it can contrive a link to the USA or any USA-based corporate entity.

2. The archives will be "stored" in a library in the same way as all dead tree publications are (supposed to be), eg The British Library or, in the case of the USA, the Library of Congress (I think).

3. To access those archives, you need to physically visit the building (or maybe they'll do this via their website) to request access to specific "sites" in the archive. This is not some Wayback Machine that world+dog can just trawl for any old bits of info.

4. Anything behind a password access system is not "published". That's the equivalent of printing leaflets/pamphlets for a private members club.

Re: practicalities

In law, anything made available is published. In the old days you'd see an advert in the back of the local paper: "Secrets of Reincarnation. Send 29p to PO Box blah, London N1 blah". It doesn't matter that you got back a handwritten, badly photocopied sheet; that's a publication. Same goes for something on a random website. Doesn't matter that only three people have asked for it, it's 'made available to the public'.

If it's password protected, that's not a publication. It's not made available to the public, it's made available to your Aunty Joan only. Same goes for an internal document. It may be a memo from Bill Gates to a hundred thousand minions, but it's not made available to the public and thus is not a publication.

A grey area is hidden links. I can put a private document on my website and tell only you the URL. That's not a public document. But if your email is hacked and the URL is leaked so that crawlers pick it up, arguably that becomes a publication.

Anonymous Coward

Re: practicalities

"What bunch of idiots wrote these regulations? Names should be attached to this sort of garbage so they can be sued for stupidity."

No need - start with the Cabinet and move on down to the civil servants who attempt to put their ridiculous ideas into practice.

It doesn't occur to a politician that there might be any more to his superficial knee-jerk vote-grabbing reactions than he thinks of. That makes it easier to blame the people who have the thankless/impossible job of "delivery".

Anonymous Coward

Re: practicalities

"They are not getting copies. Just because it exists on a computer doesn't mean it has been "published"; some of the things have been read by three people on the entire planet (and I'm one of those people). If they are going to take that approach, then everything that exists with words in it (even private stuff, doctors' reports and such) can be considered a "publication" and will need to be collected and archived".

The people who are proposing this ridiculous, half-baked nonsense almost certainly agree that Bradley Manning and Julian Assange committed dreadful crimes and must be severely punished.

Irony? They haven't even heard of it - and if they did they would think it was an industrial process.

Re: practicalities

But if, like The Times for example, they offer passwords to access the website in exchange for a monthly payment, then the work has been published, and the British Library can ask for a free account.

Pirate

Out comes the pragma/cache-control, copyright, doc-class, doc-rights and other applicable meta tags!

they've reserved the right to ignore things they don't like

Coat

Free backup service?

Publish all your important documents on a .UK site, and the BL keeps a backup for you. Encrypt it if you don't want everyone to read it.

Black Helicopters

robots.txt job done surely?

http://www.bl.uk/aboutus/legaldeposit/websites/websites/faqswebmaster/index.html

User-agent: bl.uk_lddc_bot
Disallow: /

Watch them moan and start ignoring it....

What's that sound?
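Whether the crawler actually honours it is exactly what the thread is arguing about, but you can at least check that the rule does what you intend. A minimal sketch using Python's standard `urllib.robotparser`; its interpretation of robots.txt may differ in edge cases from Heritrix's own parser. The `bl.uk_lddc_bot` token is quoted from the BL webmaster FAQ; the example.co.uk URL and `SomeOtherBot` are placeholders. One such edge case: Python's parser treats a blank line between `User-agent` and `Disallow` as ending the group, so the rule is silently dropped. Other parsers may not, so keeping the two lines together is the safe choice.

```python
from urllib.robotparser import RobotFileParser

BOT = "bl.uk_lddc_bot"  # User-Agent token quoted from the BL webmaster FAQ
URL = "https://example.co.uk/some/page.html"  # placeholder URL


def blocks_bot(robots_txt: str, agent: str = BOT, url: str = URL) -> bool:
    """Return True if robots_txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The rule as intended: User-agent and Disallow in one group
good = "User-agent: bl.uk_lddc_bot\nDisallow: /\n"
# The same rule with a blank line between the two lines
split = "User-agent: bl.uk_lddc_bot\n\nDisallow: /\n"

print(blocks_bot(good))                        # True
print(blocks_bot(good, agent="SomeOtherBot"))  # False: other crawlers unaffected
print(blocks_bot(split))                       # False: this parser drops the group
```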

I fail to see how storing thousands of internet shopping websites on UK domains in the library is going to be beneficial to the public in 50 or 100 years' time. Take eBay, for example: just indexing the UK site once would fill gigabytes, all of which would be out of date a week later as new products were added.

Joke

The public in 2113 will be able to see that 2013 culture was based around buying huge volumes of tat?

"Miss, how much tat did they buy?" "Well, Timmy, so much that even storing a record of all the tat for sale was very difficult with their primitive spinning-rust storage technology."

Anonymous Coward

.htaccess

deny from 194.66.224.0/20

I caught this IP range ignoring robots.txt ages ago, so I blocked it with .htaccess at first, then firewalled the whole CIDR block.

inetnum: 194.66.224.0 - 194.66.239.255
netname: BLIB-2
descr: British Library
country: GB
admin-c: BT544-RIPE
tech-c: BT544-RIPE
remarks: rev-srv: dns1.bl.uk
remarks: rev-srv: dns2.bl.uk
remarks: rev-srv: ns2.ja.net
remarks: rev-srv: ns0.ulcc.ac.uk
status: ASSIGNED PA
mnt-by: JANET-HOSTMASTER
source: RIPE # Filtered
remarks: rev-srv attribute deprecated by RIPE NCC on 02/09/2009
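The `deny from 194.66.224.0/20` rule above is Apache-specific. For a site served by a Python WSGI application instead, a rough equivalent is a small middleware that returns 403 for client addresses inside the same /20. This is a minimal sketch that assumes `REMOTE_ADDR` holds the real client address. Behind a reverse proxy or CDN it will hold the proxy's address, and the check belongs at the proxy or firewall instead. `deny_ranges`, `hello_app` and `application` are hypothetical names for illustration.

```python
import ipaddress

# CIDR block quoted in this thread for the British Library (BLIB-2); not independently verified
BLOCKED_NETWORKS = [ipaddress.ip_network("194.66.224.0/20")]


def is_blocked(remote_addr):
    """Return True if remote_addr lies inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Missing or malformed address: let it through (a policy choice)
        return False
    # An IPv6 address is never "in" an IPv4 network, so v6 clients return False here
    return any(addr in net for net in BLOCKED_NETWORKS)


def deny_ranges(app):
    """Hypothetical WSGI middleware: 403 for blocked clients, otherwise call app."""
    def wrapper(environ, start_response):
        # Assumes REMOTE_ADDR is the real client, i.e. no reverse proxy in front
        if is_blocked(environ.get("REMOTE_ADDR", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


application = deny_ranges(hello_app)
```

For a quick local try, `wsgiref.simple_server.make_server("127.0.0.1", 8000, application).serve_forever()` will serve it. As the poster found, blocking at the firewall is cheaper still, since the request never reaches the web stack.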

Thumb Up

But more importantly

El Reg comment tard wisdom will be retained for maybe 200 years.

History preserved.

The UK Web Archive is here:

www.webarchive.org.uk/

Previously, to get your site added, you had to fill in a form giving permission.

I would think they would honour a removal request.

archive.org doesn't seem to be crawling regularly or very much.

Robots directives (robots.txt, HTTP header, HTML meta tag) will stop well-behaved web crawlers.

It would be a nice feature of the UK Web Archive if website operators could download the WARC (or other standard archive format) of their site's snapshot.
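Of the robots directives mentioned above, the page-level ones travel with each response: an `X-Robots-Tag` HTTP header or a `<meta name="robots">` element. The sketch below uses only Python's standard `html.parser` and collects both, so a site owner can check what their pages actually send. It is an illustration under stated assumptions. It handles only the plain comma-separated form of each, not the per-crawler `X-Robots-Tag: botname: ...` variant. `robots_directives` is a hypothetical helper. Whether the BL's crawler honours `noarchive` or similar is not established here.

```python
from html.parser import HTMLParser


def _split(value):
    """Split a comma-separated directive list into lowercase tokens."""
    return {tok.strip().lower() for tok in value.split(",") if tok.strip()}


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives.update(_split(attr.get("content", "")))


def robots_directives(headers, html):
    """Union of directives from X-Robots-Tag headers and robots meta tags.

    headers: iterable of (name, value) pairs, since a response may repeat
    X-Robots-Tag. Only the plain form is handled, not "botname: ..." prefixes.
    """
    found = set()
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            found |= _split(value)
    parser = _RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return found | parser.directives


page = '<head><meta name="robots" content="noarchive, NoFollow"></head><p>Hi</p>'
print(sorted(robots_directives([("X-Robots-Tag", "noindex")], page)))
# ['noarchive', 'nofollow', 'noindex']
```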

Anonymous Coward

will they be archiving all the .uk pr0n sites?

Then they'd have to censor their own archive.

Re: will they be archiving all the .uk pr0n sites?

They already have a huge porn collection. Yes, they will definitely archive them.

Biting the hand that feeds IT © 1998–2017