I wish these [REDACTED] would just stop [REDACTED] with information. Someone should take a [REDACTED] to their [REDACTED] and [REDACTED] really hard until [REDACTED][REDACTED][REDACTED] with a teaspoon.
End of [REDACTED] rant.
Declassified documents from America's Foreign Intelligence Surveillance Court (FISC) shows that even the NSA didn't know the limits of what it was supposed to collect, and overstepped its authorisations for years. The documents were released to the Electronic Privacy Information Centre in response to an FOI request, and record …
They are the content of an HTTP GET request.
They are not addressing data. They also reveal the content of the likely response to the request.
Addressing data is the IP address (clue in the name) and nothing else.
That is, coincidentally, how it was specified in the 'invalid' EC Data Retention mass surveillance regulations too.
As I have ranted in a previous post, metadata is data about data. That means that you can't define what metadata is without first defining the 'data' that it is about.
You can't just say that metadata is "addressing information" and use that as a test against any data in any situation.
If you want to know what sites someone has visited then, 99 times out of 100, that information is contained in the URLs. In that context the URLs are not metadata; they are the data itself. In most cases, the URL shows what pages you read, what images you viewed, what videos you watched, what files you downloaded, what articles you commented on and what products you browsed in an online shop.
Not always, of course, but most of the time.
Metadata is a subset of your 'real' data that can be generated from the main dataset, and this is exactly what happens. It is then used to identify that 'real' data later, or used as a statistical tool.
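To make the 'generated from the main dataset' point concrete, here is a minimal Python sketch. It assumes a hypothetical browsing log of (IP, URL) pairs; the log entries, the example.* hosts and the function names are all invented for illustration. Reducing each URL to its host gives a coarser, statistical view, while the full URLs remain the record of what was actually accessed.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical browsing log of (client IP, full URL) pairs; invented for illustration.
browsing_log = [
    ("203.0.113.7", "https://news.example/world/article-123"),
    ("203.0.113.7", "https://news.example/tech/article-456"),
    ("203.0.113.7", "https://shop.example/basket?item=orange&qty=3"),
]

def hosts_visited(log):
    """Derive a coarser view from the full record: visit counts per host."""
    return Counter(urlsplit(url).hostname for _ip, url in log)

def pages_visited(log):
    """Keep path and query: which article or basket action was requested."""
    return [urlsplit(url)._replace(scheme="", netloc="").geturl() for _ip, url in log]

print(hosts_visited(browsing_log))
print(pages_visited(browsing_log))
```

The host counts say only that someone used a news site and a shop; the path list says which articles they read and what went in the basket.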
By the US (and Aus) governments' definition, a complete GPS route of everywhere you have travelled would fall under 'metadata'. After all, it only records where you went and not what you did - it's just 'addressing data'. Right?
If you want to know where someone has been, GPS information IS the data, and if you want to know what websites someone has visited, URL information IS the data. Referring to things as 'addressing' and 'content' is just as misleading and ambiguous as using terms like 'data' and 'metadata', as can be seen from the judge's quote in the last paragraph.
The only way to have an intelligent, informed discussion on what the government should and should not collect (and how that information should be kept, accessed and audited) is to list exactly what 'data' is being kept, by whom, for how long and under what scrutiny it is accessed.
The US government's tactic - just like the Australian government's - is to say that it only collects 'metadata' and to foster the idea that 'metadata' is benign by only explaining that term in a broad fashion that obscures (I think deliberately) what is actually collected.
The argument about URLs being data if data is passed on the URL for a GET request is weak.
Because technically speaking, HTTP GET request was not designed to be used for posting data or API, it is even written in the specification.
The fact that many (misinformed, or just couldn't-care-less) web devs have created GET-based APIs that carry content does not make an API URL "content". Had those web devs adhered to the specification properly, they would have built any API that carries data on HTTP POST rather than GET, which sends the data in the BODY of the HTTP request instead of on the URL.
At other times, when a shortened / friendly / REST URL is involved, it may, for example, reveal the title of the news article you're reading (or, more likely, a partial title). But again, I believe this is a clue to the actual content read, not something to be "classified" as "content" itself, because no-one reads a URL and automatically knows what anyone accessing that URL has read / posted / done without either some prior knowledge of the content at that LOCATION or making an HTTP request to obtain the content.
Even then, the content at a URL may change dynamically depending on all sorts of factors, including but not limited to the IP you're accessing it from, the UA string and cookies. Technically speaking, then, there is no way to confirm what another person saw simply from the URL: you must also be able to confirm that the site in question is static and that the server-side logs support the evidence. On a compromised server, one could easily have been fed targeted malware (by country, for example) instead; on a news site, one could have been fed propaganda depending on the incoming IP's perceived location.
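The point about the same URL serving different content can be sketched as a toy server-side handler. Everything here is hypothetical: the `handle` function, the edition names and the cookie are invented, and `client_country` is assumed to come from some IP geolocation lookup that is not shown.

```python
from urllib.parse import urlsplit

def handle(url: str, client_country: str, user_agent: str, cookies: dict) -> str:
    """Hypothetical news-site handler. Only the URL path selects the route;
    which front page is actually served depends on request properties
    (assumed geolocated country, UA string, cookies) that a URL log lacks."""
    if urlsplit(url).path != "/":
        return "404 Not Found"
    if cookies.get("edition") == "uk":
        return "UK edition"
    if client_country == "US":
        return "US edition"
    if "Mobile" in user_agent:
        return "International edition (mobile)"
    return "International edition"

url = "https://news.example/"
requests = [
    ("US", "Mozilla/5.0", {}),
    ("FR", "Mozilla/5.0 (Mobile)", {}),
    ("FR", "Mozilla/5.0", {"edition": "uk"}),
]
for country, ua, cookies in requests:
    print(handle(url, country, ua, cookies))
```

Three requests for the identical URL get three different pages, so a log containing only the URL cannot say which one a given visitor saw.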
So, would those who gave the AC above a downvote over URLs care to continue to enlighten people as to why you think URLs are content?
Yes, it's a shit way of writing things, but it happens.
The content is obvious: user jbloggs1975 bought 3 oranges from supermarket.com.
What it is not (simply address info): Anonymous person from IP address aaa.bbb.ccc.ddd went to www.supermarket.com and accessed their online shopping section.
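A minimal Python sketch of the split being described, assuming a hypothetical URL for the supermarket example (the path and parameter names are invented; nothing here reflects any real shop's site). The host is the 'address'; the path and query-string carry who bought what.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL; the path and parameter names are invented for illustration.
url = "https://www.supermarket.com/shop/basket/add?user=jbloggs1975&item=orange&qty=3"

parts = urlsplit(url)
# What an "addressing only" record would keep: the host the request went to.
address_info = parts.hostname
# What the rest of the URL reveals: the action requested, by whom, with what.
request_details = {"path": parts.path, "params": parse_qs(parts.query)}

print(address_info)  # www.supermarket.com
print(request_details)
```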
First, one AC asking for responses to another AC's questions/points is a bit of a meal but anyway . . .
I am not going to enlighten anyone about why - or even whether - URLs are 'content'. What I am going to do is use this post, and the ones you are responding to, as a perfect demonstration of why I said in a previous comment that we shouldn't be talking about 'data' vs 'metadata' and 'content' vs 'addressing information'.
It's a distraction. Evidently.
There is no one objective definition of what 'content' is because it is dependent on what you are referring to. The contrast to 'content' is usually 'presentation', in that you split out which components are which. Sometimes this is elementary; a novel is easily split: the text is content but whether it is a hard/soft-cover or digital copy is firmly 'presentation'. So too is the choice of binding used (if a hard copy), the fonts, the page margins, pagination, etc.
Even there, however, you run into questions that don't have a clear answer. Is the table of contents to be considered 'contents'? What about the index? If it is a non-fiction work, what about the bibliography? Chapter names and numbers? And what about the title? Mixing the terms up a bit, the title of a book could well be considered 'metadata' but I think it would be hard to argue that it would not also form part of the creative work and thus be 'content'.
If we look at a URL, how does that relate? Is it 'content'? It's certainly not mere presentation.
Drawing on the example above, a URL is the equivalent of a book title. Seen that way, your argument that just reading a URL doesn't provide automatic knowledge of what content was read by the user can be tested. It is true in the case of a book as well - just knowing the title of a book someone has read does not bestow automatic knowledge of the content read, unless one is already familiar with the book.
The point people are making, however, is that knowing the URL visited allows someone - 99% of the time - to find out what was being accessed, just as knowing the title of a book someone read allows you to find out what words they read.
Of course, you can't prove from a URL that someone read a specific paragraph or looked at a specific picture, but accessing the site alone may well be enough, just as having a copy of Mein Kampf (couldn't think of a better example on the spot, sorry) on your bookshelf could weigh into an investigation, regardless of whether you have read it or not. After all, how do you prove you haven't read it?
How do you try and convince your wife you 'only buy it for the articles'?
What happens in the family court when your (soon to be divorced) wife brings up your Internet browsing history and shows all the pornography sites you have been visiting as proof you have been neglecting her? She can't prove you were watching a specific video or even that you watched any, but it doesn't matter - the fact that you visited the sites is enough for the purpose.
Going back to book titles, let's say that a book's title is not content. What about a book that contains a list of books? Are the titles contained inside that book 'content'? On the surface it's the same thing - a book's title, which doesn't contain the text of the book itself. In this instance, I think most people would consider the list of titles to be content. But what has changed? Simple: the purpose of the book.
An easy example of this is a phone book. My name and phone number might be considered 'metadata', but gathered together in a big directory for the express purpose of matching a name to a phone number, that same information must be considered to be the 'content', or the 'data'.
But the above is really just me making my point in presentation as much as content - the argument is silly, lengthy and without any resolution. I don't submit the above as anything more than a demonstration of just how pointless it is to get bogged down arguing about something that, in the end, comes down to how we define a bunch of words that only have any meaning in reference to our subjective interpretations of rather abstract concepts.
It's doubly-pointless because the government's definition seems to be:
1. Addressing information.
2. Anything else we want to record about our citizens.
pierce writes: "URL's CAN be the address of static content.. but they also can be an API, passing data as POST or GET arguments."
HTTP URLs can include a query-string. Since the HTTP GET method does not include a message-body, the query-string is the usual way to pass parameter data to the server. But the user agent can use a query-string with any method (except possibly CONNECT, since the syntax of that method isn't defined by RFC 2616).
By convention, web browsers processing HTML forms that use the GET method will URL-encode the form field data and append it to the action URL's query string. Parameters for HTML forms using the POST method are sent in the message-body, not in the query string. But the method is irrelevant to the presence of a query-string in the URL, from HTTP's point of view.
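Python's standard `urllib` can mimic the two form encodings described above without sending anything over the network: the `Request` objects below are only constructed, never opened. The action URL and field names are hypothetical. The last request shows a query-string riding along with a POST, since the method doesn't govern whether the URL has one.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form action URL and fields, for illustration only.
action = "https://shop.example/search"
fields = {"q": "oranges", "max_price": "2.50"}
encoded = urlencode(fields)  # application/x-www-form-urlencoded

# method="GET" form: the encoded fields are appended to the action URL's query-string.
get_req = Request(action + "?" + encoded)
# method="POST" form: the same encoded fields travel in the message-body instead.
post_req = Request(action, data=encoded.encode("ascii"))
# Nothing stops a POST request from also carrying a query-string in its URL.
post_with_query = Request(action + "?page=2", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
print(post_with_query.get_method(), post_with_query.full_url)
```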
Whether the presence of a query-string makes a URL an "API" is debatable. The latter term is not, of course, defined by RFC 2616.
AC writes: "Because technically speaking, HTTP GET request was not designed to be used for posting data or API, it is even written in the specification."
Citation, please. No, don't bother; I'll tell you what RFC 2616 says. It says that the GET method is both "safe" (9.1.1) and "idempotent" (9.1.2) - terms of art in this context. Safe methods SHOULD NOT have user-visible side effects[1]; idempotent methods can be replayed without additional side effects.
Nowhere does RFC 2616 say that GET cannot "be used for posting data or API". An idempotent method can "post data" (presumably - the term is not defined by 2616) as long as multiple invocations don't have additional side effects. A safe method can "post data" as long as the side effect isn't user-visible.
And whatever "API" might mean in this context, it almost certainly includes operations that are not only allowed for safe methods, but are in fact commonly achieved by them. I use APIs all the time that include operations without side effects.
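The idempotence distinction is easiest to see by replaying a request. This is a toy in-memory sketch, not any real framework's API: `put_item` and `append_item` are hypothetical handlers standing in for a PUT-like and a POST-like operation. (RFC 2616 9.1.2 lists GET, HEAD, PUT and DELETE as idempotent.)

```python
def put_item(store: dict, key: str, value: str) -> None:
    """PUT-like: sets a value. Replaying it leaves the same state (idempotent)."""
    store[key] = value

def append_item(store: dict, key: str, value: str) -> None:
    """POST-like: appends. Replaying it changes the state again (not idempotent)."""
    store.setdefault(key, []).append(value)

def state_after(calls: int, handler, key: str, value: str) -> dict:
    """Run the same request `calls` times against a fresh store and return the result."""
    store: dict = {}
    for _ in range(calls):
        handler(store, key, value)
    return store

print(state_after(1, put_item, "qty", "3") == state_after(2, put_item, "qty", "3"))
print(state_after(1, append_item, "qty", "3") == state_after(2, append_item, "qty", "3"))
```

Note that the idempotent handler still writes data; what makes it idempotent is only that the second identical call adds no further effect.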
More broadly, though: the query-string has no special status in this regard. It's intended for passing parameter data, but any part of the URL that's visible to the server (at least the abs_path and query-string, and possibly the entire URL) can be treated as whatever sort of data the server likes. There are conventions for using other parts of the URL as input, for example the use of PATH_INFO in the CGI/1.1 specification. Nothing about query-strings or any HTTP method magically turns an HTTP request into an "API". In isolation, all HTTP requests have the same status; it is the server's interpretation that distinguishes between simple retrieval and operations with other side effects.
[1] That is, side effects beyond retrieving data; 2616 9.1.1 is less specific than it might be on this point, but it's clear what's intended.
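WSGI (PEP 3333) borrows the CGI variable names, so a small WSGI app is a convenient way to sketch the PATH_INFO point. This is a hypothetical app, not anyone's real service: it treats an extra path segment and a query-string parameter identically, so `/orders/42` and `/orders?id=42` mean the same thing to it. `wsgiref.util.setup_testing_defaults` just fills in the other required environ keys so the app can be called directly.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Hypothetical app: an order ID may arrive in PATH_INFO or in QUERY_STRING."""
    path = environ.get("PATH_INFO", "")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if path.startswith("/orders/"):
        order_id = path[len("/orders/"):]
    elif path == "/orders":
        order_id = query.get("id", [""])[0]
    else:
        order_id = ""
    if order_id:
        status, body = "200 OK", f"order {order_id}"
    else:
        status, body = "404 Not Found", "not found"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]

def call(path: str, query: str = ""):
    """Invoke the app in-process with a minimal test environ; returns (status, body)."""
    environ = {"PATH_INFO": path, "QUERY_STRING": query}
    setup_testing_defaults(environ)  # adds defaults without overwriting our keys
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")

print(call("/orders/42"))
print(call("/orders", "id=42"))
```

Neither request uses anything but GET; the server alone decides that both URLs are parameterised lookups.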
The NSA [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] bastards [REDACTED][REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED] [REDACTED]
"shows that even the NSA didn't know the limits of what it was supposed to collect, and overstepped its authorisations for years."
The only thing that is 100% sure is that NSA "overstepped its authorisations for years". Where does the conclusion that NSA "didn't know the limits" come from? I find it far more likely that they knew the limits but carried on regardless, knowing full well that the worst possible consequence was being told to stop.
> Declassified documents from America's Foreign Intelligence Surveillance Court (FISC) shows that even the NSA didn't know the limits of what it was supposed to collect, and overstepped its authorisations for years.
Which definition of "collect" are we working with here?
>>When asked, "Does the NSA collect any type of data at all on millions or hundreds of millions of Americans?" [Clapper] replied, "No sir, not wittingly." To him, the definition of "collect" requires that a human look at it. So when the NSA collects—using the dictionary definition of the word—data on hundreds of millions of Americans, it’s not really collecting it, because only computers process it.<<
I'm assuming they overstepped on both, but it would be nice to know.
It's like that script I wrote at my bank that skims 0.1% off all transactions and puts it in my account. I only steal the money when I spend it.
Still, this was Google's argument in the first place, when it claimed it didn't read your mail (because no human looks at it) but simply had a computer scan it to serve up ads relevant to the content of your mail. Unintended consequences of, and for, the "do no evil" company.
"overstepped its authorisations for years"
“the government acknowledges that NSA exceeded the scope of authorised acquisition continuously during the more than [REDACTED] years of acquisition under these orders”.
"The court says NSA's overcollection of metadata was “systematic” over a number of years."
“serious compliance problems that have characterised the government's implementation of prior FISC orders”
"the documents indicate that non-compliance was a frequent problem, with the government notifying the court of NSA breaches both in the over-collection of data and the disclosure of data to other agencies beyond the court's authorisation"
"the NSA managed a trifecta, with the court noting another round of compliance breaches relating to access to metadata"
“Those conducting oversight at NSA failed to do so effectively”
I thought the article was joking about the redaction on the number of years. Evidently not, but I wish they had put some bounds on the value, both for when it could have started and on when it ended (unless it is still continuing). Actually, even if they say it ended, it's probably continuing.
The main problem to me is that they are collecting vast amounts of data just because the light is better there. No, I'm not a terrorist, have no terrorist friends, and would fink on a terrorist if I ever got the chance--but the NSA still knows everyone I've ever called.
Actually, I usually use pay phones for originating calls, so I've defeated the NSA. Not that I have anything to hide. I just have a lousy and expensive calling plan--because I don't make many calls.
Sorry, NSA, I didn't mean to stand where the light was bad and deprive you of so much of my personal data.
A gazillion upvotes - it's the exact question I want to see answered.
That the rules have been breached has been rather abundantly clear for years, but as long as that results in nothing more than the removal of biscuits in the boardroom, you might as well stop pretending that rules matter. Oh, and don't bother with punishing the organisation; that's mere accounting.
Proposal: all those responsible lose any claim to privacy, are moved into glass prisons and are banned from using any encryption for the duration of their punishment, with traffic and communication logs publicly available on, say, Facebook. Just an idea.
Have those responsible within the NSA had to suffer consequences for those breaches? What exactly? Anyone fined? Fired? Imprisoned?
THERE WILL BE NO CONSEQUENCES FOR ANYONE, THEY WERE SIMPLY DOING THEIR JOBS
Do you honestly expect the government to hang its lawbreaking snoops out to dry?
It will NEVER happen.
I think the FISC should appoint Michael Bromwich to be the NSA compliance monitor, with his compensation paid out of their budget. If they are not in compliance, he can pull the plug on the data collection, destroy the data and prevent future collection until the deficiencies are resolved.
Biting the hand that feeds IT © 1998–2019