Big Data is “bullshit”, says Harper Reed, Chief Technology Officer at Obama for America 2012. Speaking at the CeBIT conference in Sydney, Australia, today, Reed said he encountered the term “Big Data” in 2007 when it referred to a storage problem. “We used it in 2007 because it was hard to store data,” he said. “People who …
First of all ...
... "big data" is the NSA. No others need apply.
Second of all, those of us over twice the age of these children, that's the people who actually built this thing now known as "The Internet", actually grok the privacy issues involved in the research network invented to research networking. It wasn't designed to hide data, rather it was designed to share data. And it does that admirably.
It was NOT designed with anything resembling hiding the sharing of individual human statistics for marketing purposes, and as a direct result, those stats are available to all & sundry. And big-business is using those stats to make money off the backs of individuals.
Worse than Unions, really, when you think about it ... the fucking brain-dead sheeple don't even realize that they are making money for multi-billion dollar international corporate conglomerates, without getting so much as a nickel of the profits!
What are these fucking morons thinking, anyway?
"Oh, goody! I can see a video of a sneezing cat on youtube! I'll send all my friends a gmail spamlink to it! And then I'll post all the videos I took at a family wedding, with appropriate captions (including names, addresses, phone numbers and email addresses, natch'!) on facebook! But google, a MARKETING company that it is, would never try to correlate the data! Not ever!"
The above actually happened to a friend, BTW. Many of the receivers of the "cute cat" email last Wednesday are regularly getting advertising for "cute cats" and wedding related shit on google related Web pages today. The wedding was on Saturday.
The mind absolutely boggles ... WHY THE FUCK do people trust google?
Re: First of all ...
Oh noes! Google is going to show ads at me!
Re: First of all ...
Cool story, bro.
It's not that people don't know, it's that they don't care. And why would they? Most people see a few targeted ads as a perfectly adequate price for using google/youtube/etc.
Re: First of all ...
It's not google showing the ads that bothers me (I block all of google as a matter of course) ... it's the fact that google uses info supplied by parties not google or the advertiser to figure out who to spit an ad out to, not just a third party, but a fourth party.
My friend & family in the above post didn't ask for cute cats & wedding adverts. Nor did they ask (or allow!) the silly little girl who supplied the info to google to provide the data to google. But they are getting spammed anyway.
And google can cross-reference all the physical address & telephone data that the silly little girl provided to google with further use of google (maps, whatever) via IP address.
This isn't paranoia, this is recognizing a major security flaw in TCP/IP that google et ali are using to make a quick buck from before the .gov realizes how bad the personal privacy situation has become.
@as2003 (was: Re: First of all ...)
The trouble is that I never have used google/youtube. Ever.
But google has a database that includes me, and my personal details.
That they sell to other people.
You actually find this a valid state of affairs? Why?
Does anyone even understand the concept of "theft by conversion" anymore?
Re: First of all ... @jake
"This isn't paranoia, this is recognizing a major security flaw in TCP/IP that google et ali are using to make a quick buck from before the .gov realizes how bad the personal privacy situation has become."
To be honest your gripe here should be with the girl who gave them the information, by the sound of it. And in this case the "theft by conversion" charge would seem to lie in her lap as well. Could you explain the flaw in TCP/IP bit, please?
Re: @as2003 (was: First of all ...)
@jake. I don't follow your theft by conversion argument. You seem to be blaming Google but theft by conversion is theft after the fact and the thief is the girl, not Google. The best you could get Google on is receipt of stolen goods and to do that you'd need to show that something (presumably your privacy) was stolen. Now you might be able to swing that if you were intensely private but you aren't. Au contraire, most of your posts here include quite a bit of gratuitous personal information.
So the upshot is that your outrage is misplaced. There's certainly no tort here but if it really bothers you, why not simply write to Google and ask for the info to be removed ?
@Jake Re: First of all ...
You do realize that Google and Facebook capture a lot more data in a day than most of the Fortune 500 companies manage in a year? Yeah, it's that much.
I'm afraid you live in a little fairytale of a world if you think Google or Facebook won't correlate the data in different ways than you imagine. They will if they are not already doing it. It's not just about serving up ads, but figuring out how to monetize the data.
You and your company using gmail? Guess what... your mail is probably being mined. If not by human hands, then by machines.
You log in to your favourite paper using your FaceBook credentials for authentication.... Now Facebook knows what articles you've read and more about your interests.
There's more, but I doubt you could handle the truth...
Re: First of all ...
One reason I don't use smart devices that can't run Firefox with Adblock-plus. I *know* I can't stop their systems correlating everything they know about my browsing and targeting me with laughably useless advertizing. But at least I never have to look at it!
Good advice to all. ALWAYS be a buyer, NEVER become a sellee.
Someone who is using simple data points like 'do you support the president' is NOT talking about big data - even if using a lot of these simple data points about a lot of people, you should be using a traditional relational database. Big Data is when you try to deduce from billions of unstructured tweets or facebook messages how many people do support the president, without asking it explicitly.
Which of course you can't do reliably and is why the Emperor is not wearing any clothes.
Ah, so we are trying to predict the future without trying to predict the future. I knew I was missing something.
I can see how it applies to search - whether Google et al. or the spooks - but pretty much most companies could do better by simply asking their customers and observing how their customers interact with offered products and services. This type of data is far too nuanced for a computer to cough up, but happens to be the type of thing our feeble minds do well (when we get our heads out of our asses).
The first thing I notice any time I see a large database is how poorly the tables are laid out. I suspect big data is mostly an attempt to do database type operations in application layer due to lack of understanding data flows.
Big Data is when you try to deduce from billions of unstructured tweets or facebook messages how many people do support the president, without asking it explicitly.
1e9 tweets isn't a large enough corpus to constitute "Big Data" these days. That's less than 1 TB. You can run simple sentiment analysis on a corpus that size on a reasonably-powerful laptop. The sentiment analyzer I wrote for a class project did around 1e3-1e4 sentences/second, and it was largely-unoptimized Java. I shouldn't have much trouble getting it up to ~1e5 tweets/second, which would mean under 3 hours per "billion unstructured tweets". And this is trivial to parallelize, with essentially no dependencies between data points. (My analyzer runs under UIMA, so scaling out across multiple nodes is easy.)
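For anyone who wants to check that arithmetic, here is a rough back-of-envelope sketch. The 140 bytes per tweet is my assumption (140 single-byte characters, no metadata), not a measured figure; the throughput numbers are the ones quoted above, and the function names are made up for illustration.

```python
# Back-of-envelope check of the corpus size and run time quoted above.
TWEETS = 1_000_000_000      # "1e9 tweets"
BYTES_PER_TWEET = 140       # ASSUMPTION: 140 single-byte characters, no metadata


def corpus_size_tb(n_items, bytes_per_item):
    """Raw corpus size in terabytes (1 TB = 1e12 bytes)."""
    return n_items * bytes_per_item / 1e12


def hours_to_process(n_items, items_per_second):
    """Wall-clock hours for one sequential pass at a fixed throughput."""
    return n_items / items_per_second / 3600


size_tb = corpus_size_tb(TWEETS, BYTES_PER_TWEET)
hours_at_1e5 = hours_to_process(TWEETS, 1e5)   # the optimized target rate
hours_at_1e3 = hours_to_process(TWEETS, 1e3)   # low end of the unoptimized rate

print(f"corpus size: {size_tb:.2f} TB")
print(f"at 1e5 tweets/s: {hours_at_1e5:.2f} hours")
print(f"at 1e3 tweets/s: {hours_at_1e3:.0f} hours")
```

Under those assumptions the raw text comes to well under 1 TB and a single pass at 1e5 tweets/second finishes in under 3 hours. At the low end of the unoptimized rate it is more like a week and a half on one core, which is where the easy parallelism matters.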
Of course, there are data-analysis problems that are big enough to make naive approaches intractable at much smaller scales. Obviously NP-Complete problems become intractable to brute force quickly. If you're doing pattern analysis on points in 2 or more dimensions, you need to start sampling early on; even 1e6 points in 2D is too many to do, say, various distance metrics on all pairs. And there are analysts dealing with information that's constantly arriving faster than it can be processed, so it has to be sampled at the head of the pipeline in order to do anything useful.
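To put a number on the all-pairs problem, and to show one common way of sampling at the head of a pipeline, here is a minimal sketch. Reservoir sampling (Algorithm R) is my choice of illustration, not necessarily what any given analyst uses, and the random 2D points are placeholder data.

```python
import random


def pair_count(n):
    """Number of unordered pairs among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Algorithm R: fill the reservoir with the first k items, then replace a
    random slot with item i (0-based) with probability k / (i + 1).
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(42)
# Placeholder stream of 2D points; a real pipeline would read these as they arrive.
points = ((rng.random(), rng.random()) for _ in range(100_000))
sample = reservoir_sample(points, 1_000, rng)

print(f"pairs among 1e6 points: {pair_count(1_000_000):,}")
print(f"pairs among a {len(sample)}-point sample: {pair_count(len(sample)):,}")
```

The full 1e6-point set has roughly 5e11 pairs, while a 1,000-point sample has under half a million, and the sampler only ever holds k items in memory no matter how long the stream runs.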
Vincent Granville has a number of blog posts on Analytic Bridge on what "big data" and "data science" really involve. While he certainly has an agenda, he's good at cutting through some of the bull - in an informed way, which is more than I can say for most of the Reg readers who comment on stories like this.
In this context, I think 'Big' really relates to the size of the problem that an organization has. When someone refers to 'Big Data', they mean "we've lost control of our storage requirements" and "we haven't got a clue how to make sense of what we have stored". And "We're in the market for some snake-oil".
Marketing is the least worry...
Governments, phishers, the tax office, security forces and future employers and insurance companies are all a lot scarier potential users of Big Data than mere marketing firms like Google or Facebook.
Re: Marketing is the least worry...
well I accidentally downloaded promo materials from Chiliad (prop: Robert Maxwell's child et al) a few years ago when they were attracting Insurance Companies to become "Information Rich" and ensure an increased "ROI"
but that was years ago and I can't find any links any more, so obviously insurance isn't using PII obtained from BIG DATA
Re: Marketing is the least worry...
>Governments, phishers, the tax office, security forces and future employers and insurance companies are all a lot scarier potential users of Big Data than mere marketing firms like Google or Facebook.
Please, unless you are actively doing serious BS they are just collecting noise. I am willing to wager that it will cost an order of magnitude more than any benefits.
And if you happen to be involved in bullshit, why pray tell are you broadcasting it.
Phishers don't need your data, they will get you to give to them. Most scams are set up to get the gullible and greedy, why work against human nature when you can harness it.
Young folk have no such qualms, understand the transactions they participate in and are more familiar with the privacy controls of the services they use.
Really? Is it not more likely that they have a rather different notion of privacy than us poor old duffers, because of the amount of time and effort Google, Facebook and the rest have spent eroding it?
Anyway, those of you who think that ads and marketing are the worst things that could happen, you're probably not being paranoid enough. Remember that your dear old government can, will, and indeed has demanded all sorts of detailed information from your favourite service providers. How much do you trust your government? I can just about trust mine not to be openly malicious towards me, but their simple incompetence is destructive enough.
It might also be worth considering that the wonderful combination of poor privacy systems and poorer security mechanisms means that personalised and targetted scams are easier than ever to put together. I'd be cautious about stating that you'd never fall for such things.
you're so old!
wow you're so old! you clearly don't know how to use the cispa (and protectip, and sopa) checkboxes on facebook that exempt your personal information from government spying! too bad you're too old to be familiar with those privacy controls! i was using privacy controls in the womb, and will always know how to use them extremely well because facebook and google have never, at great expense, colluded to extract more pii than i intended to give and then sell that pii off to anyone who wanted it! i'm so glad that i am so much younger than you. what is probably best about my youth, oldster, second to the fact that it imbues me with innate technological prowess, is the fact that I will almost certainly never grow any older, and I will remain exactly this superiorly young, and possibly even get younger, and therefore even more adept at using privacy controls!
@ergu Re: you're so old!
You're so young that you don't know that the greatest lie the devil told was that he didn't exist.
The point, junior, is that they do capture a lot of PII. The only thing that they do is that after a period of time, they anonymize the data so that it doesn't point back to you. However, I'm willing to wager that it's still possible to tie that data back to you... (Facebook that is...)
The fact that you trust Google after their War Driving incident? Or that Schmidt admitted that even after attempting to change your name that they could still tie data to you?
Be afraid, be very afraid...
Re: @ergu you're so old! @Ian
Um, Ian? I think ergu was being sarcastic ...
Nice to see you trying to be hip with the LOL bit, though. You're funny :)
Re: @ergu you're so old! @Ian
He probably thinks it stands for "Lots Of Love".
@AC Re: @ergu you're so old! @Ian
I did catch his sarcasm, however the issue is that the 'privacy' controls are all a bit of a sham.
And that only the young who are so trusting fall for it.
As an example... I get tagged in a photo on Facebook. What recourse do I have in getting my name and tag taken down? Especially since I have no way of knowing about the tag unless a friend happens to tell me.
Gets even worse with Google Glasses.
The LOL was that his 'sarcasm' and naivety make me chuckle...
The Devil because you don't believe that he exists. ;-)
"Oldsters are uneasy with the notion that Facebook et al mines their data he said. Young folk have no such qualms, understand the transactions they participate in and are more familiar with the privacy controls of the services they use."
He has it pretty much backwards. Young folks are less concerned because of their lack of experience of how serious a privacy breach can be. They've not yet had their credit cards cloned, or their job prospects damaged by an unwise photo, etc. Anyone who has contact with teen and pre-teen kids knows that they have no idea of what personal info they should share, and what they should keep to themselves. They may know how to operate privacy controls, but don't yet see the need, hence the unconcern. "Out of the mouths of babes... etc."
Oldsters are much more aware of the risks, but react differently. The 70+ group is, in my experience, overly trusting. They come from an era when sharing of personal data just wasn't done, and have difficulty understanding that people can do so in a malicious way, just for profit. My grandmother was most distressed when a phone scam was explained to her, the general disbelief of "why would someone do that?" was hard to get past (fortunately no great harm was done).
It's the mid-life 'oldsters' like myself who are probably the most wary. We know what can be done, and have little trust in Google/Facebook etc. to behave in a decent manner. We are, in my view, rightly uneasy, and as the teenagers grow up they will learn from experience to be wary. They will not, of course, succeed in convincing their own children of that, as always. C'est la vie.
We know what can be done, and have little trust in Google/Facebook etc. to behave in a decent manner. We are, in my view, rightly uneasy, and as the teenagers grow up they will learn from experience to be wary.
I dunno. Ever talked to a Millennial? I don't claim to have a vast experience set, but I can tell you one of the recurring themes I continue to run across in their mindset is that, "If it didn't happen in my (the Millennial's) lifetime, I can't be arsed with it." This results in an extreme myopia, such that explaining to them the concept of, say, protecting their personal information against such things as identity theft is greeted with anything from a pointed "Meh" to the utter astonishment of Phil's gramma.
rumour has it
well, there's bound to be a link somewhere in French - but *you* try googling for a privacy violation research topic in French, that might involve google, anyway, here we go.
French Data Protection Authority have mentioned that they suspect(*) that the online fares that you are offered by *** airline and *** railway system and ... are automagically increased between the time that you first visit a merchant webpage - and your subsequent visits to check how the price is doing. it seems that BIG DATA have been increasing the price from a 'known to have looked' IPv4/Cookie/bug/tracker/beacon/ad-networked device-browse compared to the price that you would get from a naked, new, unlinked device-browse when you browse the merchant website at the same time.
(*) it seems they actually did the experiments and took the parallel data last week, so have some proof...but in France & in French. Presumably doesn't happen to nous, les AngloSaxones
Re: rumour has it
I've seen that suggested in both French and English websites/blogs/papers; there was an article about it in Le Monde a few months ago. Most of the speculation is clearly nonsense, since it talks about tracking IP addresses, as if they were assigned permanently to a device as portrayed in TV crime drama.
I think the link you're referring to is:
although it says that the French authority in question, the CNIL, is going to investigate.
Cookie-based tracking is possible, but I've not been able to duplicate it myself in some fairly unscientific tests over a period of a few months. It seems more likely that it is simple yield management; if an airline website suddenly detects an increase in enquiries for a particular route, the prices will be bumped. It's the sort of thing that could happen if a pop group announces a concert in a particular city, or a sports team qualifies for a place in a competition. Supporters will suddenly start asking "hmm, I wonder how much the tickets are?".
Most French IP addies are fixed..and have been for a good few years now
Posted from France..and have been using the internet here via fixed lines, as a business and at home, since France telecom were charging Fr 1200.oo ( about €950.oo ) per year, to register a dotcom ( minimum period 2 years ) ..way back then, we did indeed have "non permanent" IPs..
But my current provider ( Free ) and all the other majors, have provided "permanent IP addies" for the past 5 years at least..on ADSL..
The article in le monde is not so far fetched, as some may wish one to think..
Re: Most French IP addies are fixed..and have been for a good few years now
Typo ..it is late here ..02.56am Wednesday..
Fr 1200.oo ( about €950.oo ) per year
Fr 1200.oo ( about €180.oo or so ) per year..
Re: Most French IP addies are fixed..and have been for a good few years now
> But my current provider ( Free ) and all the other majors, have provided "permanent IP addies" for the past 5 years at least..on ADSL..
No. I'm also posting from France and Free is the only major provider that does this. To get a fixed IP from Orange you need to have a "Pro" account at a substantially higher price. I would like one, but am not willing to tolerate Free to get it (I was with them for a couple of years, but their non-LLU service was so unreliable that I gave up). Domestic IP addresses are almost universally dynamic, and in some cases get reset every 24 hours.
Of course access from public WiFi points is even less likely to maintain a consistent IP address. It will be interesting to see what the Cnil turn up, but I've been an Easyjet user since the beginning, and in my experience I doubt very much if the conspiracy theorists will be vindicated.