There's a tide of unstructured data coming - start swimming

Whether you prefer to define the known size of our planet’s total digital universe in petabytes or even zettabytes, we can all agree the collective weight of data production is spiralling ever upwards. While we focus on the relative merits of transactional versus analytical databases, the unstructured data that fails to fall …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    "To face the unstructured data tsunami, best learn to swim"

    I might be wrong, but I gather that the best way to face a tsunami is on high ground, a long way from the coast.

    Or: better to focus your data gathering than to attempt to cope with a flood that is outside of your control.

  2. Anonymous Coward

    But Chris Roche, EMEA chief technology officer at Pivotal, cites projections that 45 per cent of western Europe’s digital universe in 2020 could still be useful if tagged and analysed correctly.

    The CIO of a company whose market is datamining thinks data about things I bought 5 years ago is of some value... Hahahahahahahahahahaha

    The linking of purchase data, against social media data, is why I don't/won't use social media... I am not a number, and I refuse to allow CIOs to treat me as a number.

  3. Colin Millar

    Navel gazing

    On a truly humungous scale. Still I suppose all those data mining companies trying to create "value" out of rubbish have to make money somehow. They are not creating value - they are creating cost.

    Get your own data sorted out and continue to use human interpretation where valid. Remember - not every piece of information you have access to is useful - don't hoard and don't second guess. Don't waste money on "information stream companies" or anyone who says the phrase "big data".

  4. Tom 7

    To face the unstructured data tsunami, best learn to swim. pt 2

    No - learn to bloody surf. Spend a little effort on organising your data - which you will find matches up to organising your company pretty much exactly - and you can ride the wave and extract a lot of fun from it.

    Be data driven and don't encrypt and hide your data in documents, and you won't have massive stores to search. One piece of well-organised data can replace a thousand documents.

  5. Sandpit
    Big Brother

    Not a problem

    This is all a self-limiting problem. We have the ability to generate more data now than can possibly be of any use to anyone. This is because it is cheap and easy, but it still has a cost. When the value of that data drops below its cost to create, the problem starts sorting itself out.

    The problem at the moment is that we are in transition: there are a lot of people spending a lot of money in a futile attempt to organise, understand and preserve a lot of worthless garbage. Once they grow up enough to realise they are wasting their time (and money) they will eventually stop.

    Data of real value will generally be treated well enough. Some will be lost, no big deal, we will always create some more; that data wasn't as valuable as some would like us to believe in the first place.

    1. Doozerboy

      Re: Not a problem

      "Data of real value will generally be treated well enough. Some will be lost, no big deal, we will always create some more; that data wasn't as valuable as some would like us to believe in the first place."

      Great comment

  6. Gen. Apathy
    Stop

    alternatively...

    ...you could delete some of that data because you:

    a: don't need it and it's worthless.

    b: shouldn't keep it anymore because you're not supposed to.

    c: if you keep everything for ever you end up on a Channel 4 documentary about obsessive hoarding.

    1. John Smith 19 Gold badge
      Gimp

      Re: alternatively...

      "...you could delete some of that data because you:

      a: don't need it and it's worthless.

      b: shouldn't keep it anymore because you're not supposed to.

      c: if you keep everything for ever you end up on a Channel 4 documentary about obsessive hoarding."

      Or d) you run a government surveillance apparatus and you have an irrational desire to collect ever more of it regardless, because you can.

      In which case you are a data fetishist.

  7. Gannon (J.) Dick
    Meh

    Hello Britannia

    Anybody remember anything about ruling the waves ? Like that part about beach rocks and anchoring when the tide is coming in ?

  8. Xxeno

    Ah, unstructured data, or is it only unstructured to you? Personally, I think it seems to add security and confuse the right people.

  9. Anonymous Coward
    FAIL

    Descriptive but not prescriptive. Hence useless.

  10. Pascal Monett Silver badge
    Thumb Down

    What a load of bollocks

    As said above, it is hardly surprising that a datamining CEO would trumpet the utter indispensability of his company's tools. Fine, it's a good PR piece.

    But let's look at the reality of things, hmm ?

    "Unstructured data must also be thought of in its textual form of Word documents, emails, social media messages and other as yet undefined data shapes." - sorry, social media has a noise-to-data ratio that is far too high for any sort of data mining to be useful. Yes, you will probably find tweets that say your company is good, and others saying the reverse. You will never be able to map that to a customer having bought something from you unless said customer specifically signs up on your site to tell you that his Twitter account is FancyPants35748 and his credit card name is Jonathan Smith. Sure, some twits will tell you, but most will probably be a bit reluctant to acknowledge that their online persona is GorgeousJunk69.

    Just a few paragraphs later, the article quotes "The fact is that context will always rank as ace high".

    So let's just write off social media now. You'll never get the context in a 140-char tweet.

    Next point : “Those who are still relying on human interpretation will be trying to stay afloat on the unstructured data tsunami with one hand tied behind their back,” dixit Andrew Anderson, CEO of Celaton. What is he saying ? Humans cannot be trusted to manage data in a timely fashion and we must hand over our analysis procedures to computers.

    Yeah, sure. Because we know how to teach computers to distinguish between "programmer" and "Oracle developer" and "business analyst". Yeah, let's hand it over to computers, that'll work a lot better. Just like it works fine in Australia, for processing payments. Tell me, if we can still find major companies capable of botching up a comparatively simple job of paying salaries, how can we expect to be able to get relevant information from a tsunami of unstructured data ?

    “Having a system in place that can understand a candidate’s CV without the need for human intervention is crucial.” Indeed. Too bad we don't have a reliable system that can do that automatically without human intervention.

    "correlating point-of-sale transactions with social feeds can provide great insight into how a consumer felt about the company and the product" - yes, except you don't know that that is indeed a consumer of your product, and not some hacker or troll pulling your data leg.

    "This estimates that the digital universe of western Europe will grow from 538 exabytes to 5.0 zettabytes between 2012 and 2020" - yup, and 99% of that will be of absolutely no interest to anyone after a week.

    "We know that a huge amount of unstructured data is spam" - finally something I can agree with. And you want me to waste my time and money building a system that is going to analyse my spam mail to tell me I'm getting spam ? Get lost.

    The reality of data analysis is this: sort your data first. The bigger the volume of data, the more stringent your data retention criteria must be. The only data worth analysing is the data relevant to your company; the rest is a waste of resources. This article tries to make me believe that I must become the NSA and gather as much data as I can to hoard it and endlessly analyse it. I say bullshit. Recovering every tweet where my company is named is not going to actually give me a proper image of my company. Looking at my sales figures will.
