We've heard of data gravity – we're just not sure how to defy it yet

Software engineer Dave McCrory coined the term "data gravity" in a 2010 blog post. Since then he's come up with equations and otherwise thought hard about how to put numbers on it. He even talks about it today in terms of information theory, network evolution, and similarly heady topics. But it stems from a quite simple idea. …

  1. Anonymous Coward

    "However, based on our knowledge about the machinery or environment in question, we can probably make intelligent decisions about what to save and what to toss."

    Some practical problems with that approach.

    1) you are assuming you know what level of data will be useful in a future case.

    2) smoothed data may remove outlier values that are later found to have been important.

    3) pre-processed data may introduce aliasing effects - you get artefacts that were not actually there.

    4) assumptions about cause and effect are not valid if underlying constraints are breached.

    A system needs to be able to switch on more detailed archiving when the root cause of a recurring situation needs to be identified.
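
A minimal sketch of points 2) and 3) and of the "switch on more detailed archiving" idea, in Python. The window size, the threshold, and the function names are all assumptions made for illustration; nothing here comes from the article. Note that the threshold is itself an instance of point 1): it assumes you already know what an interesting reading looks like.

```python
from statistics import mean


def downsample_mean(samples, window):
    """Archive only the average of each non-overlapping window of readings."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


def archive(samples, window, threshold):
    """Keep the window average normally, but keep the raw window whenever
    any reading in it exceeds the (hypothetical) threshold."""
    kept = []
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        if max(chunk) > threshold:
            kept.append(("raw", chunk))          # detailed archiving switched on
        else:
            kept.append(("mean", mean(chunk)))   # summary only
    return kept


# Twelve made-up sensor readings with a single spike of 100.
readings = [20, 20, 20, 20, 20, 20, 20, 100, 20, 20, 20, 20]

smoothed = downsample_mean(readings, window=4)
detailed = archive(readings, window=4, threshold=90)
```

In the smoothed series the spike survives only as a mildly raised average, so anyone reading the archive later cannot tell a brief spike of 100 from a sustained reading of 40. The second function keeps the raw window around the spike, at the cost of having chosen the threshold in advance.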

    1. Solarflare

      Ah, it's simple stuff really - just delete everything after you have taken a copy.

  2. Anonymous Coward

    So this "data gravity" bollocks is just another way of newbies stating the bleeding obvious that anyone with any practical experience already knew?

    Wow, all those millions they earn are so worthwhile!

    1. Doctor Syntax Silver badge

      "newbies stating the bleeding obvious"

      Or the experienced with something to sell telling MBAs the bleeding obvious?

      1. Tom 7 Silver badge

        RE telling MBAs the bleeding obvious?

        When has that EVER worked?

    2. et tu, brute?
      Joke

      Maybe they earn the millions for inventing new buzzwords?

      1. Anonymous Coward

        It's heavy, man.

    3. TRT Silver badge

      "that anyone with any practical experience already knew?"

      You'd hope, wouldn't you? But there are places where the people in charge, the PHBs etc., are a bit hard-of-thinking and insist on and write policies forcing e.g. all data gathered in the expensive London centre to be piped hundreds of miles to the cheap DC located in a marsh with low land values, before being piped back to London for analysis, then piped back to the swamp... repeat ad infinitum.

  3. Doctor Syntax Silver badge

    From the original linked blog post:

    "Data is the most massive and dense, therefore it has the most gravity. Data if large enough can be virtually impossible to move."

    The second sentence sounds more like inertia than gravity.

  4. chrismevans

    The issue here is inertia, not gravity

    Ultimately this problem is one of data mobility, caused by inertia. Moving large amounts of data around is still hard because of the speed of light. We can't instantly make data accessible in multiple places without some kind of trade-off. Many trade-offs exist - have multiple copies if you don't care about consistency, for example.

    Current thinking (including the idea of data gravity) assumes that data and storage are the same thing. In fact, we can separate the two. Metadata tells us what we think we have. Storage lets us actually access it. If we work on metadata until we need to access the actual data, we can make it appear that our data has no inertia and exists everywhere. This is the startup opportunity described.
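
One way to picture the metadata/storage split described above, as a hedged Python sketch. The `Catalogue` class, the `Meta` fields, the site name, and the in-memory `REMOTE` dictionary standing in for a distant data centre are all hypothetical; the comment does not describe any particular product or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Meta:
    """What we think we have: a small record that is cheap to copy anywhere."""
    name: str
    size_bytes: int
    site: str  # where the actual bytes live


class Catalogue:
    """Answers questions from metadata and only fetches bytes on read()."""

    def __init__(self, fetch: Callable[[Meta], bytes]) -> None:
        self._entries: Dict[str, Meta] = {}
        self._fetch = fetch       # the slow, distance-bound operation
        self.fetch_count = 0      # how many times we actually moved data

    def register(self, meta: Meta) -> None:
        self._entries[meta.name] = meta

    def names(self) -> List[str]:
        return sorted(self._entries)

    def total_size(self) -> int:
        return sum(m.size_bytes for m in self._entries.values())

    def read(self, name: str) -> bytes:
        self.fetch_count += 1
        return self._fetch(self._entries[name])


# Stand-in for storage at a remote site (made-up content).
REMOTE = {"engine-run-001": b"vibration,temperature\n0.1,412\n"}


def fetch_from_site(meta: Meta) -> bytes:
    return REMOTE[meta.name]


catalogue = Catalogue(fetch_from_site)
catalogue.register(
    Meta("engine-run-001", len(REMOTE["engine-run-001"]), "swamp-dc")
)
```

Listing names and summing sizes touches only the catalogue, so the data appears to be "everywhere"; the inertia comes back the moment `read()` is called. The catalogue still has to be replicated and kept consistent, which is the same trade-off on a smaller scale.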

    1. TechnicalBen Silver badge

      Re: The issue here is inertia, not gravity

      But metadata is also data... so it's turtles all the way down?

      (Says the person with a computer in a computer in a computer posting on a network of networked networks etc etc)

  5. SVV Silver badge

    Hey listen everybody! I've just come up with some meaningless buzzwordy bullshit!

    "As Khosla put it: "Most of these [data gravity causes] are not critical issues for 'once a year data anti-gravity' threat that CIOs need to hold over cloud vendors' head. "

    I'm sure the CIOs are reading and re-reading this right now, trying to elicit some meaning from this priceless piece of piffle through their new year hangovers.

    To summarise the article for those similarly afflicted: when designing networks, databases and applications, you need to consider things like bandwidth, latency, transactional requirements and scalability. Apparently this shiny "new" revelation then needs to be dressed up in glutinous verbiage, so that CIOs and CEOs don't realise that this is what their network and application architects are already doing (or, less charitably, should really be doing in far too many places).

    I've implemented several versions of the "treat essential real time processing differently to the auditing and archiving of data for historical and analytical purposes" use case he seems to have just stumbled across for the first time, as have many other readers I should imagine. I guess mentioning "IoT" in conjunction with it makes it all new, somehow? The only "data anti gravity threat" I can imagine is that somebody uses the phrase at work in apparent seriousness and it finally sends me over the edge.

    1. Doctor Syntax Silver badge

      Re: Hey listen everybody! I've just come up with some meaningless buzzwordy bullshit!

      "As Khosla put it: "Most of these [data gravity causes] are not critical issues for 'once a year data anti-gravity' threat that CIOs need to hold over cloud vendors' head. "

      That one's easy to decode. It's just someone ringing up the vendor every now and again and saying "About your prices...".

    2. Anonymous Coward

      Re: Hey listen everybody! I've just come up with some meaningless buzzwordy bullshit!

      "To summarise the article, when designing networks, databases and applications, you need to consider things like bandwidth, latency, transactional requirements and scalability. Apparently this shiny "new" revelation then needs to be dressed up in glutinous verbiage, so that CIOs and CEOs don't realise that this is what their network and application architects are already doing."

      This!

  6. Anonymous Coward

    Isn't this just a long winded way of saying that you should get your testing right?

    Sensor to cloud to decision output.

    This is where the current thinking with IoT fails: what if the cloud is not available? What if the time taken is outside the acceptable range? Let's say I have an alarm sensor on my door: if it takes half an hour to set the alarm off for whatever reason then it's useless.

    Having the cloud for storing data is all well and good for future analysis but you shouldn't be relying on it for the actual function of the device.
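
The door alarm example can be sketched roughly as follows in Python: the decision to sound the alarm is made on the device, and the upload for later analysis is best-effort. `DoorAlarm`, the `upload` callable, and the event format are hypothetical names chosen for illustration, not any vendor's API.

```python
from collections import deque


class DoorAlarm:
    """Decides locally; the cloud only receives a copy of events afterwards."""

    def __init__(self, upload):
        self._upload = upload      # hypothetical cloud call, may raise ConnectionError
        self._backlog = deque()    # events not yet delivered to the cloud
        self.sounding = False

    def on_door_opened(self, armed):
        if armed:
            self.sounding = True   # immediate, no network round trip involved
        self._backlog.append({"event": "door_opened", "armed": armed})
        self.flush()

    def flush(self):
        """Try to deliver queued events in order; stop quietly if the cloud is down."""
        sent = 0
        while self._backlog:
            try:
                self._upload(self._backlog[0])
            except ConnectionError:
                break              # keep the event and retry on a later flush
            self._backlog.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._backlog)


def cloud_down(event):
    raise ConnectionError("cloud unreachable")


alarm = DoorAlarm(upload=cloud_down)
alarm.on_door_opened(armed=True)
```

With the cloud unreachable the alarm still sounds and the event waits in the backlog; once an upload succeeds, a later `flush()` drains it. A real device would also need to bound the backlog and persist it across reboots, which this sketch leaves out.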

    1. Doctor Syntax Silver badge

      "Isn't this just a long winded way of saying that you should get your testing right?"

      When it gets to testing you're too late. It's about getting your design right.

      1. Anonymous Coward

        Good point but this is IoT so you could go further back and ask is it solving a problem or creating one?

        1. Doctor Syntax Silver badge

          "ask is it solving a problem"

          Of course it's solving a problem. It's just the vendor's problem (what can we sell?) and not one of the user.

      2. Martin Gregorie Silver badge

        When it gets to testing you're too late. It's about getting your design right.

        Spot on! If you can't analyse the data flows in a new system in terms of immediacy, latency requirements and volumes at the design stage then you have no business calling yourself a designer or system architect.

        Have an up-vote.

      3. Dagg
        Mushroom

        When it gets to testing you're too late. It's about getting your design right.

        Sorry you can't do that, that is not AGILE that sounds like waterfall. /sarcasm

  7. allthecoolshortnamesweretaken

    You know, pretty much the same could be said when the topic is designing the pipes for a city's sewage system.

  8. John70
    Joke

    Has Big Data gotten so big, it's creating its own gravity?

    1. TRT Silver badge

      I reckon an Apple fell on his head.

  9. This post has been deleted by its author

  10. Doctor_Wibble
    Terminator

    Perhaps the analogy is appropriate

    Too much data gravity in one place and you create a singularity.

    Are we taking bets on whose system takes over first?

    1. Brian Miller

      Re: Perhaps the analogy is appropriate

      Garden gnomes!

      1. Collect garden gnomes. 2. Magic happens. 3. Profit!

      Data != profit. It just means that there is a bunch of bits in the system. From the original blog post, magically lots of data translates into customers for Salesforce. Up to a point, Lord Copper.

      Yes, we have lots of data. Maybe that data is worth something, and, more likely, it isn't.

  11. LDS Silver badge

    It's not gravity - it's bulimia.

    Companies are collecting data just for the sake of it. They hope, one way or another, one day to find a gold mine within. They are mostly collecting rubbish, like some hoarders. I would call it DHD, Data Hoarding Disorder.

    They fear gateways, because a gateway means telling customers that data can actually be processed and acted upon locally, probably by a Raspberry Pi-like machine. There's really no reason why a thermostat should send data to "the cloud" for processing. It does only because the maker believes it can magically create value from them.

    We're going to flood networks and storage with data with no real value - data that could and should be processed locally to deliver the service they are collected for, without ever leaving the "premises".

    But executives have been brainwashed into believing "data are money" - and they will collect everything just for fear that if they don't, someone else will and find the gold mine. It hasn't anything to do with physics - it's just another symptom of how much psychology matters in business decisions, and in the worst way.

    1. Doctor Syntax Silver badge

      Re: It's not gravity - it's bulimia.

      "It does only because the maker believes it can magically create value from them."

      Nothing magic. Just collecting a ransom subscription from the punters.

      1. LDS Silver badge

        Re: It's not gravity - it's bulimia.

        It's not that simple - you can force customers into a subscription without collecting data (e.g. Adobe Lightroom before they made the "full cloud" version), and you can collect data without a subscription (e.g. Android or Windows 10).

        The subscription model is not new - it was how software was "sold", or better "rented", for most Unix workstations in the '70s-'80s. It does ensure a steady cash flow, instead of attempting to sell upgrades. Bad for customers who use it occasionally, good for the company balance sheet.

        Data hoarding is a different matter, although it does often come along with the subscription model (they may not want to pay for all the data storage). Whether and how it really generates money is still to be demonstrated. Sure, some like Google and FB made money selling the idea of "targeted advertising" or the like - some others believed it, but whether it really works remains to be seen. I believe that if it really worked, we would see on web pages a few well placed ads, and not the mess we actually get.

        I'm quite sure they're selling data "invisible to anyone who was unfit for his office, or who was unusually stupid". Of course, spamming billions of ads will generate some data useful to say "See? People actually buy from ads!".

  12. arctic_haze Silver badge
    Meh

    General Relativity Theory for data?

    I wonder how much data density one needs to create an informational black hole.

    1. SVV Silver badge

      Re: General Relativity Theory for data?

      That's easy. An informational black hole can be defined as anything called a "data warehouse". And just as information slowly seeps out of a real black hole in the form of Hawking radiation, information slowly seeps out of a data warehouse in the form of "Quarterly executive summary reports".

      1. Doctor Syntax Silver badge

        Re: General Relativity Theory for data?

        "And just as information slowly seeps out of a real black hole in the form of Hawking radiation"

        And occasionally explodes in the form of a data breach. Oops.

        1. TRT Silver badge

          Re: General Relativity Theory for data?

          You are onto something, I think. Have you ever looked into the wiring cabinet of a data centre? Spaghettification effect, in your face.

  13. The Count
    Facepalm

    The topic or the company

    Whenever somebody says data gravity I can't help but think of the startup that was named Data Gravity. Founded by the original founders of EqualLogic.

    Quit confusing me!

  14. Anonymous Coward

    Brand new year, same old sh!t

    How many times do we have to rediscover the same basic ideas just because we implemented them in a new(ish) technology iteration?

    1. TechnicalBen Silver badge

      Re: Brand new year, same old sh!t

      Until they run out of [making] buzzwords?

  15. Anonymous Coward

    Data Collection is about control, not the value of the data

    When all your Internet Of Twattery devices have to send their data to the Cloud (aka 'Other People's Computers'), then it's not about latency or usefulness of the data, it's about the vendor of that device having control over you as an end user, locking you in to a subscription service model instead of allowing you to buy a stand-alone device outright.

    1. TechnicalBen Silver badge

      Re: Data Collection is about control, not the value of the data

      Not even subscription. Look at Google, Facebook, Uber, Twitter etc. Just having those "customers" on your books is enough to drive the control/power/influence factor.

      That is where the money is at: how many "users" the service has. So those IoT alarm clocks? I guess FitBits will be an example of how and what will be done with data/customers, and of which gives the more lucrative return: the data on their health, or the ability to hook them into a lifelong product (as MS/Apple or Coke found out!).

  16. Anonymous Coward
    FAIL

    Kiss My Mass

    Stop it! Stop it! STOP IT!

    You are NOT Physicists or Rocket Scientists or Astrophysicists.

    NO, Data does NOT obey rules just like mass does.

    Who says stupid sh*t like this anymore?

    We GOT to this place by asking the wrong questions and talking like this back in 1993.

    There is NOTHING useful here.

    NOTHING.

    Go Home; STAY Home.

  17. colinb

    Data Ownership a bigger issue

    "However, with machinery such as jet engines collecting hundreds of gigabytes of data per use"

    This data, or a summary of it, is very useful, but currently the owner of the plane does not own the data; the manufacturers say they own it, and indeed they collect it.

    Same for the airframe data.

    The way I see it, if you own something you should own the data it produces, but it's not that simple.

    The legal area here is very grey and will have to be written into contracts, but contracts have to be agreed by both parties and manufacturers have no interest in letting go of control.

  18. Will Godfrey Silver badge
    Unhappy

    How very inter...

    Ooo look! A butterfly.

  19. Tom Paine Silver badge
    Joke

    How to defy it?

    Easy: tell it to go fuck itself. Cheers, mine's a Special.

  20. Anonymous Coward

    Dave McCrory a.k.a. the Newton of Data Management

    Perhaps the data has gotten so dense that he's talking out of his black hole.

    1. TRT Silver badge

      Re: Dave McCrory a.k.a. the Newton of Data Management

      He should stick to cat flaps.

  21. Anonymous Coward

    We've heard of data gravity – we're just not sure how to capitalise on it ...

    Should have gotten the Sham-wow guy to push that data gravity story ???

    Somehow ex- SUN people still have that "know-it-all-better but still failed" stigma ....

  22. VinceH

    Optional

    Although the article was heading in a distinctly different direction, one sentence in particular jumped out at me and I felt needed fixing:

    "And the more data there is, the greater the attractive force pulling applications and services marketeers to associate with find ways to monetise that data."
