Internet-of-stuff startup dumps NoSQL for ... SQL?

In a surprising move, one startup has been forced to migrate its data out of a trendy NoSQL database and into a traditional relational one after running into numerous technical issues with the fluffy new tech. The move by internet-of-things startup Revolv was disclosed by the head of its cloud engineering group, Matt Butcher, …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    ha

    And then they will realise that when their database hits more than 10 tables and needs to serve a few thousand users, they will need a proper database like SQL Server or Oracle. Redundancy, data warehousing, query optimisation, documentation, security! I do want to implement something in these NoSQL technologies though; I want to understand their benefits. There must be some...

    1. Anonymous Coward

      Re: ha

      You've never actually used PostgreSQL have you?

      It does _very well_ at every one of the points you mentioned.

      1. Anonymous Coward

        Re: ha

        Says the people who would have recommended the noSql solution in the first place. There is a difference between very good and optimal. Have you ever done any migrations from one SQL database to another and seen what difference it makes?

        1. foxyshadis

          Re: ha

          For only $200,000 to $2 million more in license costs, plus doubling whatever they currently pay for DBA expertise, they could have managed an extra 1-2% performance! Absolutely gobsmackingly amazing.

          A fully tuned MySQL or Postgres is right up there with all the heavyweights in raw performance, until you need advanced site clustering capability. (And SQL Server is just starting to catch up there.) You seriously think a startup gives two shits about that?

          1. Skoorb

            Re: ha

            And if you go for something like EnterpriseDB's version of Postgres, you really do get the large clustering performance and the like. If you don't get Oracle's education or non-profit discounts, it can blow them out of the water on overall price.

          2. Anonymous Coward

            Re: ha

            SQL Server is between 5k and 20k. In their case, 5k will probably do the job. Of course you can pay more, or almost nothing if you don't need the extra memory, but nothing like 200k. Tell me what's more expensive: another DBA to fine-tune your database, or just getting the correct software from the beginning?

        2. Roo

          Re: ha

          "There is a difference between very good and optimal. Have you ever done any migrations from one SQL database to another and seen what difference it makes?"

          Have you? (I'm asking because you seem to be long on opinion and short on facts so far.)

          Of course the guys in the article don't really have much of a problem, because the majority of their business logic will be in their code rather than their database.

          By contrast most SQL->SQL migrations I have undertaken have been quite a lot of work, usually because the developers decided to write their business logic in SQL, and as a consequence have produced a complex, slow, unreliable and unmaintainable morass. Typically this happens because DBAs insist on people doing CRUD operations via stored procs because the DB they are using is shit at parsing SQL on the hoof.

          What I find strange is that the only people I've seen attacking PostgreSQL are salesmen and people who haven't used it. I wonder why that is.

          1. sabroni Silver badge

            Re: because DBAs insist on people doing CRUD operations via stored procs...

            ... because the DB they are using is shit at parsing SQL on the hoof.

            Isn't it because there's no risk of sql injection if you don't "parse SQL on the hoof"?

            1. Anonymous Coward

              Re: because DBAs insist on people doing CRUD operations via stored procs...

              "Isn't it because there's no risk of sql injection if you don't "parse SQL on the hoof"?"

              It was only an issue for people who concatenated strings with variables taken from input boxes or URL parameters. As long as you still use SQL parameters in the SQL then SQL Injection isn't an issue.
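              A minimal sketch of the difference, using Python's built-in sqlite3 module and a hypothetical users table: the concatenated query lets the input rewrite the WHERE clause, while the "?" placeholder keeps it as a plain value. Other drivers use other placeholder styles (such as %s or :name), so check the driver's own documentation.

```python
import sqlite3

# In-memory database with a hypothetical "users" table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # String concatenation: the input becomes part of the SQL text itself.
    sql = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user_safe(name):
    # "?" placeholder: the driver passes the value separately from the SQL,
    # so it is always compared as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


hostile = "x' OR '1'='1"
print(find_user_unsafe(hostile))  # every row comes back
print(find_user_safe(hostile))    # no rows: nobody is literally named that
```

              Placeholders protect values only; identifiers such as table or column names cannot be bound this way.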

              1. Swarthy

                SQl Injection (was: Re: because DBAs insist on people doing CRUD operations via stored procs...)

                Required reading for anyone "parsing SQL on the hoof": http://bobby-tables.com

            2. Roo

              Re: because DBAs insist on people doing CRUD operations via stored procs...

              "Isn't it because there's no risk of sql injection if you don't "parse SQL on the hoof"?"

              For the record, I am of the opinion that you don't pass in SQL from external sources; maybe you take carefully validated field values from forms, but you definitely don't pass SQL through. I was pretty surprised when I found out that web programmers were doing that as standard practice tbh. In those dark days I even got chewed out by the office's self-appointed web-security guru when I expressed that opinion too. ;)

              Anyway, Web Client -> SQL is not among the reasons I had in mind. One of the reasons was that the DB engines used to burn lots of CPU parsing the SQL and then fail at optimising the query. In scenarios where this became a problem sometimes DB access was restricted to a set of carefully optimised SPs.

              The fact is business requirements change, and the apps that support them should evolve alongside them. If you write half your app in SQL you are locked into doing things in an SQL-friendly way, and in some cases this is a non-optimal, if not pessimal, way to build an application. The finance industry is littered with this kind of wreckage today.

            3. Anonymous Coward

              Re: because DBAs insist on people doing CRUD operations via stored procs...

              Splitting your codebase up into stored procs and another language to avoid SQL injection attacks is a bogus argument. Just put the data access code into its own package and use prepared statements, and if you really need to compute an exotic query string, run a sanitiser over it. Better that than having per-developer databases to dev against, with logic in two places, and "code and procs not both deployed" delays which murder agility and productivity.
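              Placeholders bind values, not identifiers, so a query string that really must be computed (for example, a user-chosen sort column) needs a different guard. One common approach, sketched here with Python's sqlite3 module and a hypothetical devices table and allow-list, is to check identifiers against a fixed set of permitted names rather than trying to escape them.

```python
import sqlite3

# Hypothetical allow-list: the only columns callers may sort by.
SORTABLE_COLUMNS = {"name", "created_at"}


def list_devices(conn, sort_by="name", descending=False):
    # Placeholders can bind values but not identifiers such as column names,
    # so the dynamic identifier is validated against the allow-list instead.
    if sort_by not in SORTABLE_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    direction = "DESC" if descending else "ASC"
    sql = f"SELECT name FROM devices ORDER BY {sort_by} {direction}"
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [("thermostat", "2014-03-01"), ("lamp", "2014-01-15"), ("lock", "2014-02-10")],
)
print(list_devices(conn, "created_at"))
```

              Rejecting anything outside a known list is usually safer than a general-purpose sanitiser, because there is nothing left for clever input to slip past.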

              Other bogus arguments used by DBAs to force sprocs:

              # Agility. Translation: you coders have to do proper change management, but we don't have to; we just fix bugs in sprocs on the formal integration and test envs.

              # Speed. Translation: we can fiddle around with the code in the formal integration and test envs if there are issues later. No, we had no idea that you devs know query hints or can redesign your data model to avoid issues when you encounter them early. Anyway, you are not going to encounter anything until we give you the sprocs, very late and out of sync with the changing requirements; by then your only hope is that we debug them on the test server, bypassing normal change management.

              # Power. Translation: I cannot read your programming language so I am going to declare it inferior to the sprocs language my database comes with.

              # It's a proven/established solution. Translation: we know more than you do, trust us.

              I suspect that the real reason MongoDB is so successful is that it doesn't come with DBAs. No disrespect to DBAs; I know that any DBA who covers more than three systems is going to be dealing with at least one bunch of renegade PHP programmers who have yet to discover change management or source control, and that brings out the fascist in even the most accommodating people.

        3. Daniel B.

          Re: ha

          Says the people who would have recommended the noSql solution in the first place.

          I use a lot of open source stuff. Yet I would never recommend NoSQL for the same reasons these dudes switched to PostgreSQL: it's got issues. Never mind that NoSQL's name itself shows the real motive behind most of those "newfangled" DBs: they're built and promoted by crybabies that hate SQL so much they made their own DBs that don't do SQL or ACID. The same kind of crybaby attitude made me switch back from MySQL to PostgreSQL, as MySQL's documentation couldn't stop whining that transactions and foreign keys were for losers or lazy developers, "we won't implement them", yadda yadda yadda. (Ironically, they had already added the multi-engine support and InnoDB did support all those things. Yet the documentation still had this baby rant.)

          NoSQL stuff has its place. But devs should really see if they need it or if they just have relational data that doesn't need those other things. It'll pay in the long run. :)

  2. Howard Long

    Schoolboy errors

    Let me guess what the schoolboy errors might've been...

    a) A basic misunderstanding of mapping the product requirement to the technology, prolly made by some bright young thing wanting to use the latest technology for bulking out his CV.

    b) Dev guys using pathetically small unrepresentative datasets for their unit testing.

    c) The common misconception among PMs these days that non-functional requirements such as performance are not as important as functional requirements.

    1. Anonymous Coward

      Re: Schoolboy errors

      Howard, have you got the T-shirt mate?

      Otherwise, I'd be very weary of talking about "schoolboy errors", while knowing fuck all about the problem at hand.

      1. Howard Long

        Re: Schoolboy errors

        Other than reading the article, the blog, and having 38 years behind keyboards, including teleprinters, paper tape, VDUs, mainframes, minis, PCs, Macs, you name it, and the last 27 years programming and admining enterprise databases, you're right, I know fuck all about the problem and I should be weary [sic] of talking about "schoolboy errors".

        Sorry, bright young thing, I've been there, seen it, and maybe 30-odd years ago I did it too. But I'm better now, as I am sure you will be in the fullness of time, and perhaps fortunate and liberated enough to publish your opinions on a public forum under your real name rather than as AC or a nom de plume. Good luck with your career, but be careful out there.

        1. Androgynous Cupboard Silver badge

          He he

          "one to one relationships and one to many relationships and many to many relationships"

          Gosh, that does sound complicated. I simply can't imagine how you could represent that in SQL.
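          For the record, here is one conventional way to express all three in SQL, sketched with Python's sqlite3 module. The homes/devices/users schema is hypothetical, loosely modelled on a home-automation product and not Revolv's actual design: a foreign key gives one-to-many, a primary key that is also a foreign key gives one-to-one, and a junction table gives many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
-- One-to-many: each device belongs to exactly one home.
CREATE TABLE homes   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE devices (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      home_id INTEGER NOT NULL REFERENCES homes(id));

-- One-to-one: the primary key is also the foreign key, so each device
-- has at most one settings row.
CREATE TABLE device_settings (device_id INTEGER PRIMARY KEY REFERENCES devices(id),
                              firmware TEXT);

-- Many-to-many: users share homes through a junction table.
CREATE TABLE users      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE home_users (home_id INTEGER NOT NULL REFERENCES homes(id),
                         user_id INTEGER NOT NULL REFERENCES users(id),
                         PRIMARY KEY (home_id, user_id));

INSERT INTO homes VALUES (1, 'Flat'), (2, 'Cottage');
INSERT INTO devices VALUES (1, 'lamp', 1), (2, 'lock', 2), (3, 'thermostat', 2);
INSERT INTO device_settings VALUES (1, 'v1.2');
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO home_users VALUES (1, 1), (2, 1), (2, 2);
""")


def devices_for_user(user_name):
    # user -> home_users -> devices: two ordinary joins.
    rows = conn.execute(
        """SELECT d.name
             FROM users u
             JOIN home_users hu ON hu.user_id = u.id
             JOIN devices d     ON d.home_id = hu.home_id
            WHERE u.name = ?
            ORDER BY d.name""",
        (user_name,),
    )
    return [name for (name,) in rows]


print(devices_for_user("alice"))
print(devices_for_user("bob"))
```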

        2. Anonymous Coward

          Re: Schoolboy errors

          > Other than reading the article, the blog, and having 38 years behind keyboards,

          Ah, yes, one of those know-it-all types that for some reason make me so nervous.

          So how much of that experience behind keyboards has been taking responsibility for architectural decisions in a start-up environment? And you got every single one of them right the first time? If so, I'm honestly impressed.

          Having got my share of decisions wrong (not just in computing, I used to fly for a living), and learned from it, I rather tend to think "there but for the grace of God..."

          But hey, if feeling blissfully adequate is what rocks your boat, be my guest.

  3. Stefan 6

    Surely serving 1,000 customers can be done even with a simple Microsoft Access database file ;-)

    At the stage they're at, it just doesn't matter yet what they use.

    As long as you use a proper persistence/data access layer with business objects it is easy to swap the storage technology behind it. As your business grows you can then start caching objects and do hybrid approaches (mixing NoSQL/SQL) to increase performance as needed.
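    A minimal sketch of the kind of data access layer meant here, assuming Python 3.9+ and hypothetical DeviceStore/rename_device names: the business logic depends only on a small interface, and either a key-value style backend or a relational one can sit behind it.

```python
import sqlite3
from typing import Optional, Protocol


class DeviceStore(Protocol):
    """Hypothetical interface the business logic codes against."""

    def save(self, device_id: str, name: str) -> None: ...
    def get(self, device_id: str) -> Optional[str]: ...


class DictDeviceStore:
    """Key-value style backend; a dict stands in for a NoSQL store."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, device_id: str, name: str) -> None:
        self._data[device_id] = name

    def get(self, device_id: str) -> Optional[str]:
        return self._data.get(device_id)


class SqliteDeviceStore:
    """Relational backend exposing the same two methods."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE devices (id TEXT PRIMARY KEY, name TEXT)")

    def save(self, device_id: str, name: str) -> None:
        # Upsert so save() behaves like the dict assignment above.
        self._conn.execute(
            "INSERT OR REPLACE INTO devices (id, name) VALUES (?, ?)",
            (device_id, name),
        )

    def get(self, device_id: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT name FROM devices WHERE id = ?", (device_id,)
        ).fetchone()
        return row[0] if row else None


def rename_device(store: DeviceStore, device_id: str, new_name: str) -> Optional[str]:
    # Business logic sees only the interface, never the backend.
    store.save(device_id, new_name)
    return store.get(device_id)


for store in (DictDeviceStore(), SqliteDeviceStore()):
    print(type(store).__name__, rename_device(store, "d1", "hall lamp"))
```

    This only hides simple lookups. Joins, transactions and consistency guarantees differ between backends and tend to leak through an interface like this, which is the point the replies below take issue with.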

    1. Howard Long

      "As long as you use a proper persistence/data access layer with business objects it is easy to swap the storage technology behind it. As your business grows you can then start caching objects and do hybrid approaches (mixing NoSQL/SQL) to increase performance as needed."

      Please tell me you're not serious? You'll be using that over-used word "trivial" every sentence next.

    2. Tim99 Silver badge

      @Stefan 6

      Surely serving 1,000 customers can be done even with a simple Microsoft Access database file ;-)

      Having written production MS Access-based stuff, I was always surprised how easily it could be migrated to SQL Server, so 1,000 customers is a trivial "proof of concept".

      But if you want fast, scalable fancy web-based stuff, how about SQLite with CGI/FastCGI? ;-) That will migrate easily to Postgres IF you ever get more customers...

    3. Alan Brown Silver badge

      "As long as you use a proper persistence/data access layer with business objects it is easy to swap the storage technology behind it. As your business grows you can then start caching objects and do hybrid approaches (mixing NoSQL/SQL) to increase performance as needed."

      If it was that easy, everyone would do it.

      I've done a few database migrations. They're never trivial (and MySQL doesn't scale well past a few tens of millions of entries - or rather it scales linearly, whilst Postgres starts out larger but doesn't grow anywhere near as big as MySQL does when you have 500 million entries in it.)

      As for using Access, well that's better than using Excel (which I know several large hospitals were using in the 90s and it probably explains their shitty book keeping)

  4. Ian Michael Gumby

    Meh.

    At the scale they are talking about Mongo shouldn't have had a problem.

    Their problem? Using a tech because it's the latest hot buzzword without understanding the basics of the technology.

    There's nothing wrong with a relational engine. But you need to make sure your problem fits into the relational model.

    In NoSQL, you need to think more in terms of a hierarchical model and not many people understand this.

  5. Lapun Mankimasta

    And they discover that NoSQL - essentially the same tech as either hierarchical or network database management of the 70s - hasn't got the same ease of management as a relational database ... ? surprise, surprise!!!

  6. Tom 7

    It's like that first hormone rush of youth

    we dont need no steeenking relationships.

    Oh god pass me another box of tissues!

  7. sysconfig

    Article made me chuckle

    NoSQL is probably one of the biggest hypes of the last few years and certainly makes sense for many applications. But it's not one-size-fits-all, contrary to how it's sometimes advertised.

    Great to see a company stepping up and saying: "we've tried it, but it didn't work. SQL is not so bad after all, depending on what you ACTUALLY need"

    Use what makes sense for your application, not what everybody else is raving about!

  8. clocKwize

    Is this news?

    NoSQL solves a different set of problems. They have relational data, they even described it in a relational way (one to one, one to many, many to many..).

    The problem is one of two things:

    1) It's relational data and fits best in a relational system; maybe they should have thought about this first and picked the right tool for the job.

    2) After describing it as relational, maybe they just implemented it in a relational way? NoSQL doesn't work if you try and do it in a relational way. I made this mistake once.

    Slow news day though?

    1. Anonymous Coward
      Anonymous Coward

      Re: Is this news?

      "NoSQL doesn't work if you try and do it in a relational way"

      It does - you've just got to change your thinking. Instead of mapping your data model to the abstract, normalised relationships, you map your data model to the use cases of those relationships.

      All of them.

      It's not unusual to find a K,V store with the same bits of information stored in various forms hundreds, if not thousands of times. That makes SQL users cringe, but that's good use of NoSQL.
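      A toy illustration of that write-side denormalisation, with a plain Python dict standing in for the K,V store and a hypothetical key layout: each reading is written once per read pattern, so every read is a single key lookup rather than several gets stitched together in application code.

```python
# A dict stands in for the K,V store; the key layout is hypothetical.
kv = {}


def append(key, item):
    kv.setdefault(key, []).append(item)


def record_reading(device_id, home_id, ts, value):
    reading = (device_id, ts, value)
    # The same fact is written once per read pattern:
    append(f"device:{device_id}:readings", reading)   # history of one device
    append(f"home:{home_id}:readings", reading)       # everything in one home
    kv[f"home:{home_id}:latest:{device_id}"] = value  # current-state dashboard


record_reading("thermostat", "h1", 1, 20.5)
record_reading("thermostat", "h1", 2, 21.0)
record_reading("lamp", "h1", 2, 1)

# Each use case is now one lookup, with no joins in application code.
print(kv["home:h1:readings"])
print(kv["home:h1:latest:thermostat"])
```

      The price is duplicated storage and extra writes, and every new use case means another key layout to populate.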

      The drawback is what they describe in the article - instead of information retrieval being intuitive SQL, it becomes fully fledged programmatic access. What I don't buy into is their description of it being "50 lines of code". That sounds to me like they tried to map their relational data model to the non-relational data store, and are having to issue multiple gets and piece information back together at the application level.

      Of course that's unsustainable.

      The way to sustain it is to engineer your dataflow correctly. Make it event-oriented and materialise the state through MapReduce/Flume/Spark/whatever pipelines that slice and dice the atomic events into many K,V representations covering all use cases.
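      The event-oriented idea can be sketched in plain Python (this is not Spark or MapReduce code, and the event shapes are hypothetical): an append-only log of atomic events is folded into several K,V views in one pass, and a new view can be added later simply by replaying the log.

```python
from collections import Counter

# Hypothetical append-only event log (the "atomic events").
events = [
    {"type": "device_added", "home": "h1", "device": "lamp"},
    {"type": "device_added", "home": "h1", "device": "lock"},
    {"type": "state_changed", "device": "lamp", "state": "on"},
    {"type": "state_changed", "device": "lock", "state": "locked"},
    {"type": "state_changed", "device": "lamp", "state": "off"},
]


def materialise(events):
    # One pass over the log builds several K,V views; a batch pipeline
    # would run the same kind of fold over far more data.
    devices_by_home = {}
    current_state = {}
    change_counts = Counter()
    for e in events:
        if e["type"] == "device_added":
            devices_by_home.setdefault(e["home"], []).append(e["device"])
        elif e["type"] == "state_changed":
            current_state[e["device"]] = e["state"]
            change_counts[e["device"]] += 1
    return devices_by_home, current_state, change_counts


by_home, state, counts = materialise(events)
print(by_home, state, dict(counts))
```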

      But, as they said, they're not willing to do this. They'd rather accept the technical debt of falling back to the comfort of SQL than to actually fix their problems. If they end up growing in the world of IoT, they'll regret it. The Hadoop/NoSQL ecosystem is built for that. SQL is not.

      1. Destroy All Monsters Silver badge

        Re: Is this news?

        accept the technical debt of falling back to the comfort of SQL

        IT marketing has jumped the shark.

      2. clocKwize

        Re: Is this news?

        "NoSQL doesn't work if you try and do it in a relational way"

        "It does - you've just got to change your thinking. Instead of mapping your data model to the abstract, normalised relationships, you map your data model to the use cases of those relationships."

        But then that's not a relational way... you are expanding on exactly what I was talking about :)

      3. Philip Lewis

        Re: Is this news?

        "use cases of those relationships."

        Every time I see "use cases" as a design philosophy, I barf a bit, then restrain myself from dismembering the idiot who uttered it.

  9. SPimpernel

    From his description of the problem he should be using an object database or graph database. He'll eventually run into problems and will either need to convert to the right technology or go broke.

  10. Destroy All Monsters Silver badge

    it does have a complicated set of data relationships – "one to one relationships and one to many relationships and many to many relationships," explained Butcher

    In the 21st century, texts with more than 140 characters are tl;dr and N-to-N relationships are "complicated".

    Its database is needed to reflect these many relationships [The key is in the name, chaps. - Ed.]

    No, Ed, the "relational" does not come from these "relationships", it comes from the fact that one expresses the relationship between "attributes" using a "relation" (aka a "table") -- as opposed to a set of pointers as was the custom up to Codd.

  11. Wall-meet-Head

    Probably the right order to do it in...

    This might look like one in the eye for NoSQL DBs, but if you think about their data model challenges as they were starting out, they would have been frequently changing their schema as their product evolved, without a whole lot of real-world data to validate it against. This would have favoured a NoSQL approach over a more structured SQL one, since it's a lot more tolerant of schema changes on the fly.

    Now that things have settled down with their schema, SQL is the way to go, because they've woken up to the fact that it's one heck of a lot less complex to get data OUT of a relationally structured store than from a hierarchy of document-like shreds. They just should have considered it a bit earlier than they did: would have saved them lots of pain!

    1. Anonymous Coward

      Re: Probably the right order to do it in...

      Using cron would be a better approach. Oh, you'd want to wrap it with perhaps a nicer interface, but from the problem domain, much beyond that is seriously gilding the lily.

  12. JohnnyTheSailor

    A bad workman always blames his tools

    If your application can't scale to just 1000 users then you've probably designed it badly in the first place.

    Just sayin'.

  13. Philip Lewis

    The usual suspects

    Probably 1 (maybe 2) serious DBAs making comments here, and a load of the usual mindless crap (and downvotes) from people obsessed with FOSS (or whatever free version they like), with zero understanding of data or database theory and/or an irrational hatred of Oracle. Add a liberal dash of zero clue about anything that remotely resembles "the enterprise" (you know, that serious place where serious DBAs' salaries, for really good reasons, have a number rather greater than 1 before the 5 zeroes) and you have the average commentard here on The Reg when matters of data rear their head.

    Welcome to The Register .... NOT the home of the data custodians.

    Bootnote: Another hard Monday at the crèche

  14. Anonymous Coward

    They should consider graph databases

    If they have many-to-many relationships, that should be an orange flag for going the SQL route. That means they will need intersection tables, which are abstract "entities" that don't correspond to "real" entities. Not saying you should never do this, but it's often not ideal for SQL and can lead to some convoluted and inefficient queries and messy updates.

    There is a third alternative, less well known than either the traditional SQL or newer k,v or BigTable style of NoSQL, namely graph databases. (Some people include graphs as a flavor of NoSQL, but it's also the case that what many people know as NoSQL excludes graphs). If the core of their business problem is the relationships between objects and tracing those relationships, a graph database may be exactly what they need -- and the presence of lots of M-M relationships is often a sign that is the case.

    1. Anonymous Coward

      Re: They should consider graph databases

      The best modern graph databases are BigTable under the hood; usually HBase. The flexible schema and key-value lookup features make it ideal for graph traversal.

      However, graph databases aren't necessarily great for modelling this kind of thing. They are superb for modelling and retrieving complex relationships, but extremely poor for batch analytics - it's difficult to get aggregate results out of a data structure that is fundamentally based around nodes and edges. You end up having to go all out and flatten the graph structure into a wide tabular one, and then you've come full circle again. I'd also suggest that the relationships here aren't complex enough for graph analysis/materialisation; they're complex by virtue of being many-many, yes, but they're shallow, unlikely to be of any greater depth than one. That turns your graph database into a very slow table.

  15. Adam Fowler

    Lack of knowledge of the data all round?

    *bias alert* I work for MarkLogic, a NoSQL database vendor.

    Many people start with NoSQL because of its schemaless nature. Although that's natural, more important is the type of data and queries you're working with. Where the data fields are known, and relationships are known in advance, an RDBMS may be a better fit.

    Equally, relational thinking being applied to NoSQL databases is a real problem. This mainly stems from NoSQL requiring a different way of thinking: denormalisation vs. normalisation, and searches over potentially non-existent indexed fields (for document DBs anyway) being more akin to search engine tech than to queries over static, known fields.

    I've seen many cases of the opposite of this story: people using an excellent RDBMS like Oracle (because it IS excellent) when there is a problem of huge data variety that cannot possibly be known up front, making RDBMS schema design challenging to keep up to date.

    It's simply a case of the right tool for the right job.

    Other points in this article are more interesting to me. The lack of good documentation and DBA tools is a real problem for NoSQL adoption, and one that only a few companies *coff* like MarkLogic *coff* have addressed.

    Those who point out the religious war in the comments are right too. This type of decision should be about the data and the problem being looked at. There are enough data problems in the world to warrant both RDBMS and NoSQL databases living side by side, just as we still have mainframes alongside RDBMS today.

    NoSQL will get there, but not all NoSQL databases are there yet.

  16. Breen Whitman

    Here is a joke: what is the difference between an iPhone-owning hipster and a NoSQL database?

    Answer: both can do trivial tasks but fail at harder tasks such as real world corporate activity. But at least nosql isn't a floucing poof.

  17. Anonymous Coward

    Old dogs and new tricks

    The "either/or" between document DB NoSQL and RDBMS is yesteryear's debate. Postgres just added JSONB, which is binary JSON with indexing support that competes with MongoDB's BSON.

    Rumour has it some big firms have benchmarked Postgres JSONB against DB2 doing indexes over JSON and have found it to be faster. I would be shocked if Oracle and SQL Server don't follow this trend, as it seems obvious that a mature and flexible RDBMS can figure out how to do schemaless document storage with indexes as an incremental and complementary add-on.

    Conversely, a new player like MongoDB is not going to easily add relational capabilities or transactions in an incremental fashion. If you're holding shares in 10gen you should sell them; they are going to be squeezed out by the established players doing "me too" BSON joined to relational reference data, or else by serious low-latency NoSQL players like Aerospike.

    Most folks out there would be well served by picking a 20+ year mature database product which has learnt new tricks; only use such a niche beast where they can prove such exotics are the correct long term solution.
