back to article 'MongoDB ate my containers!'

Welcome back to The Register's weekly software bug parade, Line Break: Season Two. After a hiatus, and a vacation here or there, Line Break has been recommissioned. You can catch up on previous episodes, here. The idea is simple: if you spot buggy code in the wild that's driven you bonkers or to hysterics, drop us a line with …

  1. werdsmith Silver badge

    So somebody "found a problem" with MongoDB that was behaviour already described in the documentation and therefore we get the staggering conclusion that for some purposes another type of database may be preferred? I need to get a fix on something celestial so I can check the world is still turning.

    We don't even know if he was using WiredTiger or MMAPv1 and if the behaviour is the same for both engines.

    1. gv

      It's not a bug, it's a feature.

    2. Anonymous Coward

      Why not? A design bug (in the container management application, not Mongo) is still a bug. And it's also a warning: don't use technology X unless you understand how it really works and whether it fits your needs, just because it's new and shiny and everybody tells you that you should use it or else look like a dinosaur. After all, this is one of the most common and dangerous bugs I see around.

      I worked with some blokes who believed it was fun to implement each new piece of software with a newer technology, which they often never understood well enough, and they were left free to do it (while also spending a lot of time reinstalling to switch to the "distro of the month") because, of course, they looked "smart".

      When they eventually left (when you can't deliver, it's better to look for another position elsewhere...), they also left behind a pile of badly written, unmaintainable code, which I unluckily had to rewrite and consolidate into a few sound, right-for-the-task technologies.

      1. werdsmith Silver badge

        Why not?

        It's not a question of why not, it's more about STFO (stating the obvious).

        How is it much different from using a traditional RDBMS and failing to look at the documentation to see how the different isolation levels are supposed to work?

        1. Anonymous Coward

          Re: Why not?

          You would be surprised how many developers I have found who use databases but don't understand, and never took the time to learn, transaction models and all their subtle (or not so subtle) differences among different engines. That's because, most of the time, they are used to SQL (or equivalent access methods) working "automagically" (and let's not speak about the "autocommit" crowd; AFAIK that's still the MySQL default...).

          I've been called in several times to diagnose issues on projects that were migrating from database X to Y, only to find the same code didn't work as expected. Most of the time it was down to the fact that the databases had different transaction models and implementations, and the developers didn't take account of that. It's still buggy code, even if it compiles and runs. These are logic bugs.

    3. Anonymous Coward

      What's the solution then? Does Mongo support locking?

  2. Doctor Syntax Silver badge

    Non-ACID databases aren't ACID. There's a surprise.

    1. Zoopy

      "Non-ACID databases aren't ACID. There's a surprise."

      No, but they *are* Web Scale.

      1. Anonymous Coward

        So is an ACID db with caching.

        1. Doctor Syntax Silver badge

          "So is an ACID db with caching."

          And 100,000 queries a day isn't even web scale.

      2. Adam 1

        For those who missed the joke

        https://youtu.be/b2F-DItXtZs

        (Language warning)

  3. Adam 1

    mongo didn't eat anything

    Now I'm not a fan of the NoSQL fad, but Mongo worked exactly how all NoSQL databases work by design. They trade off transaction isolation for performance. Or put another way, why do you think that these things can be faster than a traditional rdbms? It's defined by the very overheads it can disregard. It is a terrific compromise for certain types of problem but people really need to stop using it for problems requiring ACID.

    As for "write your software with the above race condition in mind", that's kind of backwards advice. If you write your own locking or serialisation, I will promise you here and now that it won't be as efficient as the rdbms that you are trying to avoid in the first place.

    1. maffski

      Re: mongo didn't eat anything

      Yep, or given "When running this query 100,000 times or so a day, we’d hit the bug every few days," added Glasser.

      It might be worth structuring your code so you don't care if a given record is not returned on every single iteration.
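One way to read that suggestion, as a minimal sketch: only treat a container as gone after it has been missing from several consecutive query results, so a single dropped record is ignored. The class name `MissTracker`, the `threshold` parameter and the container IDs are all hypothetical and are not taken from the article or from any MongoDB API.

```python
from collections import defaultdict


class MissTracker:
    """Report an ID as gone only after `threshold` consecutive misses."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.misses = defaultdict(int)  # id -> consecutive polls without it

    def observe(self, known_ids, returned_ids):
        """Record one poll; return the IDs that have now missed too often."""
        returned = set(returned_ids)
        gone = []
        for cid in known_ids:
            if cid in returned:
                self.misses[cid] = 0  # seen again, so reset the counter
            else:
                self.misses[cid] += 1
                if self.misses[cid] >= self.threshold:
                    gone.append(cid)
        return gone


tracker = MissTracker(threshold=2)
known = ["c1", "c2"]
first = tracker.observe(known, ["c1"])         # c2 missing once: tolerated
second = tracker.observe(known, ["c1", "c2"])  # c2 is back: counter resets
third = tracker.observe(known, ["c1"])         # c2 missing once again
fourth = tracker.observe(known, ["c1"])        # second miss in a row
print(first, second, third, fourth)
```

The trade-off is latency: a container that really has disappeared is only noticed after `threshold` polls.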

    2. Anonymous Coward

      Re: mongo didn't eat anything

      "I will promise you here and now that it won't be as efficient as the rdbms that you are trying to avoid in the first place."

      Indeed. A lot of people seem to think noSQL DBs are a cheap/free and much faster alternative to traditional relational DBs. They're not. As stated, they lack isolation, plus they also lack a huge number of features due to the lack of SQL, the main one being complex joins (which, ironically, MongoDB tries to add back in a half assed way using its hideous mashup of JavaScript and its built in data access "language"). Mongo itself has a huge list of problems anyone migrating over from a relational DB will encounter - just google.

      Mongo should only be used as a read only repository or a key-value scratchpad store - basically 1 step up from a simple file which can also be updated while being read. Anyone who is thinking of using it in a day to day business production environment where missing a row of data could mean a lost customer or order (for example), or where complex data relationships exist, is just asking for trouble. IMO of course.

      1. Michael Wojcik Silver badge

        Re: mongo didn't eat anything

        "Mongo should only be used as a read only repository or a key-value scratchpad store"

        Or if you don't care about consistent and complete results - which is the case for many of the applications where NoSQL is used. If you're running some sort of social-networking site, for example, most end-user queries don't need to be consistent and complete. If some user searches your database of X piles of user-generated content for Y, they probably won't notice a few missing results; and if they think something is missing, they'll just retry the search with a slightly different query.

        There are applications which don't need ACID guarantees because the users don't care.

        Yes, NoSQL is not a replacement for RDBMSes - they're suited to different problem domains. But that doesn't mean there aren't applications NoSQL is suited for, beyond "read-only" and "scratchpad". (The value of those applications is a different, and more complicated, question.)

    3. Aitor 1

      Re: mongo didn't eat anything

      Agree.

      I had to fix some nasty bugs myself on Oracle 9 on HP PA-RISC, basically reimplementing critical regions, etc, and:

      1. It took a while to get it right.

      2. It was slower than the Oracle standard implementation (but that implementation was broken in a specific scenario on multiprocessor PA-RISC UX).

      As for ACID.. if you really need ACID, then GO ACID; if you need it in only a few circumstances, you could just code around it, or have two data repositories.

  4. Anonymous Coward

    noSQL

    So, a noSQL database won't necessarily return all the records you ask for. I understand how this can make it faster, but isn't it failing at the whole point of being a database?

    I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one.

    1. Anonymous Coward

      Re: noSQL

      "I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one."

      Statistical analyses will often give good results on 999,999 records instead of 1,000,000.
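As a rough illustration of that point, here is a small worked example with made-up data (the integers 1 to 1,000,000, which are not from the article). It drops a single record, deliberately the largest one, and compares the means.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


full = range(1, 1_000_001)     # 1,000,000 records
partial = range(1, 1_000_000)  # same data with the largest record missing

full_mean = mean(full)
partial_mean = mean(partial)

# Relative change in the mean caused by the one missing record.
relative_shift = abs(full_mean - partial_mean) / full_mean
print(full_mean, partial_mean, relative_shift)
```

Here the mean moves by about one part in a million. That only holds for aggregate statistics over reasonably well-behaved data; it says nothing about cases where each individual record matters.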

    2. BigAndos

      Re: noSQL

      It depends what you're doing. We do a lot of analysis of web tracking data which can be billions of rows. Our SQL server instance may not be able to handle certain queries very quickly or we may not have sufficient disk space to build the right indexes, etc etc.

      Something like MongoDB can potentially be much quicker at querying this type of data and is very easy to scale horizontally. As the previous commenter said, if we're looking at billions of rows then even a thousand rows not being returned isn't much of an issue. There is also the fact that web tracking isn't perfect (incognito mode, Ghostery etc) so the dataset will be incomplete anyway.

      For any kind of financial / KPI reporting then ACID compliant RDBMS all the way please.

    3. Anonymous Coward

      Re: a situation where getting most of the results rather than all would be desirable.

      It's not that missing records are desirable, it's that massive scalability and responsiveness can be more important. Does it really matter if you're a bit behind seeing the latest tweets? For a service like Twitter returning something quickly is much more important than returning everything.

      The problem, as most posters are pointing out, is that if you're returning a set of active servers you need a DBMS that can get it right every time, not most of the time....

      1. hellwig

        Re: a situation where getting most of the results rather than all would be desirable.

        See, I don't like the Google/Twitter/Facebook way of doing things.

        Yes, Google handles millions of requests a day/hour/minute/second? For them, a one in a million error is nothing. However, as an individual user, I feed only a few queries a day to Google. I want to know that what Google gives me is correct, not risk hitting one of those one in a million glitches. If I'm one of Google's customers (the advertisers, Hah! did you think I meant users?), I for damn sure want to know that my ad is properly showing up when it's supposed to. But I suppose that lack of consistency is factored into the billing?

        It's all about economy of scale I suppose. Google can afford to miss something, and as long as they are accurate enough of the time, no one cares.

      2. Anonymous Coward

        Re: a situation where getting most of the results rather than all would be desirable.

        There's an obvious solution to the concurrent updates problem, used by the likes of Postgres: append the new record, THEN overwrite the index so it points to the new version. Anyone querying the index during the update will simply see the old version of the record.

        If MongoDB was soundly designed it would do that and a number of other beneficial things, but it would have taken too long to rewrite the C++, and these hipsters might have missed their hype train.
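The "append first, repoint the index second" idea described in this comment can be sketched as a toy model. This is only an illustration of the ordering the commenter describes, not how Postgres or MongoDB is actually implemented; `AppendOnlyStore` and its method names are hypothetical.

```python
class AppendOnlyStore:
    """Toy store: versions are appended, and the index is repointed last."""

    def __init__(self):
        self.versions = []  # append-only log of record versions
        self.index = {}     # doc_id -> position of the current version

    def append_version(self, record):
        """Step 1: write the new version without touching the index."""
        self.versions.append(record)
        return len(self.versions) - 1

    def commit_update(self, doc_id, position):
        """Step 2: repoint the index entry at the new version."""
        self.index[doc_id] = position

    def get(self, doc_id):
        """Readers go through the index, so they see a whole version."""
        return self.versions[self.index[doc_id]]


store = AppendOnlyStore()
store.commit_update("c1", store.append_version({"status": "starting"}))

new_pos = store.append_version({"status": "running"})  # update in progress
during = store.get("c1")   # a reader arriving now still sees the old version
store.commit_update("c1", new_pos)
after = store.get("c1")    # a reader arriving later sees the new version
print(during, after)
```

A reader that arrives between the two steps finds the old version rather than nothing, because the index entry exists and points at a complete record the whole time.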

    4. Michael Wojcik Silver badge

      Re: noSQL

      "I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one."

      I can think of a few (dozen): Google, Facebook, Yelp, Tinder, ... There are many applications where end-users don't need comprehensive or completely correct results. Pretty close is good enough.

      Someone else mentions statistical analysis; in general, there are a lot of big-data / OLAP / etc apps where missing the occasional record doesn't hurt.

      Before computerization, incomplete results were the norm. Somehow civilization survived.

  5. Anonymous Coward

    MongoDB ate my containers!

    " if a document is updated while the query is running, MongoDB may not return it from the query"

    As opposed to what, a read operation that extends into the future and reports back any future updates?

    "This is why locking is useful .. maybe a SQL database is really what you need, not a loose NoSQL system."

    He's executing a search on an indexed field, which dynamically moves entries around depending on the updated value. How do other non-loose NoSQL databases handle this situation?
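The effect described above, a matching document being skipped because an update moves its index entry while a scan is in progress, can be reproduced in a toy model. This is a sketch under stated assumptions: a sorted Python list stands in for the index, the "concurrent" update is injected by a callback, and none of it reflects MongoDB's real storage engine. All names (`ToyIndex`, `scan`, `run_demo`) are hypothetical.

```python
import bisect


class ToyIndex:
    """Sorted list of (key, doc_id) pairs standing in for an index."""

    def __init__(self, docs):
        self.entries = sorted((key, doc_id) for doc_id, key in docs.items())

    def update_key(self, doc_id, old_key, new_key):
        """Move a document's entry to the slot for its new key."""
        self.entries.remove((old_key, doc_id))
        bisect.insort(self.entries, (new_key, doc_id))


def scan(index, on_step=None):
    """Walk the index in key order, resuming after the last entry seen."""
    seen = []
    last = None
    step = 0
    while True:
        pos = 0 if last is None else bisect.bisect_right(index.entries, last)
        if pos >= len(index.entries):
            return seen
        last = index.entries[pos]
        seen.append(last[1])
        step += 1
        if on_step:
            on_step(step)  # gives a "concurrent" writer a chance to run


def run_demo():
    docs = {"a": 10, "b": 20, "c": 30, "d": 40}
    index = ToyIndex(docs)

    def concurrent_update(step):
        # After the scan has passed "a" and "b", document "d" is updated so
        # that its entry moves from ahead of the cursor to behind it.
        if step == 2:
            index.update_key("d", old_key=40, new_key=5)

    return scan(index, on_step=concurrent_update)


result = run_demo()
print(result)  # "d" existed the whole time but is never returned
```

In this model the scan is not reading the future; it simply never revisits the part of the index it has already passed, which is where the entry ends up.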

  6. Borg.King

    Moral - increase your future employees' salaries

    Then you'll be able to attract and employ staff who are likely to have read the instructions, likely to have understood them, and more likely to write code that complies with them.

  7. Anonymous Coward

    acId

    You keep using that word "locking" - I'm not sure it means what you think it means. Maybe look up Isolation?

    And notice that MVCC, Repeatable Reads or Cursor Stability might not give you what you think it would...
