'MongoDB ate my containers!'

Wednesday 8th June 2016 07:43 GMT werdsmith

So somebody "found a problem" with MongoDB that was behaviour already described in the documentation and therefore we get the staggering conclusion that for some purposes another type of database may be preferred? I need to get a fix on something celestial so I can check the world is still turning.

We don't even know if he was using WiredTiger or MMAPv1 and if the behaviour is the same for both engines.

1. Wednesday 8th June 2016 08:09 GMT gv
  
  It's not a bug, it's a feature.
  
2. Wednesday 8th June 2016 08:10 GMT Anonymous Coward
  
  Why not? A design bug (of the container management application, not Mongo) is still a bug. And it's also a warning: "don't use technology X unless you understand how it really works and whether it fits your needs, just because it's new and shiny and everybody tells you that you should use it or else look like a dinosaur". After all, this is one of the most common and dangerous bugs I see around.
  
  I worked with some blokes who believed it was fun to implement each new piece of software with a newer technology, which they often never understood well enough, and they were left free to do it (while also spending a lot of time reinstalling to switch to the "distro of the month") because of course they looked "smart".
  
  When they eventually left (when you can't deliver, it's better to look for another position elsewhere...), they also left behind a pile of badly written, unmaintainable code, which I unluckily had to rewrite and consolidate into a few sound, right-for-the-task technologies.
  
  1. Wednesday 8th June 2016 08:15 GMT werdsmith
    
    Why not?
    
    It's not a question of why not, it's more about STFO (stating the obvious).
    
    How is it much different from using a traditional RDBMS and failing to look at the documentation to see how the different isolation levels are supposed to work?
    
    1. Wednesday 8th June 2016 12:39 GMT Anonymous Coward
      
      Re: Why not?
      
      You would be surprised how many developers I have found who use databases but don't understand, and never took the time to learn, transaction models and all their subtle (or not so subtle) differences among engines. That's because, most of the time, they are used to SQL (or equivalent access methods) working "automagically" (and let's not speak about the "autocommit" crowd; AFAIK that's still the MySQL default...).
      
      I've been called in several times to diagnose issues on projects that were migrating from database X to Y, only to find the same code didn't work as expected. Most of the time it was down to the databases having different transaction models and implementations, which the developers hadn't taken into account. It's still buggy code, even if it compiles and runs: these are logic bugs.
      
3. Wednesday 8th June 2016 11:52 GMT Anonymous Coward
  
  What's the solution then? Does Mongo support locking?
  
Wednesday 8th June 2016 08:24 GMT Doctor Syntax

Non-ACID databases aren't ACID. There's a surprise.

1. Wednesday 8th June 2016 08:58 GMT Zoopy
  
  "Non-ACID databases aren't ACID. There's a surprise."
  
  No, but they *are* Web Scale.
  
  1. Wednesday 8th June 2016 12:07 GMT Anonymous Coward
    
    So is an ACID db with caching.
    
    1. Wednesday 8th June 2016 12:45 GMT Doctor Syntax
      
      "So is an ACID db with caching."
      
      And 100,000 queries a day isn't even web scale.
      
  2. Wednesday 8th June 2016 13:29 GMT Adam 1
    
    For those who missed the joke
    
    https://youtu.be/b2F-DItXtZs
    
    (Language warning)
    
Wednesday 8th June 2016 09:02 GMT Adam 1

mongo didn't eat anything

Now I'm not a fan of the NoSQL fad, but Mongo worked exactly how all NoSQL databases work by design. They trade off transaction isolation for performance. Or put another way, why do you think that these things can be faster than a traditional rdbms? It's defined by the very overheads it can disregard. It is a terrific compromise for certain types of problem but people really need to stop using it for problems requiring ACID.

As for "write your software with the above race condition in mind", that's kind of backwards advice. If you write your own locking or serialisation, I will promise you here and now that it won't be as efficient as the rdbms that you are trying to avoid in the first place.
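To make the point concrete, here is a minimal sketch of what home-grown serialisation tends to look like: one lock around every read and write. `LockedCollection` and its methods are hypothetical names for illustration, not anything from MongoDB or from the application in the article.

```python
import threading


class LockedCollection:
    """Toy in-memory collection (hypothetical); one lock guards every read and write."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}

    def upsert(self, doc_id, fields):
        # Writers wait here while any scan is in progress.
        with self._lock:
            self._docs[doc_id] = dict(fields)

    def find(self, predicate):
        # The lock is held for the whole scan, so no update can interleave
        # with it, but every other reader and writer waits until it finishes.
        with self._lock:
            return [dict(d, _id=i) for i, d in self._docs.items() if predicate(d)]


containers = LockedCollection()
containers.upsert("c1", {"state": "running"})
containers.upsert("c2", {"state": "stopped"})
running = containers.find(lambda d: d["state"] == "running")
```

It gives consistent results, but a long scan blocks every writer for its full duration, which is the kind of overhead the move away from an rdbms was supposed to avoid.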

1. Wednesday 8th June 2016 09:15 GMT maffski
  
  Re: mongo didn't eat anything
  
  Yep, or given "When running this query 100,000 times or so a day, we’d hit the bug every few days," added Glasser.
  
  It might be worth structuring your code so you don't care if a given record is not returned on every single iteration.
  
2. Wednesday 8th June 2016 10:08 GMT Anonymous Coward
  
  Re: mongo didn't eat anything
  
  "I will promise you here and now that it won't be as efficient as the rdbms that you are trying to avoid in the first place."
  
  Indeed. A lot of people seem to think noSQL DBs are a cheap/free and much faster alternative to traditional relational DBs. They're not. As stated, they lack isolation, and they also lack a huge number of features due to the lack of SQL, the main one being complex joins (which, ironically, MongoDB tries to add back in a half-assed way using its hideous mashup of JavaScript and its built-in data access "language"). Mongo itself has a huge list of problems anyone migrating over from a relational DB will encounter - just google.
  
  Mongo should only be used as a read-only repository or a key-value scratchpad store - basically one step up from a simple file, which can also be updated while being read. Anyone who is thinking of using it in a day-to-day business production environment where missing a row of data could mean a lost customer or order (for example), or where complex data relationships exist, is just asking for trouble. IMO of course.
  
  1. Thursday 9th June 2016 17:54 GMT Michael Wojcik
    
    Re: mongo didn't eat anything
    
    Mongo should only be used as a read only repository or a key-value scratchpad store
    
    Or if you don't care about consistent and complete results - which is the case for many of the applications where NoSQL is used. If you're running some sort of social-networking site, for example, most end-user queries don't need to be consistent and complete. If some user searches your database of X piles of user-generated content for Y, they probably won't notice a few missing results; and if they think something is missing, they'll just retry the search with a slightly different query.
    
    There are applications which don't need ACID guarantees because the users don't care.
    
    Yes, NoSQL is not a replacement for RDBMSes - they're suited to different problem domains. But that doesn't mean there aren't applications NoSQL is suited for, beyond "read-only" and "scratchpad". (The value of those applications is a different, and more complicated, question.)
    
3. Wednesday 8th June 2016 13:01 GMT Aitor 1
  
  Re: mongo didn't eat anything
  
  Agree.
  
  I had to fix some nasty bugs myself on Oracle 9 on HP PA-RISC, basically reimplementing critical regions, etc., and:
  
  1. It took a while to get it right.
  
  2. It was slower than Oracle's standard implementation (but that implementation was broken in a specific scenario on multiprocessor PA-RISC UX).
  
  As for ACID: if you really need ACID, then go ACID; if you only need it in a few circumstances, you could just code around it, or have two data repositories.
  
Wednesday 8th June 2016 10:53 GMT Anonymous Coward

noSQL

So, a noSQL database won't necessarily return all the records you ask for. I understand how this can make it faster, but isn't it failing at the whole point of being a database?

I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one.

1. Wednesday 8th June 2016 11:10 GMT Anonymous Coward
  
  Re: noSQL
  
  "I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one."
  
  Statistical analyses will often give good results on 999,999 records instead of 1,000,000.
  
2. Wednesday 8th June 2016 11:59 GMT BigAndos
  
  Re: noSQL
  
  It depends what you're doing. We do a lot of analysis of web tracking data which can be billions of rows. Our SQL server instance may not be able to handle certain queries very quickly or we may not have sufficient disk space to build the right indexes, etc etc.
  
  Something like MongoDB can potentially be much quicker at querying this type of data and is very easy to scale horizontally. As the previous commenter said, if we're looking at billions of rows then even a thousand rows not being returned isn't much of an issue. There is also the fact that web tracking isn't perfect (incognito mode, Ghostery etc) so the dataset will be incomplete anyway.
  
  For any kind of financial / KPI reporting then ACID compliant RDBMS all the way please.
  
3. Wednesday 8th June 2016 19:25 GMT Anonymous Coward
  
  Re: a situation where getting most of the results rather than all would be desirable.
  
  It's not that missing records are desirable, it's that massive scalability and responsiveness can be more important. Does it really matter if you're a bit behind seeing the latest tweets? For a service like Twitter returning something quickly is much more important than returning everything.
  
  The problem, as most posters are pointing out, is that if you're returning a set of active servers you need a DBMS that can get it right every time, not most of the time...
  
  1. Wednesday 8th June 2016 21:04 GMT hellwig
    
    Re: a situation where getting most of the results rather than all would be desirable.
    
    See, I don't like the Google/Twitter/Facebook way of doing things.
    
    Yes, Google handles millions of requests a day/hour/minute/second. For them, a one-in-a-million error is nothing. However, as an individual user, I feed only a few queries a day to Google. I want to know that what Google gives me is correct, not risk hitting one of those one-in-a-million glitches. If I'm one of Google's customers (the advertisers, hah! Did you think I meant users?), I for damn sure want to know that my ad is properly showing up when it's supposed to. But I suppose that lack of consistency is factored into the billing?
    
    It's all about economy of scale I suppose. Google can afford to miss something, and as long as they are accurate enough of the time, no one cares.
    
  2. Wednesday 8th June 2016 21:52 GMT Anonymous Coward
    
    Re: a situation where getting most of the results rather than all would be desirable.
    
    There's an obvious solution to the concurrent updates problem, used by the likes of Postgres: append the new record, THEN overwrite the index so it points to the new version. Anyone querying the index during the update will simply see the old version of the record.
    
    If MongoDB were soundly designed it would do that and a number of other beneficial things, but it would have taken too long to rewrite the C++, and these hipsters might have missed their hype train.
    
4. Thursday 9th June 2016 17:59 GMT Michael Wojcik
  
  Re: noSQL
  
  I'm trying to think of a situation where getting most of the results rather than all would be desirable, but I can't think of one.
  
  I can think of a few (dozen): Google, Facebook, Yelp, Tinder, ... There are many applications where end-users don't need comprehensive or completely correct results. Pretty close is good enough.
  
  Someone else mentions statistical analysis; in general, there are a lot of big-data / OLAP / etc apps where missing the occasional record doesn't hurt.
  
  Before computerization, incomplete results were the norm. Somehow civilization survived.
  
Wednesday 8th June 2016 22:43 GMT Anonymous Coward

MongoDB ate my containers!

"if a document is updated while the query is running, MongoDB may not return it from the query"

As opposed to what, a read operation that extends into the future and reports back any future updates?

"This is why locking is useful .. maybe a SQL database is really what you need, not a loose NoSQL system."

He's executing a search on an indexed field, which dynamically moves entries around depending on the updated value. How do other, non-loose NoSQL databases handle this situation?
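A toy model of that situation, assuming nothing about MongoDB's internals: the index is a sorted Python list of `(key, doc_id)` pairs, the scan remembers the last entry it returned, and a hypothetical writer moves one entry behind the cursor in the middle of the scan.

```python
import bisect


def scan(index, on_step=None):
    """Walk a sorted list of (key, doc_id) entries and return the doc ids seen.

    The cursor only remembers the last entry it returned, like a simple
    forward index scan. on_step lets a simulated writer run between steps.
    """
    last = None
    step = 0
    results = []
    while True:
        if on_step:
            on_step(step, index)
        # Resume just past the last entry returned.
        pos = 0 if last is None else bisect.bisect_right(index, last)
        if pos >= len(index):
            return results
        last = index[pos]
        results.append(last[1])
        step += 1


def move_entry(index, doc_id, new_key):
    """Simulate updating the indexed field: drop the old entry, insert the new one."""
    for i, (_, d) in enumerate(index):
        if d == doc_id:
            del index[i]
            break
    bisect.insort(index, (new_key, doc_id))


def writer(step, idx):
    # After two entries have been returned, "d" is updated so that its
    # new key sorts before the cursor position.
    if step == 2:
        move_entry(idx, "d", 5)


entries = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]
quiet = scan(list(entries))                 # no concurrent update
racy = scan(list(entries), on_step=writer)  # update lands mid-scan
```

Here `quiet` contains all four ids, while `racy` misses "d" even though that document existed for the whole scan. No read into the future is needed to avoid this, only a scan that sees the index as it was at some single point in time.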

Thursday 9th June 2016 00:43 GMT Borg.King

Moral - increase your future employees' salaries

Then you'll be able to attract and employ staff who are likely to have read, also likely to have understood, and more likely to implement code that complies with, the instructions.

Thursday 9th June 2016 08:25 GMT Anonymous Coward

acId

You keep using that word "locking" - I'm not sure it means what you think it means. Maybe look up Isolation?

And notice that MVCC, Repeatable Reads or Cursor Stability might not give you what you think it would...
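As a rough illustration of the MVCC caveat, here is a minimal sketch of snapshot reads over an append-only version list. `VersionedStore` is a hypothetical toy, not any real engine's implementation; it only shows that a reader pinned to a snapshot gets a consistent answer that may already be out of date.

```python
class VersionedStore:
    """Toy multi-version store (hypothetical): writes append, reads pick a version."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        # Never overwrite in place; append a new version with a newer timestamp.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))

    def snapshot(self):
        # A reader pins the current commit counter when it starts.
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = VersionedStore()
store.write("c1", "running")
snap = store.snapshot()          # a long-running query starts here
store.write("c1", "stopped")     # a concurrent update commits afterwards
old = store.read("c1", snap)                 # what the long-running query sees
new = store.read("c1", store.snapshot())     # what a fresh query sees
```

The pinned reader never misses the record, but it still reports the value from before the update, so any decision made on that result has to tolerate staleness.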
