The companies peddling NoSQL databases, which are increasingly needed for web applications that have a scale that breaks relational databases, are drooling over the prospects of their products going mainstream this year. Back in December, Couchbase, one of the front-runners in the NoSQL race, did a poll asking companies what …

COMMENTS

Monday 13th February 2012 18:21 GMT FIA

My brain hurts

"among the largest enterprises [...]almost 70 per cent have NoSQL projects cooking."

I wonder if that's because of the benefits of NoSQL, or because size breeds a tacit acceptance that buzzwords must be great?

Or maybe I just don't get why most people need non-relational databases? Is it really that hard to design data structures well?

I get why companies dealing with terabytes of data that needs to be searchable and indexable use this kind of stuff, but for Facebook and Google speed and data flexibility are probably more important than boring things like atomic commits and data consistency.

However most IT companies, or even large scale enterprises, don't have this level of data requirement surely?

"Free us from Oracle", yes, but that doesn't mean you have to abandon the benefits of SQL.

6 0
1. Monday 13th February 2012 19:44 GMT Simon Hickling
  
  I think it's telling that the article mentions the number of developers, but not the number of people who can design a database. In my experience the 2 are often mutually exclusive.
  
  6 0
2. Tuesday 14th February 2012 08:30 GMT Ian Michael Gumby
  
  @FIA
  
  You ask: "Or maybe I just don't get why most people need non-relational databases? Is it really that hard to design data structures well?"
  
  The answer is maybe.
  
  There are some things that do not fit well into the RDBMS world.
  
  Try modelling an off-balance-sheet portfolio of derivatives. It doesn't work well, since they are all just contracts whose terms can differ based on the type of derivative.
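To make the modelling problem concrete, here is a small Python sketch. The contract types and field names (`fixed_rate`, `strike`, `spread_bps` and so on) are illustrative assumptions, not taken from any real portfolio system. It counts how many columns a single catch-all table would need for three differently shaped contracts, and how many of its cells would be NULL.

```python
# Hypothetical contracts: each derivative type carries its own set of terms.
contracts = [
    {"id": 1, "type": "interest_rate_swap", "notional": 10_000_000,
     "fixed_rate": 0.025, "floating_index": "LIBOR-3M"},
    {"id": 2, "type": "fx_option", "notional": 5_000_000,
     "strike": 1.32, "expiry": "2012-12-21", "currency_pair": "EUR/USD"},
    {"id": 3, "type": "credit_default_swap", "notional": 2_000_000,
     "reference_entity": "ACME Corp", "spread_bps": 180},
]


def single_table_columns(records):
    """Columns a single 'one table for everything' schema would need."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def null_fraction(records):
    """Fraction of cells that would be NULL in that single wide table."""
    columns = single_table_columns(records)
    total_cells = len(columns) * len(records)
    filled_cells = sum(len(record) for record in records)
    return (total_cells - filled_cells) / total_cells
```

A relational design can avoid the wide table with one table per contract type or a separate attribute table, so this only illustrates why the fit feels awkward; it does not show that it is impossible.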
  
  For those who think that RDBMSes will go away... not likely. They still do certain things very well.
  
  0 0
Monday 13th February 2012 19:14 GMT JDX

NoSQL databases needed for ... a scale that breaks relational databases

I find this rather unlikely. Traditional DBs have been used for years in some pretty huge datasets. Of course nothing like Google's setup, but really how many systems today suddenly need Petabyte-DBs simply because they're web-apps? Being 'in the cloud' doesn't magically mean you need a new infrastructure, it's only if you get millions of (simultaneous) users and let's face it, you're not going to.

4 0
1. Tuesday 14th February 2012 08:30 GMT Tim99
  
  ...simply because they're web-apps?
  
  Yes, you are right. Most of the web based systems out there, that I see, could manage with SQLite.
  
  http://sqlite.org/whentouse.html
  
  "SQLite usually will work great as the database engine for low to medium traffic websites (which is to say, 99.9% of all websites). The amount of web traffic that SQLite can handle depends, of course, on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."
  
  0 0
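As a rough check on the 100K hits/day figure quoted from the SQLite documentation, the arithmetic below converts daily hits into an average per-second rate. It assumes traffic is spread evenly over the day, which real sites are not, so peak rates will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_hits_per_second(hits_per_day):
    """Average request rate, assuming hits are spread evenly over the day."""
    return hits_per_day / SECONDS_PER_DAY


conservative = average_hits_per_second(100_000)    # the quoted estimate
demonstrated = average_hits_per_second(1_000_000)  # "10 times that amount"
```

Under that assumption, 100K hits/day averages out at a little over one request per second, and ten times that at under twelve.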
Monday 13th February 2012 19:29 GMT BlueGreen

okay, a challenge to nosql-ers

I'm with FIA & JDX, so to the NoSQLers: please give me some ideas (3 or 4 would be good) of what kind of data needs a looser schema than classical RDBMSes can provide (please note these include blobs). That would be interesting, thanks.

4 0
Monday 13th February 2012 19:52 GMT toadwarrior

someone's afraid of being made redundant

Sounds like all the DBAs are commenting early to hate on NoSQL.

Yes, it's true that traditional databases work fine in many / most cases, but that is no reason not to try something new if you can afford the risk. I'm sure standard DBs will go away. It's not a question of if but when. If more companies try something else then we're more likely to see an improvement in those other things.

1 8
1. Monday 13th February 2012 21:15 GMT Anonymous Coward
  
  I think you're a bit off the mark there.
  
  The benefits of ACID outweigh pretty much anything else, including speed and scalability, in certain applications, and therefore RDBMSes will have a niche for years and years to come. Simply because the "NoSQL" crowd generally have "stick that icky ACID thing in the trash" as the very first thing on their TODO list.
  
  We might see a massive reduction in deployment in the (sizeable) "LAMP" niche, and others where immediate and global consistency are less important than speed, size, and sharding. But I can't really be bothered to care about that. The use of SQL there is actually more of a burden than a boon, what with lots and lots of people "experienced" with an RDBMS that's not that good with ACID when you get right down to it (despite occasional fanboiism there) and where the application is poorly suited to the tool. The field is big enough to sport its own branch of sloppy, speedy, highly scalable hipster databases. And hey, maybe the ratio of people who actually know how to use SQL to people that do use it will go up a bit. That'd be swell.
  
  Anyhow, this "survey" is probably highly self-selecting, so not likely representative of the larger industry. Rather, it looks like the graphs-and-numbers sauce required to get another self-plug stuck into industry rags, pretending both you and the rag are offering the readers useful information rather than thinly-veiled marketing material.
  
  3 0
2. Monday 13th February 2012 21:26 GMT Ilsa Loving
  
  Not quite
  
  It's not about being made redundant. It's about using the right tool for the job. If you are writing an app that consolidates a bunch of tweets, then that's great. You're not going to care about if you missed a tweet or two, or if running the same query twice doesn't generate the same result each time. You care more about speed than quality.
  
  But this is a very limited problem set. Most of the time, you care far more about accuracy than speed, and in those situations noSQL is flat out crap. Call me old school and irrelevant if you want, but ACID is a good thing. If your accounting figures change from one query to the next, you can bet heads are going to roll. Or, heaven forbid, someone swipes your bank card and manages to withdraw $20000 from your 1000/day limit bank account because they were able to withdraw 20 times before the transaction finally filtered through.
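The bank card scenario can be simulated in a few lines of Python. This is a toy model under stated assumptions, not how any particular database works: `replication_lag` is a made-up parameter meaning "the limit check cannot yet see the last N accepted withdrawals".

```python
DAILY_LIMIT = 1000


def withdraw_with_stale_reads(requests, replication_lag):
    """Check each request against a view that lags behind accepted writes."""
    accepted = []  # withdrawals accepted so far, in order
    for amount in requests:
        # The check only sees withdrawals that have already propagated.
        visible = accepted[:max(0, len(accepted) - replication_lag)]
        if sum(visible) + amount <= DAILY_LIMIT:
            accepted.append(amount)
    return sum(accepted)


def withdraw_atomically(requests):
    """Check and update in one step, so every check sees all prior writes."""
    withdrawn = 0
    for amount in requests:
        if withdrawn + amount <= DAILY_LIMIT:
            withdrawn += amount
    return withdrawn


twenty_swipes = [1000] * 20
stale_total = withdraw_with_stale_reads(twenty_swipes, replication_lag=20)
atomic_total = withdraw_atomically(twenty_swipes)
```

With a lag of 20 writes, all twenty swipes pass the check; with the atomic version only the first one does.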
  
  5 0
3. Monday 13th February 2012 21:26 GMT Sean Kennedy
  
  Silly
  
  " I'm sure standard DBs will go away. It's not a question of if but when."
  
  That's...well, really naive. There's a reason standard, or rather, traditional relational databases will be around for a very long time; they work for specific data sets. A lot of datasets work very well in a relational model.
  
  There are areas where nosql works better, no arguments. But claiming it'll replace relationals is just silly and ignorant.
  
  6 0
Monday 13th February 2012 20:02 GMT ZenCoder

This is all new to me.

This is beyond what I was taught or have worked with, but the main point about traditional databases was that they always guaranteed consistency, ACID and all that. Very important for most data storage.

However once you start breaking a database into multiple servers distributed across the globe, the cost of maintaining consistency between those individual database servers rises exponentially, until it becomes impossible to have something like Google or Facebook work in anything close to real time.

So you have distributed storage where changes propagate throughout the system but the database servers are never 100% consistent with each other.

I'd be very unhappy if my bank balance varied according to which ATM I was using at the moment, but if some of my friends see my status update immediately and others see it a few minutes later, or if I get slightly different Google results ... it's not a big deal.
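The status-update case can be sketched as a toy cluster in Python. The `Replica` and `Cluster` classes are hypothetical names invented for this illustration; real systems propagate changes over a network and have to handle conflicts, which this sketch ignores.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.inbox = []  # updates received but not yet applied

    def read(self, key):
        return self.data.get(key)

    def apply_pending(self):
        for key, value in self.inbox:
            self.data[key] = value
        self.inbox.clear()


class Cluster:
    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]

    def write(self, key, value, via=0):
        # The replica taking the write applies it at once;
        # the others only queue it for later.
        for index, replica in enumerate(self.replicas):
            if index == via:
                replica.data[key] = value
            else:
                replica.inbox.append((key, value))

    def converge(self):
        # Stand-in for background propagation finishing.
        for replica in self.replicas:
            replica.apply_pending()


cluster = Cluster(3)
cluster.write("status", "hello")
before = [replica.read("status") for replica in cluster.replicas]
cluster.converge()
after = [replica.read("status") for replica in cluster.replicas]
```

Before `converge()` runs, a reader gets a different answer depending on which replica it asks; afterwards all three agree.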

9 0
1. Tuesday 14th February 2012 11:51 GMT peter_geoghegan
  
  @ZenCoder
  
  PostgreSQL's synchronous replication would allow you to replicate those critical transactions synchronously, so that they'd be available in two places - the master and a nominated standby - before the transaction was committed. This can of course be very expensive, owing to the fact that the transaction cannot commit until it has remote confirmation, and so is very sensitive to latency. However, the PostgreSQL implementation allows the user to selectively use sync rep per transaction, typically only when it makes business sense to pay the innate performance penalty, for financial transactions and so on. All other transactions are usually committed asynchronously (i.e. there may be a slight delay, typically of a few seconds, before they are visible on standbys).
  
  Full disclosure: I work for the company that developed this feature.
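As a sketch of the per-transaction choice described above, the Python helper below only builds the SQL text; it does not connect to a server. `synchronous_commit` is a real PostgreSQL setting, but the helper name, the table names and the choice of `'on'` versus `'local'` are assumptions for illustration, and the sketch presumes a synchronous standby is already configured on the master.

```python
def transaction_statements(body, critical):
    """Wrap `body` (a list of SQL strings) in a transaction.

    Critical transactions use 'on' so the commit waits for the synchronous
    standby; the rest use 'local' and wait only for the local flush.
    """
    level = "on" if critical else "local"
    return (
        ["BEGIN;", f"SET LOCAL synchronous_commit TO '{level}';"]
        + list(body)
        + ["COMMIT;"]
    )


# Hypothetical statements; the table names are made up for illustration.
payment = transaction_statements(
    ["UPDATE accounts SET balance = balance - 100 WHERE id = 42;"],
    critical=True,
)
page_view = transaction_statements(
    ["INSERT INTO page_views (url) VALUES ('/home');"],
    critical=False,
)
```

`SET LOCAL` limits the setting to the current transaction, which is what makes it possible to pay the latency cost only where it matters.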
  
  0 0
  1. Tuesday 14th February 2012 12:42 GMT stanimir
    
    That's a case w/ replication on a single slave/standby.
    
    If you have more slave servers to execute queries against, the price of a sync transaction rises. Fortunately, not linearly, as the slaves process asynchronously, but the master node still has to wait.
    
    That also adds an extra burden: deciding which transactions can complete async only. When the code runs in a container, the biz logic is unaware of the underlying transaction model, and including an otherwise innocent-looking operation can break the balls under load.
    
    Lack of consistency is virtually unobserved during light load - i.e. most correctness tests.
    
    Other than that, it's a nice feature but hard to get right (from a developer's point of view).
    
    0 0
Monday 13th February 2012 20:37 GMT Ben 50

Interesting usages

One interesting usage I have seen is for real-time collision detection software in a very complex 6-axis CNC machine (with the most crazy physical constraints on which tools can do what, when). The DB (Neo4J) was being used to model physical space and predict just ahead of time (just quick enough to shut the heads down) if a collision was coming - it was the only way to make the thing fast enough given that none of the mathematicians this (famous) manufacturer hired managed to get close to solving the problem formally.

More generally though, these NoSQL databases really come into their own with historical data containing lots of weird and wonderful inter-relationships (lots of joins in an SQL db are slow... graph DBs are set up to treat relationships as "first class citizens" from the get go). That's why the web loves them I think, because everybody is slurping up as much data as possible, to track changes, and find insights, for the marketers (make their userbases more monetizable).
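To illustrate what treating relationships as "first class citizens" means, here is a minimal Python sketch using a plain adjacency map. The `follows` data is invented, and this is not how Neo4j or any other graph database is implemented; it only shows that following a relationship is a direct lookup rather than a join.

```python
from collections import deque

# Hypothetical "who follows whom" data, stored as node -> neighbours.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def reachable_within(graph, start, max_hops):
    """Breadth-first search: nodes reachable from `start` and their hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    del hops[start]
    return hops


two_hops = reachable_within(follows, "alice", 2)
```

In a relational schema with an edge table, each extra hop would typically mean another self-join on that table.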

1 0
Monday 13th February 2012 21:26 GMT koma

the same managers...

that usually choose a vendor because it offers the best trip to a roadshow in LA or buys Oracle, Citrix, VMWare, SAP, Documentum, Lotus Notes ... they now come up with 'too rigid schemes' argument ???

1 0
Tuesday 14th February 2012 08:30 GMT xyz

Actually...

....although I think getting a developer in to "do" a database is like getting a plumber in to do your electrics (EF take note), I actually like the whole noSQL thing and would love to create one for the project I'm currently "architecting." However, there is no ****ing way I'm putting one in an enterprise for a good few years yet, given "they" can't even agree a damn name for the concept....noSQL, coSQL, notOnlySQL, yada yada.

0 0
Tuesday 14th February 2012 09:26 GMT Anonymous Coward

lots of grey area in real life

i am sure that there are switches in a nosql db that effectively enforce acid compliance

equally if you set up autogenerating keys on a table with cols s1...sn in a cluster of databases that are synchronised once every n hours, you get a lot of the flexibility of nosql

if the cost of failure to correctly store data is low - a facebook post - and the recording of that item relative to other unrelated data items is not important - whether your facebook post was stored before someone else's or not - or possibly stored twice - nosql may be for you

the other area that nosql does not discuss overmuch is that an sql db can be queried on an ad-hoc basis by a vast range of other third party apps including office products (openoffice and ms) and there is a vast pool of people who can do that.

0 1
1. Tuesday 14th February 2012 09:49 GMT stanimir
  
  As the theorem says: out of consistency, availability and partition tolerance you can pick 2 only.
  
  So you can't have all. NoSQL is about availability and partitioning, so you sacrifice consistency. Hence, even if such a switch exists (ACID NoSQL), NoSQL won't be that hot b/c it won't do partitioning.
  
  NoSQL relies on cheap commodity hardware (aka linux boxes) and scales out.
  
  To get a relational DB working and scaled, you go up not out (less latency) and splash the big money on the big iron.
  
  I am no DBA, though I still design the data models, and I resent databases for just being slow and cumbersome [doing the stuff in shared memory is the way to go], but if you need consistency AND durability, something's gotta give.
  
  In that respect, a memory based, lock free and truly scalable (hundreds of cores) hash table with read-committed transactions is a viable data structure. You can asynchronously flush to the disk(s) to retain durability... so highly available, consistent and noSQL, but no partitioning again. See the first point.
  
  0 0
Tuesday 14th February 2012 09:26 GMT a pressbutton

lots of grey area in real life

i am sure that there are switches in a nosql db that effectively enforce acid compliance

equally if you set up autogenerating keys on a table with cols s1...sn in a cluster of databases that are synchronised once every n hours, you get a lot of the flexibility of nosql

if the cost of failure to correctly store data is low - a facebook post - and the recording of that item relative to other unrelated data items is not important - whether your facebook post was stored before someone else's or not - or possibly stored twice - nosql may be for you

the other area that nosql does not discuss overmuch is that an sql db can be queried on an ad-hoc basis by a vast range of other third party apps including office products (openoffice and ms) and there is a vast pool of people who can do that.

0 0
Tuesday 14th February 2012 12:46 GMT James Dunmore

Why is it an either/or argument?

There are times when NoSQL is suitable (think of storing generic metadata on something), and times when RDBMS is good (lots of transactional records) - why not use a hybrid?

As a poster earlier said, if the value of your data isn't high (even though in reality, NoSQL doesn't lose data, and RDBMS is only as reliable as the underlying architecture that is constantly lying to it that it actually did do a disk write anyway) for example a review on a product, use NoSQL, but if the value is high (a payment record) use the other.

I think both are good, and both are needed.

0 0
Saturday 18th February 2012 23:18 GMT lambda_beta

But ....

Where is the mathematical foundation for NoSQL? Oh yes, there isn't any. It's good until you need a REAL database.

0 0