CAP theorem* holds no fear for six engineers building FoundationDB, the industry’s latest NoSQL candidate. The difference? It adheres to the principles of ACID** found in relational, which previous NoSQLers have tried to replace. “A lot of people developing NoSQL systems have been discouraged by the CAP theorem and used that …

COMMENTS

Thursday 22nd November 2012 15:18 GMT Anonymous Coward

What are they on?

SQL scales very well. It had problems ten or fifteen years ago but today I don't usually see any problems. Even with 100s of TBs of data and thousands of concurrent users.

I've got to say they seem to be going up a blind path here.

8 0
1. Thursday 22nd November 2012 15:36 GMT JeeBee
  
  Re: What are they on?
  
  And in terms of free relational databases, e.g., MySQL, the clustering and replication issues are terrible, especially when you require ACID across the full set of databases (in a master-master setup). Modern users of databases are beyond using them in a single point of data access mode, they need massive geographical scaling, they need data in many places at once, and often they need that data to be globally consistent.
  
  Of course Facebook don't require global consistency, hence Cassandra is good at ensuring data is eventually written to all cluster members, but doesn't guarantee consistency or offer transactions.
  
  There are different use cases for databases other than very traditional uses.
  
  I do take issue with the guy's assertion that software developers can't handle relational data however.
  
  2 0
  1. Thursday 22nd November 2012 20:04 GMT Niall Wallace 1
    
    Re: What are they on?
    
    >I do take issue with the guy's assertion that software developers can't handle relational data however.
    
    I've generally found that just about every developer I've been introduced to at my place of work has required a whiteboard explanation of Mathematical Sets and how that translates into relational database tables.
    
    To be blunt the database teaching at some unis is utter baws and that includes the one I was at.
    
    We spent years hacking out code in C, C++ and Java but the database work was limited to a couple of databases supporting a front end in access.
    
    The most ridiculous comment in the article is one along the lines of "We aren't DBAs we are software engineers" yeah well guess what, building a database is software engineering, if you can't write SQL then that is your real problem, go and do something that doesn't need a database.
    
    And suggesting that Facebook doesn't need every node up to date? Every time their clusters are going slow on the updates I can see people complaining about being notified of updates they can't see and that "they can see them on their phone".
    
    From a commercial point of view, if I can see something while in Dundee and another customer can't see it in Munich and the content just happens to be a "fastest finger" competition open to everyone in Europe, that's potentially a legal problem for the competition organizer.
    
    Right, pub time.
    
    5 0
  2. Friday 23rd November 2012 08:22 GMT Matt 21
    
    Re: What are they on?
    
    True for MySQL, not true, in my experience, for Sybase, Oracle or DB2.
    
    1 0
2. Thursday 22nd November 2012 15:49 GMT Destroy All Monsters
  
  Re: What are they on?
  
  They are on distributed databases.
  
  > SQL scales very well.
  
  SQL is the query language. "Scaling" is a property of the underlying database engine.
  
  6 0
  1. Friday 23rd November 2012 08:22 GMT Matt 21
    
    Re: What are they on?
    
    What a brilliant insight... it should have read RDBMSs scale well.
    
    0 0
3. Thursday 22nd November 2012 16:17 GMT FutureShock999
  
  Re: What are they on?
  
  I think the thing is - SQL is _expensive_ to get to scale with, and it is really, really inefficient when doing some types of analysis. Anything that you can index well, works well for SQL. But there are applications that want to scan entire table structures iteratively, and filter or compute on non-indexed attributes - and do so in parallel. In legacy, we would use file-oriented batch for that. In early SQL, we might have used Teradata's unique feature of "temporal parallelism" to achieve some measure of scalability with that. Some companies coded such stuff using Ab Initio, Orchestrate, and similar early parallel processing tools, again using file-oriented batch.
  
  SQL has improved over the last 15 years, but there are still non-indexed functions that are just a whole lot more efficient done elsewhere - that is why Map/Reduce and later Hadoop were born. I think the key thing is that they were born for analytic use, where ACID was never really an issue. The problem is that once you have a hammer, everything looks like a nail - especially if that hammer is free, but there is a charge to use a decent screwdriver (i.e., a massively scalable SQL DB).
  
  1 0
4. Thursday 22nd November 2012 16:55 GMT JDX
  
  Re: What are they on?
  
  >>SQL scales very well. It had problems ten or fifteen years ago but today I don't usually see any problems. Even with 100s of TBs of data and thousands of concurrent users.
  
  It scales well on a single server, and a powerful single server can do a hell of a lot. The problem is when you want to split/mirror that DB across multiple servers in different locations.
  
  I still maintain the number of people needing NoSQL is very small - doing something because Google does it is really dumb because even giant DBs are nowhere near what Google is doing.
  
  3 0
Thursday 22nd November 2012 15:40 GMT Gaius

"we are more software engineers than DBAs"

Which begs the question what is a DBA? I'll tell you: it's not the guy who knows SQL. It's not even the guy who knows a particular vendor's product. The DBA is the guy who takes personal responsibility for the integrity and availability of your organization's data, and gets called at 3am if there is a problem - and fixes it before business opens. If you think you don't need a DBA, then your business is exposed to a risk that may be difficult to recover from.

5 0
Thursday 22nd November 2012 15:46 GMT Destroy All Monsters

CAP theorem - not a blocker in the practical world

"CAP theorem clearly poses a theoretical problem for cloud computing, where services are being founded on massively distributed servers for their compute and storage."

In Overcoming CAP with consistent soft-state replication, we read:

"The CAP theorem has been highly influential within the cloud computing community, and is widely cited as a justification for building cloud services with weak consistency or assurance properties. CAP’s impact has been especially important in the first-tier settings on which we focus in this article. Many of today’s developers believe that CAP precludes consistency in first-tier services. For example, eBay has proposed BASE (Basically Available replicated Soft state with Eventual consistency), a development methodology in which services that run in a single datacenter on a reliable network are deliberately engineered to use potentially stale or incorrect data, rejecting synchronization in favor of faster response, but running the risk of inconsistencies. Researchers at Amazon.com have also adopted BASE. They point to the self-repair mechanisms in the Dynamo key-value store as an example of how eventual consistency behaves in practice. Inconsistencies that occur in eBay and Amazon cloud applications can often be masked so that users will not notice them. The same can be said for many of today’s most popular cloud computing uses: how much consistency is really needed by YouTube or to support Web searches? However, as applications with stronger assurance needs migrate to the cloud, even minor inconsistencies could endanger users. For example, there has been considerable interest in creating cloud computing solutions for medical records management or control of the electric power grid. Does CAP represent a barrier to building such applications, or can stronger properties be achieved in the cloud?

(main part of article followed by...)

Conclusion: The CAP theorem centers on concerns that the ACID database model and the standard durable form of PAXOS introduce unavoidable delays. We have suggested that these delays are actually associated with durability, which is not a meaningful goal in the cloud’s first tier, where applications are limited to soft state. Nonetheless, an in-memory form of durability is feasible. Leveraging this, we can offer a spectrum of consistency options, ranging from none to "amnesia freedom" to strong f-durability (an update will not be lost unless more than f failures occur). It is possible to offer ordered-based consistency (state machine replication), and yet achieve high levels of scalable performance and fault tolerance. Although the term amnesia freedom is new, our basic point is made in many comparisons of virtual synchrony with Paxos. A concern is that cloud developers, unaware that scalable consistency is feasible, might weaken consistency in applications that actually need strong guarantees. Obviously, not all applications need the strongest forms of consistency, and perhaps this is the real insight. Today’s cloud systems are inconsistent by design because this design point has been relatively easy to implement, scales easily, and works well for the applications that earn the most revenue in today’s cloud. The kinds of applications that need stronger assurance properties simply have not yet wielded enough market power to shift the balance. The good news, however, is that if cloud vendors ever tackle high-assurance cloud computing, CAP will not represent a fundamental barrier to progress."

Actually the whole issue of IEEE Computer is rather interesting.
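
As a rough illustration of the "strong f-durability" idea in the quoted conclusion (an update will not be lost unless more than f failures occur), here is a minimal sketch in Python. It is not taken from the paper: the class and method names are hypothetical, and it models only in-memory copies on toy replicas, not a real replication protocol such as Paxos or virtual synchrony.

```python
class Replica:
    """One in-memory copy of the data (soft state only, nothing on disk)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class FDurableStore:
    """Toy store (hypothetical): a write is acknowledged only once f + 1 replicas hold it."""

    def __init__(self, replica_count, f):
        if replica_count < f + 1:
            raise ValueError("need at least f + 1 replicas")
        self.f = f
        self.replicas = [Replica(f"r{i}") for i in range(replica_count)]

    def write(self, key, value):
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.f + 1:
            # Cannot place f + 1 copies, so refuse rather than acknowledge.
            return False
        for replica in live[: self.f + 1]:
            replica.data[key] = value
        return True

    def crash(self, index):
        """Simulate a node failure: its in-memory state is gone."""
        replica = self.replicas[index]
        replica.alive = False
        replica.data.clear()

    def read(self, key):
        """Return the value from any surviving replica that holds it, else None."""
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                return replica.data[key]
        return None
```

With three replicas and f = 1, an acknowledged write lives on two nodes, so it survives one crash; if both holders crash (more than f failures), the update is lost, which is exactly the boundary the definition describes.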

3 0
1. Thursday 22nd November 2012 23:42 GMT Ian Michael Gumby
  
  Re: CAP theorem - not a blocker in the practical world
  
  Well... The thing about CAP... in any distributed system there is an issue of time. All nodes being consistent at the same time is a bit difficult since each node is distinct and unique in terms of the clock.
  
  I think it's a question of your definitions.
  
  Also I think it would depend on your system clocks and networks for true consistency.
  
  1 0
2. Monday 26th November 2012 16:11 GMT Michael Wojcik
  
  Re: CAP theorem - not a blocker in the practical world
  
  Whether CAP is a problem "in the practical world" depends entirely on the application. The paper you cited is correct in noting that applications have a range of consistency (and durability) requirements, while CAP only applies in an absolute sense to perfect consistency.
  
  If an application has very strong consistency requirements, though, CAP is still going to be an issue in practice. If the cost of any stale read (which might include eg legal exposure) is very high, then you cannot have both availability and high partition resistance. That's trivially proven: if your network is split into two non-empty partitions, P1 and P2, then an update in P1 cannot be seen by P2 until the partitions are reconnected. And that means that nodes in P2 cannot read the data affected by the update in P1 (without incurring the inconsistency cost), which means you lose availability in P2.
  
  Claims of high-availability distributed ACID DBMSes always need to be unpacked, because some of those terms must be qualified. BASE approaches qualify (weaken) consistency and durability, and the paper you cited helps formalize those tradeoffs. It's not clear from the article what FoundationDB is qualifying, and I haven't bothered looking into it, but this "we aren't worried about the CAP Theorem" business is marketing fluff.
  
  1 0
Thursday 22nd November 2012 16:57 GMT Adam Fowler

Or you could just...

...download and use MarkLogic, which has had an ACID Compliant NoSQL database for ages - and one that has an ODBC connector for reporting over data held in relational views. Oh and lots of Enterprise customers.

3 0
Thursday 22nd November 2012 18:04 GMT bbulkow

this isn't about SQL

SQL is about the language, FoundationDB is about scalable transactions. Foundation seems to have some good technology, it will be interesting to see where they get traction as they have not yet been deployed at scale - unlike, say, http://aerospike.com/

The statement about the CAP theorem makes good press, but is misleading. It is absolutely possible to build at-scale transactions, and the CAP theorem doesn't prevent that. What happened was the CAP theorem allowed developers and architects to talk openly about relaxed consistency, which they had already given up by deploying and relying on memcache and sharded MySQL (memcache gives an inconsistent view of the data, shards remove the ability to transactionalize many queries, and MySQL is often run in a non-durable mode).
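
To make the memcache point concrete, here is a minimal sketch in Python of how a cache sitting in front of a database can hand back an inconsistent view. It uses plain dictionaries as stand-ins and does not use the real memcached client API; every name in it is made up for illustration.

```python
database = {"balance": 100}  # stand-in for the system of record
cache = {}                   # stand-in for a memcache-style cache


def cached_read(key):
    """Cache-aside read: serve from the cache if present, else load from the database."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def write_without_invalidation(key, value):
    """Update only the database; any cached copy is left untouched and goes stale."""
    database[key] = value


def write_with_invalidation(key, value):
    """Update the database, then drop the cached copy so the next read reloads it."""
    database[key] = value
    cache.pop(key, None)
```

After a `cached_read("balance")`, a call to `write_without_invalidation("balance", 50)` leaves readers seeing the old value until the entry is evicted. Invalidating on write narrows the window in this single-process toy, but in a real deployment the database update and the cache delete are still two separate steps, not one transaction.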

Once you've given up transactions, you have more choices in which database to use. But Foundation is right, the next step is to build safe transactions into scale-up databases. There is - absolutely - a new world of databases, and the savvy architect needs to keep up with the times. When millions of transactions per second is an easy step, new businesses and problems can be attacked.

0 0
Thursday 22nd November 2012 19:42 GMT Tom 7

I have a feeling it will all be the same in the end.

It will probably be like a lot of language and architecture design that’s been done of late. It doesn’t matter what you try to do, you eventually come up against the bit you were trying to sidestep - not because your solution's any more wrong than anyone else's but because it's an inherent part of the problem, and no matter how you try to unknot the problem the hard bit always drops out.

2 0
Thursday 22nd November 2012 19:51 GMT Coen Dijkgraaf

key value sore?

Sounds like an open wound to me.

(That typo is supposed to read "key-value store" as per later on in the article).

0 0
Thursday 22nd November 2012 20:25 GMT Steen Hive

Same old same old

Use NoSQL and end up writing an entire relational engine for your application, in the absence of which your NoSQL DB backend is to all intents and purposes a stinking great disk full of crud.

2 0
1. Saturday 24th November 2012 13:05 GMT Ian Michael Gumby
  
  Re: Same old same old
  
  Actually no...
  
  I mean, yes you could, and you end up with a distributed relational database.
  
  (See Informix's XPS, or EMC's Greenplum in terms of shared nothing MPP)
  
  However, you don't have to create a relational database and you would still have to worry about ACID. It's not as simple as some people would have you believe.
  
  0 0
Thursday 22nd November 2012 23:42 GMT NickLavezzo

Interested? Check us out and apply for alpha

FoundationDB co-founder here. Just wanted to let everyone know that if you want to get into the details of what exactly we've built, and a good bit of the how, check out www.foundationdb.com. Features and Technology are good places to start.

If you want to get your hands on the software / documentation, you can apply for alpha at www.foundationdb.com/#get today and we'll get you access in just hours - even though we're US based and it's Thanksgiving today :)

0 1
1. Friday 23rd November 2012 14:42 GMT Anonymous Coward
  
  Re: Interested? Check us out and apply for alpha
  
  Hi Nick,
  
  The article doesn't really explain how you get around the availability (A in CAP) issue. What happens if the node holding the most current version of a record becomes unavailable and you receive a request for it?
  
  1 0
  1. Thursday 29th November 2012 14:16 GMT Alfonso Garcia-Patiño Barbolani
    
    Re: Interested? Check us out and apply for alpha
    
    Availability in the CAP theorem context does not mean that your DB is available 100%. The CAP theorem says that the system cannot converge to 100% in all three dimensions (C, A, and P) at the same time.
    
    Of course, in the real world, and due to physical limitations, there is no way to achieve 100% for all three requirements, but that does not stop system designers from attempting to get as close as possible to that 100%.
    
    So according to the CAP theorem it is perfectly acceptable to return an error if the data is not available (and in practice there is no way to prevent this from happening at some level), but whatever you do to increase availability and thus avoid returning that error will be by doing something that sacrifices the ability to partition, to be consistent, or both.
    
    At least that's what I understood.
    
    1 0
Friday 23rd November 2012 06:36 GMT David Harper 1

But is it web scale?

I wish these guys every success, but I'm reminded of the excellent "MongoDB is web scale" cartoon from a couple of years ago:

http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale

5 0
Monday 26th November 2012 10:50 GMT ashokjoshi

Oracle NoSQL Database is a transactional, scalable key-value store

Oracle NoSQL Database (http://www.oracle.com/technetwork/products/nosqldb/overview/index.html) provides ACID transactions for key-value data.

0 0
Wednesday 28th November 2012 15:41 GMT NuoDB TechE

Why not scale like NoSQL without giving up SQL?

NuoDB, which is about to go GA any day now, is already 100% SQL, 100% ACID, and 100% elastically scalable. Don't believe me? Download it for yourself at www.nuodb.com.

- w

0 0