The relational database - a mainstay of enterprise computing for 25 years - has been under siege. New approaches to data storage are threatening the RDBMS and precipitating what database guru Mike Stonebraker and others described recently as a "group grope" to find a new database engine. This week, though, could mark the …
Love the Jargon
The only thing missing from this piece was Paris Hilton. But no doubt you will squeeze her in next time. You know, her scalable assets.. In fact I'm counting on you.
Anyways, wise-arses, the database El Reg uses? Can you tell me which it is, punk? Well can ya?
Good Programmer - Bad Programmer
This is the oldest story in programming, Good programmers use RDBMS and bad programmers use whatever sloppy thing they can get away with.
It all depends on your requirements...
...otherwise you're arguing in a vacuum.
Just look at the history of the Web. Starting out with static web pages pulled right out of a filesystem, it's no surprise that webmasters didn't immediately upgrade to transactionally-atomic, normalized relational databases. MySQL started out as msql, a much less ambitious file manager with an SQL query language layer spread over it, and was there at the right time with something that was good enough to power the adolescence of the Web.
As many websites matured into order entry frontends and customer service portals, business requirements started to make themselves heard. MySQL has been adding features to meet them, and the enterprise DB vendors that had the features for years have been trying to raise web developers' expectations and lure them to real industrial strength platforms, the kind of infrastructure that runs banks and stock exchanges.
Do you need ACID properties for a blog? No. Can you currently afford ACID properties for search engine infrastructure? Perhaps not. They would be nice to have, certainly, and will eventually be taken for granted just like rural electrification and indoor plumbing. In the meantime, economics, response time and other requirements will require tradeoffs given the current state of the art.
I repeat: it all depends on your requirements.
Look through a few old journal articles from the 1970's -- shortly after the RDBMS was proposed and before one had actually been developed. People have been saying that the RDBMS doesn't support the way they need to store data since the beginning of time.
Sure, there are better ways of storing data for specialized purposes, like search engines and semantic web. We should embrace technology to solve those needs. But specialized tools tend to be great at one thing and not so great at others. The RDBMS continues to be the best general-purpose data storage paradigm.
Selling books or consulting
I'm e-selling books or T-shirts why should I care about DDL, Schema, Index ... adding columns to accommodate previously unknown/irrelevant attributes ... etc ... etc ...
I'm OO coding, WTF do I need to normalise me objects to store them and de-normalise when retrieving them.
While the wise monkeys are flattening objects to feed them to the RDBMs I'm selling me T-shirts and don't care about massive reports. RDBMs suck.
Educated programmers use RDBMS...
Look at Mr. B's comment, for example. I think the most telling phrase in it is this one:
"I'm OO coding, WTF do I need to normalise me objects to store them and de-normalise when retrieving them" [sic]
This is typical of the mindset that pursues a stripping down, a DUMBING down, of all software related tools so that everyone in the peanut gallery can call themselves an "architect". It's not that this person is too lazy to write a line of SQL. It's much worse than that. It's that he actually doesn't understand why it's good or even useful.
With regards to his "point", I would answer that if he knew how to do OO PROPERLY, he'd have designed a normalized schema based on the actual entities he was working with, and his objects would have been based on the same entities, so he'd have had a very nice match between the two. There would be no NEED for normalization or de-normalization, because he would be thinking in terms of real-world entities and their interactions and designing his system around them.
When properly designed, an RDBMS is an asset to a software project. It provides data normalization, consistency, extensibility, transaction support, and a consistent interface based on the SQL language, which has been very stable for many years.
We're even using them as data stores in desktop software now. Look at Apache's Derby project, which evolved into Sun's JavaDB. It's a real-live full-power relational database that can be embedded in your Java software. And it's free.
Relational databases are still very important and relevant -- to those who know enough to understand why.
Paris, because (look at the glasses!) even SHE uses an RDBMS.
Sorry if my previous post came across a bit "aggro"...
The bottom line, for me, is this:
Do you want to have your referential integrity and scalability supported automatically by a powerful system that just works (an RDBMS)?
Or do you want to have to completely reinvent all those levels of data management so you can use flat files or XML, all so you can evade having to write a little SQL, which isn't even difficult?
In my view all these "RDBMS is teh Sux0r" people are saying that instead of learning to drive a car, they want to hop everywhere on a pogo stick, falling over and bashing their unmentionables on it every time something unexpected happens.
Horses for courses
Having used various databases systems for half a lifetime I feel qulified to comment.
RDBMS databases are a great tool for dealing with well specified structured data and should probably be the default choice for storing any data thats important to you.
However it is also important to realise their failings.
-- They s*ck at storing any hierachical or tree type structures.
Look at all the web pages on how too implement a simple org chart in an RDBMS.
-- They dont cope well complex datastructures.
I once saw an attempt to store SWIFT messages in a relational model. 150+ tables just to store a few basic message types.
-- They are useless for free form text.
The indexing stuctures just dont work for textual searches.
As for the old OO-RDMS impedence match flamewar -- get over it and accept that trying to bolt something as clunky as java classes to something as elegant as SQL is going to be a pain.
Re: Horses for courses
"-- They s*ck at storing any hierachical or tree type structures.""
I don't think that this is an RDBMS problem per se; on a table level you're just storing parent & child columns (amongst other things). The issue is really that the syntax to get the hierarchy back sucks big time. Sql2k5 has had a bit of a sugaring attempt with common table expressions but it's still rubbish.
It's probably too much to ask to get a nice bit of syntactic standardisation: imagine
SELECT * FROM hierarchy TREE JOIN on parent = parent_column, child = child_column
Mind you, once that's sorted, the flipping convert a hierarchical dataset into something that's vaguely useful web-side algorithm needs a bit of work too, so it's not like it's just RDBMS that's the problem...
Web 2.0: All your favourite problems and some new ones as well *sigh*
But org charts were so easy with IMS!
Having said that google's "BigTable" looks promising in this respect. They have some "where parent of" "where child of" type syntax in the sql, plus you can store arbitary objects in a row without having to define them in the schema first.
Pity you can only use it from GoogleApps.
Org Charts in SQL
So I assume you're not talking about creating a visual org chart, because that's the kind of thing that would be easier to mock up in Visio...
What I assume you're talking about is storing an org chart in a database so people can use an application to look up Joe's supervisor, or to dig up all groups with less than 10 members whose manager is at a certain level and which are connected to a specific department...
Operational stuff, right?
Well... Let's say that each employee record has a foreign key to his direct supervisor, a foreign key to his workgroup, and a foreign key to his department. Of course, he'd also have a foreign key to a job code (supervisor, programmer, tech support, clerical weenie, etc). This would make it easier to filter employees when searching.
Let's also say that departments have a foreign key to the department they're contained in, as well as a foreign key to the location they're housed at.
Let's say that workgroups have foreign keys to the departments they belong to.
Now... What if you wanted to find workgroups with less than 10 members who are in the Hoboken, NJ location in the HR department? You'd select workgroups whose location is Hoboken and whose department is HR, where the count of employees that foreign key to that workgroup is less than 10 and greater than zero (if they don't have at least one person, they're defunct, right?).
Then, since you'd be doing this in a Java object, you'd have an ArrayList of Workgroup objects, so you could have each one fetch a list of Employee.objects and their associated Supervisor object. Each object would contain its own query code and SQL. All very self-contained and tidy.
If you wanted to write out a huge report so some graphic designer could visio you up a nice chart, you could query ALL locations, ALL departments, and ALL workgroups, but you'd probably want to do only one at a time, generate text, and clear out the objects, so your PCs memory wouldn't burst into flames in a spectacular technological hari kari.
I guess I'm just not seeing the difficulty. What are you trying to do that is difficult in SQL? What's the hard/impossible part?
Object DB for complex data
I think the power and popularity of RDBMS is due primarily to the standard query language and efficient indexing which makes those queries fast. If your application does not use those heavily and you want to 'model' your data using complex objects, you might want to consider something else, like an Object database.
For example, we have a very complex Contact object that needs to be simply saved by ID and re-read later; I considered pasting the entire serialized object graph here but it is 1300+ lines and goes 5-6 levels deep (for a single object); here are a few of the hundreds of XPath's to give you some idea of the types of data involved:
There are also addresses; co-borrowers; assets; etc etc etc.
Here is all the code needed to save this ENTIRE object to db4o (and the code to read it is similar); NO O/R problems here:
String fileName = "TEST.db4o";
ObjectContainer db = Db4o.openFile( fileName );
db.set(contactObject); // That's it ! Your entire object graph is saved.
'Normalizing' this object would take a LONG time; and a LOT of code; and make it more difficult to make future changes.
There are also commercial object DB's (Objectify, Intersystems Cache) that provide SQL access to your objects; too bad no open-source projects do a decent job of this (yet?).
I still think RDBMS' would be easier to work with.
I dunno... I can't imagine it being too hard to normalize an object structure like that for a relational database. I've got to believe it'd be easier to do it the DB way than the pure object way... Tables and SQL are so much simpler in my opinion.
It wouldn't take THAT much code. Would it? Most of the plumbing would be in an abstract base class, written once and forgotten. All your child classes would have to implement would be abstract methods that set up SQL and parameters, right? Plus you can do wicked voodoo with stored procedures. How would you do that in an ORM setup?
I still don't get the problem non-RDBMS' are supposed to solve. What's hard? What can't be done in SQL?
Question-mark Paris, because she's confused too. :)
NextDB *is* relational
I have one gripe about NextDB being lumped with non-relational solutions, and that is ....NextDB *is* relational. NextDB supports relationships as first-class constructs. If you don't need a relational model, just don't use relationships, but they are there in NextDB if you need them. I recently attended the Web 2.0 conference and there was a discussion group on all these new flavors of database. I was amazed to hear that with SimpleDB there is no way to do a "count"....I was told you have to retrieve all the records and loop over them to do a count. The developer also described how for each record returned to his server, he had to open a connection back to Amazon to retrieve the result. If your server has to open N+1 connections to retrieve N results it smells like the "N+1 Problem" that existed in the earlier versions of Enterprise Java Beans with Entity Beans. But now each connection is actually going over the internet rather than your LAN, which would reduce performance even more. Food for thought.