Are XML data types atomic?
Data held in a relational database is supposed to be atomic. Codd’s rule 2, the guaranteed access rule, says:
"Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."
I take atomic to mean, in general usage, that the data has no internal structure. The value ‘14’, it can be argued, has no internal structure, so it is atomic.
But it turns out to be rather difficult to define atomic.
In his recent book “Database in Depth: Relational Theory for Practitioners” (ISBN 0-596-10012-4, O’Reilly, 2005), Chris Date argues that the data types that we know and love (integers, text, dates) are not really atomic at all. For a start, strings can be decomposed into characters. Dates, even more obviously, have an internal structure: day, month, year. Another argument says that, if these data types really are atomic, then why do we have so many functions that manipulate them and their internal structure (DateDiff, Date, Day, Hour, Chr, LCase, Left, InStr, etc.)?
Chris goes on to say that “The real point I’m getting at here is that the notion of atomicity has no absolute meaning; it depends on what we want to do with the data.”
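The point about decomposing functions can be made concrete with a small sketch. It is written in Python rather than in any database dialect, and the variable names (order_date, customer_name) are hypothetical, chosen only for illustration. It shows the same two values being treated first as indivisible units and then as structures to be taken apart.

```python
from datetime import date

# Hypothetical column values; the names are illustrative only.
order_date = date(2005, 5, 14)
customer_name = "Codd"

# Treated as atomic: each value is compared as a single unit.
is_same_day = order_date == date(2005, 5, 14)
is_same_name = customer_name == "Codd"

# Treated as structured: the same values are taken apart.
date_parts = (order_date.day, order_date.month, order_date.year)
name_chars = list(customer_name)

# Rough analogues of Left and InStr (note that str.find is 0-based
# and returns -1 when nothing is found, unlike InStr).
first_two = customer_name[:2]
position = customer_name.find("d")
```

Nothing about the values themselves changes between the two halves of the sketch; only what we choose to do with them does, which is the sense in which atomicity "depends on what we want to do with the data."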
So, is XML atomic data or not? My own belief (for what it’s worth) is that XML is non-atomic. It can have an arbitrarily complex internal structure and, for me, by the definitions I have used over the past 20 years, is clearly non-atomic.
But then, if we consider the arguments above, I think it is clear that the relational model is already handling non-atomic data, so whether it really matters if XML is atomic or non-atomic is open to question.
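The comparison with dates and strings can be sketched for XML as well. The example below is only an illustration in Python, using the standard library's xml.etree.ElementTree module; the order document and its element names are invented for the purpose. The same value can be held and compared as one opaque unit, or opened up to reveal nested structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML value, as it might be held in a single column.
order_xml = (
    "<order id='14'>"
    "<item sku='A1'><qty>2</qty></item>"
    "<item sku='B2'><qty>5</qty></item>"
    "</order>"
)

# Treated as atomic: the whole document is one opaque value.
stored_value = order_xml
is_unchanged = stored_value == order_xml

# Treated as structured: parse the value and navigate inside it.
root = ET.fromstring(order_xml)
order_id = root.get("id")
skus = [item.get("sku") for item in root.findall("item")]
total_qty = sum(int(item.findtext("qty")) for item in root.findall("item"))
```

The difference from a date is one of degree: a date has a fixed set of parts, whereas the XML value can nest to any depth.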
Chris goes on to suggest that we can argue that relations can contain any type whatsoever. However, he also notes one exception:
“… no relation in the database can have an attribute of any pointer type. As you probably know, prerelational databases were full of pointers and access to such databases involved a lot of pointer-chasing; a fact that made application programming error-prone and direct end-user access impossible. Codd wanted to get away from such problems in his relational model, and of course he succeeded.”