12 simple rules: How Ted Codd transformed the humble database

Edgar – or Ted – Codd is one of the most influential figures in computing. Born 90 years ago today*, Codd – who passed away in 2003 – was the man who first conceived of the relational model for database management. Relational databases are today ubiquitous – on your PC, in your smartphone, in your bank’s ATMs, inside airline …

COMMENTS

This topic is closed for new posts.
Silver badge

There's much to be said on sql

and I was going to start, but then I realised I was missing the most important thing about him in the article:

"Personally offended by US senator Joseph McCarthy’s Cold-War Communist-baiting, Codd abandoned IBM and the US entirely in 1953 and went to work across the border in Canada"

23
1
Silver badge
Holmes

Re: There's much to be said on sql

The main thing to be said about SQL is that persons interested in learning about relational databases could do worse than check out Tutorial D.

2
0

Wait, if Wikipedia is wrong, and you have a reference to the correct information, you are supposed to update Wikipedia and link in the reference.

2
4

"""you are supposed to update wikipedia"""

unless you can't be bothered. anyway, supposed by whom?

11
1
Silver badge
Stop

update wikipedia

Well, someone's done it now.

0
0
Anonymous Coward

Ted didn't merely transform the database

He normalised it (ducks for cover).....

15
0
Silver badge
Coat

Re: Ted didn't merely transform the database

He was the First to put it in its Normal Form.

2
0

NoSQL covers everything that is not SQL, not just key/ value.

Key value is just one model, others are graph (neo4j) and document (mongodb, couchdb).

So, nosql is a bit of a silly name, defining what something isn't, rather than what something is.

4
0
Silver badge
Angel

defining what something isn't

You mean, like GNU, WINE… …Bing…?

3
0
Anonymous Coward

NoSQL

Is not No SQL. It is Not ONLY SQL.

5
1
LDS
Silver badge

That is "NoRelational", not NoSQL.

The NoSQL crowd was a bit Taliban in calling their product that way. SQL is just a language; for that matter, you can use it for non-relational databases as well. And frankly I never understood their hate for SQL - after all it's a declarative language like most functional languages that are fashionable now, while SQL is not :).

I understand that RDBMS may not be the best choice for some types of data, but something Codd got right was that often data themselves are more valuable than the applications accessing them. Applications come and go, data stay. Decoupling fully the data (and their model) from the applications accessing them was - and is - a big idea and improvement. NoSQL is fast, but their data model is far less decoupled from the representation of the application(s) accessing it. If data have no value outside the application, good, but when data are more valuable than the application, and multiple applications need to access the data in different ways, well, RDBMS are still a good idea.

15
0
Silver badge
Trollface

Re: NoSQL

Ni SQL?

NI NI NI!

0
0
Silver badge

Re: That is "NoRelational", not NoSQL.

Well put.

The current love for NoSQL is due to the 'big data' idea, and ties into a previous Reg article about SANs apparently being not long for the world.

The thing that many comments there ignored was that a big Hadoop cluster tied up with a NoSQL database all running on distributed DAS is just one component of a full solution. That giant data-crunching platform requires data to be fed to it from somewhere and that somewhere may well be an application (or 20) running on . . . RDBMS - be it a web platform or an ERP like SAP.

In addition, to be any use, the output of those wonderful, distributed compute clusters must be somehow presented to the world and that presentation platform will, again, likely have some form of RDBMS running behind it.

Not to mention that those systems will require backup solutions which, again, may involve an RDBMS.

If NoSQL is 'taking over' from RDBMS then that really only represents the type of workloads that are currently being employed. Just remember that these 'big data' workloads are only possible due to the massive amounts of information being collected, processed and organised by other applications - applications which are quite likely to be relying on an RDBMS.

1
0
Bronze badge

You can store pictures, images etc. in a relational database...

...Whether you want to is a different matter though.

Relational databases are brilliant. It's just a shame that they don't scale well to the size required by a growing number of use-cases.

Here is a recommended guide to anyone who is a bit confused by the database options:

http://howfuckedismydatabase.com/

3
1
Bronze badge

Re: You can store pictures, images etc. in a relational database...

I've seen no evidence that SQL doesn't scale, quite the opposite in fact. I've worked on many SQL databases running over the hundreds of TB and scaling has never been a problem.

3
0
JLV
Bronze badge
Trollface

Re: You can store pictures, images etc. in a relational database...

Not only can you store all sorts of data, you can hang arbitrary attributes off objects.

For example, suppose you have a table with a growing, but uncertain amount of fields:

Table Foo (id, bar1dt, bar2varchar, bar3numeric...) where you don't know how many bars you will end up with.

Instead restructure it to hold its attributes in a child table: Table Foo (id) and Table FooAttrib (id, attrid, numvalue, dtvalue,varcharvalue).

Bit of a head-twister, but works surprisingly well if you need that kind of flexibility in your application.
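A minimal sketch of that child-table layout, using Python's bundled sqlite3 module. The Foo and FooAttrib names and columns follow the description above; the column types, the composite primary key, the sample values and the attributes helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Foo (id INTEGER PRIMARY KEY);
    -- One row per (object, attribute); only one of the value columns is filled in.
    CREATE TABLE FooAttrib (
        id           INTEGER NOT NULL REFERENCES Foo(id),
        attrid       TEXT    NOT NULL,
        numvalue     NUMERIC,
        dtvalue      TEXT,
        varcharvalue TEXT,
        PRIMARY KEY (id, attrid)
    );
""")

conn.execute("INSERT INTO Foo (id) VALUES (1)")
conn.executemany(
    "INSERT INTO FooAttrib (id, attrid, numvalue, dtvalue, varcharvalue) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "bar1dt", None, "2013-01-01", None),   # sample date, made up
        (1, "bar2varchar", None, None, "hello"),
        (1, "bar3numeric", 42, None, None),
    ],
)

def attributes(conn, foo_id):
    """Return all attributes of one Foo row as a dict (hypothetical helper)."""
    rows = conn.execute(
        "SELECT attrid, COALESCE(numvalue, dtvalue, varcharvalue) "
        "FROM FooAttrib WHERE id = ? ORDER BY attrid",
        (foo_id,),
    )
    return dict(rows.fetchall())

foo1 = attributes(conn, 1)
```

Adding a fourth attribute is then an INSERT rather than an ALTER TABLE, at the price of losing per-column types and constraints on the attribute values.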

Relational databases do have one very large Achilles' heel however, not mentioned here. They suck at same-type hierarchies. Parts-of-parts or directed graphs. Say if you want to identify the relationships between parts of an engine. Or a manager-to-employees hierarchy. The standard-ANSI/no vendor-extension SQL to express that is typically hand-rolled, clumsily-expressed recursion, and brutal.
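For what it's worth, a hedged sketch of how a manager-to-employees walk can be written with a recursive common table expression (WITH RECURSIVE, part of the SQL standard since SQL:1999), run here through Python's sqlite3. The employee table, its rows and the reports_of helper are made up for illustration, and it assumes an SQLite build recent enough to support recursive CTEs. Whether this counts as less brutal is left to the reader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Same-type hierarchy: every employee row may point at another employee row.
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(id)
    );
    INSERT INTO employee VALUES
        (1, 'Alice', NULL),
        (2, 'Bob',   1),
        (3, 'Carol', 1),
        (4, 'Dave',  2);
""")

def reports_of(conn, manager_id):
    """All direct and indirect reports of a manager, with their depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- anchor: direct reports
            SELECT id, name, 1 FROM employee WHERE manager_id = ?
            UNION ALL
            -- recursive step: reports of anyone already in the chain
            SELECT e.id, e.name, c.depth + 1
            FROM employee e JOIN chain c ON e.manager_id = c.id
        )
        SELECT name, depth FROM chain ORDER BY depth, name
        """,
        (manager_id,),
    )
    return rows.fetchall()

alice_reports = reports_of(conn, 1)
```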

Graphs are also a big part of social networks and this is probably a big part of why developers working on social networks, which are after all the most important apps ever (sarcasm), sneer at SQL's unworthiness.

That and Java devs typically can't write to SQL without an ORM doing all the hand-holding so that also proves SQL sucks ;-)

2
0
Anonymous Coward

The relational model was a brilliant insight. It's such a shame that Codd's two biggest acolytes, Date and Warden/Darwen (why he can't decide how to spell his name is a mystery), are such egotistical and dogmatic fruit loops. Reading Date in particular reminds me of the Soviet propaganda tracts I had to study at university.

2
2
Bronze badge
Coat

Re: reminds me of the Soviet propaganda tracts I had to study at university

You have a point, but reading Soviet propaganda tracts at university probably wouldn't have helped me pass the DB design modules. (I assume you went a different route).

1
0
Silver badge

@ Chris Wareham

I have never seen Darwen spelt any other way. However a quick search gets me this off wikipedia "His early works were published under the pseudonym of Andrew Warden".

I don't believe they are egotistical and dogmatic. Date has strong opinions, then again his name is virtually synonymous with RDBs because he's done a lot of development on the maths behind it. I guess he has a right to hold those opinions, and it's your burden to show they're unreasonable.

As for Darwen, well, I corresponded with him by email over an issue (limits of sql optimisation and perhaps how to deal with them) and he was polite, considerate and gave his time generously.

I find I cannot upvote your post. Sorry.

8
0
Silver badge
Big Brother

Re: reminds me of the Soviet propaganda tracts I had to study at university

But Soviet database are always CORRECT by order of the party

2
0
Holmes

Chris Date was my instructor...

..on the various database models..he had to get pretty egotistical and dogmatic because of the "misleading" (aka pack of lies) FUD being spread by the Codasyl advocates. Chris came up with a simple and obvious Codasyl design, a simple and obvious program code example - no-one was ever able to debug it - if I remember rightly it was about 20 lines of code and contained at least 18 bugs. The fundamental problem that he was illustrating was that ring structures like Codasyl look nice in the abstract, but it is impossible to validate programs written for them..

3
0
Silver badge

Re: reminds me of the Soviet propaganda tracts I had to study at university

But Soviet database are always CORRECT by order of the party

Technically, this algorithm is known as party-checking error correction.

4
0

"everybody understood the principle"

But not everybody can apply it, which is why thedailywtf.com exists today.

0
0
Anonymous Coward

Relational ignored in 1976

The 1976 book by James Martin (Principles of Database Management, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1976) fails to mention the relational model. I didn't read the 1989 revised edition....but maybe relational got a mention by then.

0
0


Silver badge
Holmes

Re: Relational ignored in 1976

Ok, in "Computer Data-Base (sic) Organization" by James Martin ("The most up-to-date and thorough guide to the techniques of data base organization"), 1975 by Prentice-Hall, ISBN 0-13-165506-X, (printed on excellent paper in black and pink) we read:

Part I (Logical Organization) Chapter 13: Relational Data Bases (pp 149-168)

"Data-base systems run the danger of becoming cumbersome, inflexible and problematic. The logical linkages tend to multiply as new applications are added and as users request that new forms of query be answerable with the data. A high level of complexity will build up in many data-base systems. Unless the designers have conceptual clarity they will weave a tangled web. It is possible to avoid the entanglements that build up in tree and plex structures, by a technique called normalization. Normalization techniques have been designed and advocated by E.F. Codd. ... The enthusiasts of normalization have a vocabulary of their own and a tendency to dress up a basically simple subject in confusing language. The table, like that in Fig. 5.3, is referred to as a relation. A data base constructed using relations is referred to as a relational data base."

There is also chapter 14: "Third Normal Form" ...

0
0
Silver badge

1) Gavin, you got a typo: "Turning Award"

2) Hands up who recognized the cover of the December 1972 issue of Communications of the ACM

1
0

Mainframe yes.. Website no.

SQL may have been a brilliant innovation for mainframes crunching large amounts of data in a secure room, but it is a diabolically bad choice for Internet use, since it allows malicious commands to be injected into the data stream, which is especially dangerous in data entered from online forms. From what I've briefly read about it, it's not too clear if NoSQL is any more secure in that respect.

0
12
LDS
Silver badge

Re: Mainframe yes.. Website no.

It's just bad programming habits - in any context - including mainframes. SQL Injection has always been an issue since SQL commands can be easily built at run-time and sent to the database - the Internet just magnified it, but you can inject in any badly written application, web or not.

Techniques like bind variables, stored procedures, grants, etc. have been available for a long time as well, but too many web "developers" never learnt how to properly code an application using SQL (and design a database) - after all chaining some strings - without sanitizing them - is easier than declaring a variable, assigning the value and so on (or writing a stored procedure and/or assigning proper grants; just let everything connect as the database owner!), isn't it? It's vulnerable to injection, forces the database to reparse each statement, litters the code, but hey, it's fast and easy.... after all what is important is the site design, isn't it?
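To make the difference concrete, here is a small sketch in Python with the standard sqlite3 module, contrasting string chaining with a bind variable. The users table, its contents and both lookup functions are hypothetical; the "?" placeholder is sqlite3's own parameter style, and other drivers spell it differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT PRIMARY KEY, secret TEXT);
    INSERT INTO users VALUES ('alice', 'a-secret'), ('bob', 'b-secret');
""")

def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL text itself,
    # so a quote in the input can change the statement's meaning.
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()

def lookup_bound(conn, name):
    # Bind variable: the statement text is fixed and the input is
    # passed separately, so it is only ever treated as a value.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()

malicious = "nobody' OR '1'='1"
leaked = lookup_unsafe(conn, malicious)   # WHERE clause is now always true
safe = lookup_bound(conn, malicious)      # no user has that literal name
```

With the chained string every row comes back; with the bound parameter the same input matches nothing, because no user has that literal name.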

9
0
Silver badge
Trollface

Re: Mainframe yes.. Website no.

a diabolically bad choice for Internet use

This is like saying internal combustion engines are a diabolically bad choice for automotive devices because then people will crash them, drive while being drunk or text their friends.

6
0

Re: Mainframe yes.. Website no.

The problem of SQL injection attacks has nothing to do with the relational model or even its implementations (given that secure ways to bind data exist), and everything to do with amateurism in web development.

1
0
Holmes

DOB

There is a rather large relational database in the UK called the registrar of births and deaths. Codd's date of birth could be verified by a simple query. Nothing like a primary source for data integrity! On the other hand try the internet, that's never wrong is it?

2
0

"Some of the biggest and most profitable names on the computing scene – Oracle, IBM and Microsoft – are currently working on relational database management systems."

Odd wording there. It makes it sound as if those companies, which are long-time players in the RDB industry, are just now preparing their first products.

0
0

Or perhaps software at that level is an ever evolving process of patches, optimization, and new features which are major parts of these companies' ongoing business.

0
0

Codd OUTER JOIN NoSQL?

I had to read this twice before I realised there was no specific connection between Codd and NoSQL, despite what can be inferred from "On the eve of the anniversary of Codd's birth, NoSQL advocates will gather...".

Here I was thinking the father of relational had had some kind of deathbed conversion!

0
0
Happy

IBM loved relational

It's not true that IBM was late to the Relational party, it is just that it had to reach a quality threshold before it would be allowed out as a program product. System-R is what Larry copied with Oracle (complete with Rule Based Optimiser).. it was not until Oracle7 that Larry's DB had a Cost-Based-Optimiser to compare with DB2 2.

Far from cannibalising IMS, relational was criticised as a wheeze to sell more DASD (disk) & CPU, but the competitors could not argue with the mathematical proof of relational algebra & calculus in set theory: there is no information that can not be represented relationally.

For a very long time IMS was much more scalable than DB2 (you could even mount IMS in DB2), but only for one use-case.. choose the wrong one, and DB reorganisation was a killer.

1
0
Silver badge

Re: IBM loved relational

And IBM's System R, of course, is from the '70s. The System R project inspired Ingres (or more specifically, it inspired the use of a relational database for the INGRES project) and Oracle, as you noted, so Stonebraker and Ellison were in no way picking up a technology that IBM was ignoring. Stonebraker, in particular, made a tremendous contribution, but the article's narrative about IBM ignoring Codd's work is a fable.

Even excluding System R, which I think was sold only on a limited basis (as a PRPQ, maybe?), IBM had a commercial relational database in '81, which is a mere two years after Oracle. They can scarcely be said to have been late to the party. And it's not like the IMS cash cow needed protecting; IMS DB is still going strong.

Typical Gavin Clarke article, with ample technical errors. (Others have already pointed out such items as the bizarre mischaracterization of NoSQL as only key/value databases.) Sigh.

2
0
Silver badge
Windows

An interesting note found on a harddisk

In "The Genesis of a Database Computer - A conversation with Jack Shemer and Phil Neches of Teradata Corporation - IEEE Computer Nov. 1984":

This article gives the context of the DBC/1012 system with two interface processors, four access module processors, and four Winchester disk units. When fully extended to 1024 processors operating in parallel, the system will be capable of storing a terabyte (trillion bytes) of data.

We read:

Shemer: Another factor [in building the database computer] was the relational data model - the fourth generation of database management software. People wanted it but could not afford it, nor was it practical. The reason was that it took a tremendous number of MIPS to deliver the functionality of a relational system. However, running the software on a mainframe practically relegated the big computer to the level of a personal computer. Consequently, the user environment has retained what I call the machine-friendly forerunners, namely the hierarchical and network database management systems that emerged in the 60's. These approaches were designed to process efficiently in single data stream machine environments, while the relational model admitted to parallel processing.

In the relational model, data is not explicitly ordered, since data items don't have pointers embedded in the data. Rather than traversing a family tree or hierarchy, you're dealing with rows and columns that represent the way most people like to view information. The relational system is synonymous with people-friendly; it's what people want, what the end user and the application programmer desire.

The big problem was to make the relational system cost- and performance-effective. The only way to do that was to provide a great many processing cycles at low cost.

...

Computer: It was an IBM scientist, E. F. Codd, who originally conceived the relational database model. What is IBM doing now?

Shemer: IBM has taken what I regard as a two-phased approach. On the one hand, it has IMS and DL/I for the production environment. They use the hierarchical approach of the 60's, now almost 20 years old. IBM appears to be committed to that investment; it is telling users to keep IMS for high-volume applications. On the other hand, it has a new relational product called DB2 that is intended for the what-if query in the end-user environment. It is for the ultimate information user who may be a novice programmer or somebody not well versed in programming at all.

As I see it, IBM has effectively segmented the database world into two disjointed environments. It has essentially stated that the relational system it will deliver under DB2 is not efficient in accommodating production processing demands. In other words, keep IMS for account rendition, master file maintenance, etc., and use DB2 for what-if queries. It is a real dilemma for users. Moreover, this approach complicates matters. You already have an IMS database, let's say. To build a relational database, you have to have a utility program to extract information from the IMS master file. You now have two databases. What's more, they run on different machine environments, producing multiple versions of the truth. One file or the other is always out of date. Having two databases is a step backward, because one of the prime reasons for creating database management systems in the 60's was to allow multiple applications to have access to the same data. That data should have the same value at the same instant of time for both the production application environment and the what-if query environment.

1
1
Silver badge

Cool

Cool article. Cool bloke. Cool invention. I remember trying to get my head round the RDBMS notion in the 80s, helping Dad set up a database for work. It was called Smart or something, ran on DOS.

0
0
Bronze badge

One trivial example of "what's wrong" with the relational model is that you use a key value to reference an entry in another table, which requires looking that key up in the index table, instead of using a direct pointer. That requires more disk accesses, and is slower.

Of course, it's also much easier to update. But some tables aren't updated often. So it isn't a bad thing to offer the choice of doing a few non-relational things. That, of course, doesn't detract from the benefits of the relational model noted in the article.

0
1
Silver badge
Facepalm

thatisthepoint.jpg

To get rid of the unmaintainable mess of pointers.

And mixing the two concepts is just ... no.

0
0
Meh

Cluster Indexes address this

With a cluster index data is organised as part of the index leaf, so there is no additional access, but it’s not always a good idea because the cascading effects of moving rows from page to page can kill concurrency. Compare that to a network DB where it was common to pre-allocate max space to avoid moving data & killing concurrency (CHAR(100) instead of VARCHAR(100)).. what you saved in IO scans you paid in IO reads.
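As a rough illustration only: SQLite offers a form of clustered index through WITHOUT ROWID tables, where the row is stored in the primary-key B-tree itself. The sketch below, in Python with the standard sqlite3 module, puts the two layouts side by side. Table names, columns and data are made up, and other engines expose clustering through different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Ordinary table: rows live in a rowid B-tree and the TEXT primary key
    -- is a separate index, so a key lookup touches two structures.
    CREATE TABLE part_plain (part_no TEXT PRIMARY KEY, descr TEXT);

    -- WITHOUT ROWID: rows are stored in the primary-key B-tree itself,
    -- so finding the key also finds the data.
    CREATE TABLE part_clustered (part_no TEXT PRIMARY KEY, descr TEXT)
        WITHOUT ROWID;
""")

parts = [("P-1", "piston"), ("P-2", "valve")]
conn.executemany("INSERT INTO part_plain VALUES (?, ?)", parts)
conn.executemany("INSERT INTO part_clustered VALUES (?, ?)", parts)

def describe_plain(conn, part_no):
    row = conn.execute(
        "SELECT descr FROM part_plain WHERE part_no = ?", (part_no,)
    ).fetchone()
    return row[0] if row else None

def describe_clustered(conn, part_no):
    row = conn.execute(
        "SELECT descr FROM part_clustered WHERE part_no = ?", (part_no,)
    ).fetchone()
    return row[0] if row else None
```

Both functions return the same answers. The difference is purely physical, which is the point: the application's SQL does not change when the storage layout does.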

0
0


Ted Codd told me when we were at a meeting or conference sometime in the late 70s that he had read a scholarly paper which had asserted that tabular data could not be relational, and that was why he named his creation the Relational Model.

1
0

Not so Simple

To someone approaching the relational model by learning SQL, it might superficially look like just 'data in tables', but it is not that simple. I recall being in a meeting in which the DBAs proudly unveiled their schema for a rewrite of their company's customer and order database. It was fully normalized, they claimed, but it turned out that they had packed a bunch of repeating groups into strings. What they thought was being clever merely revealed the superficiality of their understanding, and this decision caused no end of problems.
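A small sketch of the kind of trouble a packed repeating group causes, using Python's standard sqlite3 module. The schema and data are invented for illustration and are not the schema from that meeting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Repeating group packed into a string: looks like one tidy column.
    CREATE TABLE orders_packed (order_id INTEGER PRIMARY KEY, items TEXT);
    INSERT INTO orders_packed VALUES
        (1, 'bolt,nut,washer'),
        (2, 'nut'),
        (3, 'locknut');

    -- The same facts with one row per (order, item).
    CREATE TABLE order_item (
        order_id INTEGER NOT NULL,
        item     TEXT    NOT NULL,
        PRIMARY KEY (order_id, item)
    );
""")

# Unpack the strings into rows (read everything first, then insert).
packed = conn.execute("SELECT order_id, items FROM orders_packed").fetchall()
for order_id, items in packed:
    conn.executemany(
        "INSERT INTO order_item VALUES (?, ?)",
        [(order_id, item) for item in items.split(",")],
    )

# Searching the packed column needs pattern matching, which over-matches:
packed_hits = [r[0] for r in conn.execute(
    "SELECT order_id FROM orders_packed WHERE items LIKE '%nut%' ORDER BY order_id"
)]

# Searching the row-per-item table is a plain equality test:
def orders_containing(conn, item):
    rows = conn.execute(
        "SELECT order_id FROM order_item WHERE item = ? ORDER BY order_id",
        (item,),
    )
    return [r[0] for r in rows]

nut_orders = orders_containing(conn, "nut")
```

The pattern match drags in order 3 because "locknut" contains "nut", while the equality test on the row-per-item table returns only orders 1 and 2.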

BTW, I am aware that denormalization may be the right design choice, but that is beside the point here.

0
0
