We recently published an article on the advantages of evolutionary database design (EDBD), a process which has its roots in the agile/extreme programming world. To provide a little balance, some yang for the yin, we asked Mark Whitehorn to comment on the article and give his views on EDBD vs. the more traditional database design …
Having read the Scott Ambler article I didn't he was arguing for anything particularly revolutionary but the book may be different. I think that both he and Dr. Whitehorn are asking for a little more understanding for both sides of the application/database divide.
There are certainly more than enough programmers who seem to go out of their way to avoid writing schemas and queries, seeking the silver bullet of O/R mapping and generally valuing "data objects" over "relations". At the same time there are also DBA's who do seem to guard schemas as if they came down from the mount. It's not easy to changes this because data management is fundamentally different from application development. It is essential to spend time modelling before implementation as subsequent changes have a higher costs: adding a new method or way of *doing* something is easy but what how do you deal with objects previously created? this becomes even worse when you deleting attributes.
If done properly changes to a data model should not pose a problem but if major changes come too fast then they will stress the process particularly the migration between models and encourage the use of "shortcuts" which will obfuscate and impede subsequent developments.
Very interesting article
And if I might add, far better than the last :P
I have one complaint and that is regarding the "camps" mentality of EDBD and TDBD, sure most EDBD designers are application developers, I myself favor an EDBD design model, however I think saying that EDBD devs think of the DB as an inconvenient model of storage is wholly incorrect.
The main factors in EDBD are generally usability, functionality and agility. Usability of an application by proxy affects the design of the database and the design of highly usable applications and the methods employed in acquiring a high usability standard including things such as paper prototyping have a great influence on the design process, including getting the information out of the users heads and into some kind of logical model. In fact the kind of design methods used in usability testing often provide better requirements and models than simply discussing things with a user (mostly because users don't know what they really want), as they will see something which is similar to what they could/would be working with in the future.
It is at that point, when things are able to 'click' for the user and that turns the DB into an adaptable temple, which is less likely to come crashing down around you.
One final remark on TDBD is that a user interface as an afterthought has often led to difficult to use difficult to train database applications. Where data fields are thrown at the user sometimes in such large volumes with such bad groupings that the temple isn't going to survive for long.
In short sometimes looking at the DB from the users perspective, as in through the 'window' that they actually use has greater influence on good database design then attempting to gather logical models from individuals who are users and aren't well versed in the intricacies of database design.
Agile versus Traditional
The whole agile/extreme debate vis a vis traditional life cycle methods applies just as much to databases as to any other part of the overall system, including hardware. (Hardware may actually be harder to change than databases, after all.) I found both articles very thought-provoking, as Scott Ambler's work usually is, but I also found disturbing that my experience has been very different from theirs. To me, the primary issue I've had to deal with over 20 years of designing systems is dealing with badly designed databases already in production. Scott's points are very well taken with such systems, but his techniques (as with all agile techniques) may not scale well to more complex databases. I use agile and refactoring techniques every day (along with OO design patterns, another way to avoid bad design) and the problems with deploying anything but minor changes to a production schema are definitely there, especially for systems with many programs in many different languages sharing the database.
I also found it interesting that neither author mentioned the possibility that design could drive both processes. This is really a limitation of the ERD approach. I use Model Driven Architecture (MDA), the OMG standard for "code generation" (and so much more :) that uses UML as the "code" for the database (as well as for other parts of the system, such as the common data access layer and even the web application user interface subsystem). Using MDA avoids the decentralization issue by making the changes in the design artifacts, not in the code artifacts. You could probably build Ambler's refactoring techniques into the generation templates to enable a much cleaner production process.
It would also be nice if the relational database community stepped up to schema versioning in a formal way as the OO database community did years ago (not that it helped them in the market).
What I'm most happy to see is that people are talking about these issues; they are certainly where my teams have spent most of their lowest productivity time. Keep talking about it and find solutions!
Re: Agile versus Traditional
For information - I think you are right that the relational community hasn't done anything about schema versioning. However some specific company's have done so with individual - check out Data Dude from Microsoft for example.
I realize this is SQL Server specific and I'm not trying to sell this as a good or bad product, but it is interesting that tools are emerging for schema control.
EDBD Vs TDBD
First of all, let me introduce myself... I am an application programmer (well, now I am an analyst, but I still get my hands dirty ^_^ ). I am *very* good at writing applications for database, large and small, and at extracting the last erg of power out of the table designs and SQL that run on it. Why? Because I also took every DB-related class available at my university - I *know* how a DBMS works, I know about Z-sets and Optimising tables and that while perfect optimisation is desirable in theory, it is never advisable in practice.
For this reason, I am able to straddle the EDBD and TDBD camp - Yes, there should be a central repository of the database schema with people who make sure the schema is still correct for the OVERALL business. However, life (and project development) is made much easier when your programmers understand database design and can make educated suggestions to the DBA teams. I have never had a schema change rejected (though I did have one returned because of a spelling error in one of the column names.)
Funny thing is, I do not think of myself as exceptional - as far as I am concerned, you should NOT be working as an AP/DB unless you unserstand both sides of the equation (you don't have to be an expert at it). It annoyes me to no end to have to train "professionals" who turn up and tell me flatly "I'm an application programmer - I don't need to know how the database works, that's the DBA's job". Idiots. That's like saying you don't need to understand the basics of how a car works to drive it - let's see how smart you are the first time your car stops and your "understanding" is so lacking you have no idea that you've run out of fuel, what fuel is for, and how to resupply.
Long rant. Sorry. Sore point.
One last thing before I post this - I know ER is the preferred methodology out there, but I personally find it lacking... it implies that the person creating the diagram is good at figuring out what should go into an entity based (mostly) on gut feel. Personally (and I admit it's a personal choice) I tend to use ORM. It is based on each individual piece of information (ie, columns) and by the time you've finished your diagram, the *table* designs naturally fall out of it. It's not a very popular methodology, I admit, but it works for me.
Evolution crap - Traditional crap
The jargon-fest of evolution again? The theory of biological evolution applied to a field of IT . I despair! The theory of tradition applied to a field of IT. I despair!
Do these nonsensical forced analogies-cum-paradigms actually help an individual involved in solving problems?
I'll use use whatever that is to hand to help me analyse the problem domain, to understand the problem domain and to lead me to a practical solution to any particular problem of that domain. One of the most problematic problems is being aware of what is to hand. Another is evolving an awareness of how you yourself learn.
The theory of tradition - I do not have to think.
The theory of evolution - unpredictable small changes, unpredictable big changes - the change may have a chance of having a future.
Tradition and evolution have one thing in common; no analysis.
Modelling is about the organisation, not developing applications
As a senior DBA, who specialises in modelling and design, I have encountered many situations where "evolving" database designs based on developers' requests have resulted in extremely poor data integrity and consistancy. Unfortunately, I have also witnessed this becoming more prevalent in shops as the development team feels a need to produce results and show working prototypes as quickly as possible. It is not uncommon at all to have developers start coding long before analysis has been completed. The few projects that were allowed to have a significant amount of analysis done prior to the start of coding were amazing successful. When the developers finally got there hands on the databases, they were amazed as to how all their requirements were fulfulled in the design. The requests by developers for changes were very minimal, and those that were needed as a result of something missed during the analysis phase were easily incorporated. Properly constructed and QA'ed ERD models will reflect the rules under which the organization operates. ERD's and database schema's do not represent the processes that are followed. Database design lead by developers tend to be focused on what is displayed on a screen or report. This makes it very easy for them to retrieve and save data to the datastore, but is disasterous for the lifeblood of the corporation, the data. I have not come across many organizations that realise this before it is to late, the system has been in production for a number of years, and the integrity of the data has been compromised. One thing can be assured is that no matter how great the interfacing application is, if the data quality isn't there, the application will not be successful. While it may be a hard sell to have the Data Architects model the organization without an associated development project, it would be extremely helpful in the long run. Capturing the business rules and relationships independently of coding is one of the better ways of ensuring that models truly reflect the business.
That being said, during the design of the model, it should always QA'ed by numerous different people and not just those that were involved in its creation. It is not uncommon for the few business analysts working with the modeller to be unaware of aspects of their business that may be crucial. A quality model should allow me to sit down with a business client and tell him/her exactly the rules under which business is conducted. It always seems to amaze the clients when you walk into a room cold and explain to them their business just by walking through a quality model.
As for which methodology to follow, I prefer relational modelling and ERDs due to the stringent maths behind relational theory, conciseness of the models, and the ease to which they can be tested.
Obviously, everything can be for nought if the people involved in the analysis and creation of the models are not thorough or do not have sound understanding of relational modelling. Then the result is the worst outcome you can have, a bad model and a resulting database that the developer's cannot use. I have seen a few organizations where they deemed it necessary to employ Data Architects but then staffed the positions with people who were experienced with the business instead of someone versed in modelling principles and techniques. This can lead to some serious analysis issues as the person who "knows" the business will model what they know, not necessarily the business.
So, while Agile\Evolutionary development may be fine for applications, rarely will it result in a good model of the business and sound database design. It is too hard to find people with modelling skills to develop and maintain the models even when backed by solid analysis. To expect the all developers to have the skills and abilities to "evolve" good database design while they "evolve" their applications is extremely optimistic.
I have to agree with Simon Pamenter
When I got into this business 30 plus years ago, there were very few TLA's (Three [sometimes more/less] Letter Acronyms). We understood the business, the data, how the data was used, and how computing systems worked. Some of those databases and applications that I wrote long ago are still running. What is the most important is that the people who are doing the task (what ever it is) understand what they are doing and how its going to be used.
- Product round-up Ten excellent FREE PC apps to brighten your Windows
- Review Tough Banana Pi: a Raspberry Pi for colour-blind diehards
- Product round-up Ten Mac freeware apps for your new Apple baby
- Analysis Pity the poor Windows developer: The tools for desktop development are in disarray
- Chromecast video on UK, Euro TVs hertz so badly it makes us judder – but Google 'won't fix'