At IBM's recent Information on Demand conference (which was excellent, incidentally) the company presented its view of master data management (MDM). I am glad to say that this has advanced significantly since its Barcelona conference in May and the company has now recognised that you need to take a flexible approach to MDM. The …
Don't Overlook Oracle's Data Hub Product...
I disagree with the statement that Oracle's hub-based approach is assuming master data will be stored in a datawarehouse.
Oracle's Customer Data Hub product is an active repository NOT a datawarehouse. It is specifically designed to provide a master view of data for use by transactional operational systems not just analytics.
It's not just IBM that have a flexible understanding of master data management.
An Active Repository, whether Oracle or not, certainly sounds like a good place for Master Data. But I think that the real issue is defining exactly what Master Data is, and what it is used for, before deciding where to put it.
One definition I've seen is "master data is a set of data elements along with their associated attributes, such as customer, product, employee, vendor, etc." Well, in the bad old days, that was simply the customer, product etc databases in the transactional corporate database, plus (probably) bits of the Data Dictionary (aka "Repository") we drove this from.
If Master Data is something people have suddenly discovered, then I wonder how our banking systems (eg) have been working for so long (yes, I know, they use legacy COBOL and IMS databases, back from when we did this stuff properly).
So, coming up to date, if "Master Data" is always going to be treated as transactional, then a Datawarehouse is a poor place for it - keep it in a "real" database.
However if it not frequently updated, then the Datawarehouse is a reasonable place for it, as long as transactional systems can get at it as and when necessary.
And if people want to know how the Master Data changes over time, then it does probably belong in a Datawarehouse.
But there's a hidden assumption here - and I don't really see a Datawarehouse as fundamentally different from any other sort of database (it stores historical data, however, so it might get very big; and historically it has been optimised for retrieval rather than update).
Making a Datawarehouse a special kind of database is simply a result of technology limitations - there's no a priori reason why historical data and BI cubes shouldn't live in the transactional database, it's just that most current technology wouldn't cope too well with this (and DBAs, sensibly, are a conservative lot). As an aside, IMS could probably cope (to an extent - it'd need extra access functions), because you could physically partition off non-transactional data within a database - and modern transactional databases such as SQL Server are starting to catch up.
So, if Master Data is needed for transactional processing, it must be available on demand (with low latency) to the transactional systems; and if it is needed for historical analyses, it must be accessible to the BI analysis programs.
Where you put it physically depends on the characteristics of the technology you have available and the relative usage frequencies of the various access paths and their associated required latencies.
In other words, where you put Master Data isn't a generic vendor-driven decision, Oracle vs IBM, say; it's a matter of physical database design. Do we still do that?
- Xmas Round-up Ghosts of Christmas Past: Ten tech treats from yesteryear
- Special Report How Britain could have invented the iPhone: And how the Quangocracy cocked it up
- Analysis Microsoft's licence riddles give Linux and pals a free ride to virtual domination
- Massive! Yahoo! Mail! outage! going! on! FOURTH! straight! day!
- Bring it on, stream biz Aereo tells TV barons – see you in Supreme Court