Hub and spoke gives data analytics a new spin

The idea behind data warehousing is simple: put historical summary data from back-end transaction processing systems in a machine designed to answer queries fast. The online transaction processing (OLTP) systems run the business, while the data warehousing systems help managers understand the business and steer it better. The …


This topic is closed for new posts.

Hub and Spoke <> Analytics

The DW should contain application-neutral atomic data, with only the dependent delivery data marts (if they exist) containing application-specific summary/aggregate data to support query performance.
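To make the split concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`dw_sales`, `mart_daily_store_sales`), the columns, and the sample rows are all made up for illustration; the only point is that the hub keeps atomic rows and the mart is derived from them.

```python
import sqlite3

# Hypothetical atomic rows: one row per sale, no application-specific shaping.
ATOMIC_SALES = [
    ("2011-06-01", "north", "widget", 10.0),
    ("2011-06-01", "north", "gadget", 5.0),
    ("2011-06-01", "south", "widget", 7.5),
    ("2011-06-02", "north", "widget", 2.5),
]


def build_warehouse():
    """Create an in-memory hub table plus one dependent mart table."""
    con = sqlite3.connect(":memory:")
    # Hub: application-neutral atomic data.
    con.execute(
        "CREATE TABLE dw_sales ("
        " sale_date TEXT, store TEXT, product TEXT, amount REAL)"
    )
    con.executemany("INSERT INTO dw_sales VALUES (?, ?, ?, ?)", ATOMIC_SALES)
    # Spoke: application-specific aggregate, derived only from the hub.
    con.execute(
        "CREATE TABLE mart_daily_store_sales AS"
        " SELECT sale_date, store,"
        " SUM(amount) AS total_amount, COUNT(*) AS sale_count"
        " FROM dw_sales GROUP BY sale_date, store"
    )
    con.commit()
    return con


def mart_rows(con):
    """Return the mart contents in a stable order."""
    return con.execute(
        "SELECT sale_date, store, total_amount, sale_count"
        " FROM mart_daily_store_sales ORDER BY sale_date, store"
    ).fetchall()
```

A KPI report would read from the mart, while an exploratory question (say, sales by product) can still be asked of `dw_sales`, because the atomic detail was never thrown away.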

Queries don't all have to be answered quickly. The real DW value often comes from serving a small community of explorers asking iterative/complex questions, not from serving the larger community of farmers running KPI reports. Query elapsed time is far less important for the explorers – “data scientists” in the current parlance. A good DW serves the needs of both communities, plus others.

For those who use a general-purpose DBMS on a standard “SMP plus SAN/NAS” platform, I would agree that "a single data warehouse to gather up data and do queries also happens to be impossible". For those not bound by such restrictions, politics and economics are the issue, not technology. This was discussed a few months ago on Curt Monash's excellent DBMS2 blog: http://www.dbms2.com/2011/06/21/its-official-the-grand-central-edw-will-never-happen/.

Appliances, such as those offered by Teradata and Netezza, are not an admission that the EDW concept is "no longer sufficient". Teradata has offered appliances for over 20 years; it has simply started marketing some of its offerings as appliances relatively recently, mainly in response to the rise of Netezza.

The phrase "data warehouse appliance" dates back 8-9 years and is attributed to Netezza co-founder and former CTO Foster Hinshaw: http://bi-insider.com/portfolio/overview-of-a-dw-appliance/.

A conventional DW is more than capable of loading and querying web log data in a good old-fashioned relational schema. The challenge is the complex transformation process between the web server and the back-end data warehouse. This is precisely the capability provided by Celebrus: http://www.celebrus.com/productindex.aspx.
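As a rough illustration of the relational-schema point only (this is not how Celebrus works, and real web data capture involves far more transformation), here is a sketch that parses one web server log line into a row and loads it into a table. It assumes the plain Common Log Format; the `web_hits` table and its columns are hypothetical.

```python
import re
import sqlite3
from datetime import datetime

# Common Log Format: host ident user [timestamp] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)'
)


def parse_line(line):
    """Turn one log line into a relational row, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    host, user, stamp, method, path, status, size = match.groups()
    # Store the timestamp as ISO 8601 text so it sorts and compares sensibly.
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").isoformat()
    # A size of "-" means no body was sent; store it as NULL.
    byte_count = None if size == "-" else int(size)
    return (host, user, when, method, path, int(status), byte_count)


def load_lines(con, lines):
    """Insert every parseable line into the hypothetical web_hits table."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS web_hits ("
        " host TEXT, user TEXT, hit_time TEXT, method TEXT,"
        " path TEXT, status INTEGER, bytes INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    con.executemany("INSERT INTO web_hits VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    con.commit()
    return len(rows)
```

Once loaded, the hits can be queried with ordinary SQL, for example counting requests per path or per status code. The hard part in practice is everything this sketch skips: sessionisation, visitor identification, and cleaning.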

Perhaps the most suitable technology to cover in a DW hub-and-spoke article would have been Microsoft’s PDW?
