Building a data warehouse on parallel lines

Never look a gift horse in the mouth, especially if there are many of them running in parallel… There are various structures we can use in a data warehouse – each with its pros and cons. For example, if you use a relational structure for the core of the warehouse then you gain very high flexibility but lose out on speed. …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    WX2 = White Cross MkII...surely?

    As well as not making readers aware that Kognitio WX2 is White Cross re-badged, I'm surprised GreenPlum (formerly Metapa) didn't get a mention. As a software-only play they are more like Kognitio than Netezza or DATAllegro, surely?

    I'd expect any system to be lightning fast if it is set up with only 100GB of data per node, as per the example in your article. Even at 'commodity pricing' what would 100 nodes cost to support a mere 10TB of user data? It will obviously depend on which servers are your favourite, and what spec is chosen, but it will run to a few hundred grand at least, and that's before the WX2 software cost is added.
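
    A rough back-of-envelope (in Python, with a purely illustrative per-node price that is an assumption, not anyone's quoted figure) bears this out:

    ```python
    # Back-of-envelope: hardware cost of holding 10TB at 100GB per node.
    # The per-node price is an illustrative assumption, not a vendor quote.
    user_data_gb = 10_000                     # 10TB of user data
    data_per_node_gb = 100                    # as per the article's example
    nodes = user_data_gb // data_per_node_gb  # -> 100 nodes
    cost_per_node = 3_000                     # assumed price of one commodity server
    print(nodes, nodes * cost_per_node)       # 100 nodes, 300000 -- "a few hundred grand"
    ```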

    There's nothing new under the sun :-)

  2. David Norfolk

    Omissions?

    Sorry about White Cross - but it is Kognitio now, and your comment supplies the omission.

    As for Greenplum, I almost included a reference but decided not to - the article was really about Kognitio and I could have referenced many other products (probably even including Teradata). Also, Greenplum is covered somewhere in the related articles at the bottom of the piece.

  3. Anonymous Coward

    In-memory DBMS useful for DW?

    Why is Kognitio/White Cross even relevant in a world where 10TB is on the *small* side of modern warehouses?

    Didn't Greenplum and Sun announce a warehouse that *starts* at 100TB, and aren't Netezza and DATAllegro following that lead?

    Who reads the Register for DW news, people in small British-only businesses?

    ;-)

  4. Mark Whitehorn

    Size, is it really important?

    Well, there are several answers to the questions raised about size.

    Firstly, Sun and Greenplum did indeed announce a data warehouse appliance in July 2006. However, it actually starts an order of magnitude lower than 100TB. To quote from the Greenplum press release of the time:

    “The Data Warehouse Appliance will be available later this quarter. Initial configurations will deliver usable database capacities of 10, 40 and 100TB.”

    Secondly, as far as I am aware, Kognitio has never made any pronouncements about the volume of data that either constitutes a data warehouse or should be put on a node.

    The quote in the article is there to illustrate scalability not absolute volumes. It says:

    ‘In addition the architecture that Kognitio has elected to use has a very desirable side-effect: scalability. The company claims, for example, that “the query performance of a 100-server WX2 system with 10TB of data will be the same as that of a 10-server system with 1TB of data.” ’

    There is certainly no implication that nodes are limited to 100GB; the numbers are simply being used for illustrative purposes.
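
    To make the arithmetic explicit: 10TB spread across 100 servers is 100GB per server, and 1TB spread across 10 servers is likewise 100GB per server; the per-node workload is identical in both configurations, which is why the quote speaks to scalability rather than to any per-node ceiling.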

    Thirdly, and most importantly, size is not important. The volume of data held in a data warehouse really isn't that relevant; it is the quality of information that can be extracted which can make or break the project and, indeed, the enterprise. That is not to imply that large data warehouses cannot provide important information; of course they can. It's just that the correlation between volume and importance is not absolute.

    With reference to the other products mentioned in this thread:

    Kognitio

    Greenplum

    Netezza

    DATAllegro

    and others

    All of these companies are doing very exciting work to push the boundaries of what can be achieved in data warehousing, and we think it is important to give their products more exposure. In this article I focused on one of them, but David Norfolk (my editor) and I were both keen to include references to some of the other products that are (as far as we are concerned) in the same space.

  5. Richard Gowan

    Indexes... blah...

    Indexing is the only way to speed things up? Rubbish.

    Try aggregates. Try partitioning - horizontal and/or vertical.

    If that fails, sack your consulting staff and re-engineer your system with a small bunch of competent techs.
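
    For concreteness, here is a minimal sketch of the two partitioning styles just mentioned (the table and column names are invented for illustration):

    ```python
    # Toy illustration of horizontal vs vertical partitioning.
    # All table and column names are invented for the example.
    orders = [
        {"id": 1, "region": "EU", "amount": 250, "notes": "long free text"},
        {"id": 2, "region": "US", "amount": 90,  "notes": "long free text"},
        {"id": 3, "region": "EU", "amount": 40,  "notes": "long free text"},
    ]

    # Horizontal partitioning: split the rows (here by region) so a query
    # can scan only the partitions it needs and skip the rest.
    horizontal = {}
    for row in orders:
        horizontal.setdefault(row["region"], []).append(row)

    # Vertical partitioning: split the columns, so a query that only needs
    # (id, amount) never has to read the wide "notes" column at all.
    vertical_hot  = [{"id": r["id"], "amount": r["amount"]} for r in orders]
    vertical_cold = [{"id": r["id"], "notes": r["notes"]} for r in orders]
    ```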

  6. Mark Whitehorn

    Re. Indexes... blah...

    I don't think that anyone implied that indexes were the only way to speed up databases. There are, indeed, manifold techniques we can apply – indexing, aggregation, partitioning, OLAP (which typically, but not exclusively, includes an element of aggregation), hardware (a very broad church embracing parallel processing, faster disks, RAID, more and faster CPUs, more memory), query optimisation, updating statistics, denormalisation (in all its flavours) to name but a few.

    Any and all of these can be appropriate, or not, for any given database; it depends on the circumstances. Aggregation can be an excellent solution but it isn't perfect; see the Register article:

    www.regdeveloper.co.uk/2006/10/06/aggregates_the_dba_headache/
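
    For what it's worth, both the appeal and the maintenance burden of aggregates are easy to see in miniature; everything below is invented for illustration:

    ```python
    # Toy illustration: answering "sales per region" from a pre-computed
    # aggregate instead of re-scanning every detail row on each query.
    from collections import defaultdict

    detail_rows = [("EU", 250), ("US", 90), ("EU", 40), ("US", 120)]

    # Build the summary once; keeping it in step with newly arriving
    # detail rows is the headache discussed in the article above.
    sales_per_region = defaultdict(int)
    for region, amount in detail_rows:
        sales_per_region[region] += amount

    # Queries now hit the small summary, not the big detail table.
    print(sales_per_region["EU"])   # 290
    ```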

    Kognitio supplies yet another way to speed up databases: its solution is based around parallel processing and in-memory querying. The solution will be highly appropriate in some circumstances but no one would suggest that it is appropriate in all. The good news is, the more solutions that are available to us, the more likely it is that we can find an appropriate one.
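
    As a toy sketch only (it assumes nothing about Kognitio's actual internals), the scatter-gather pattern behind this kind of parallel, in-memory querying looks roughly like this:

    ```python
    # Toy scatter-gather: each "node" holds a slice of the data in memory,
    # scans it independently, and a coordinator combines the partial results.
    # Purely illustrative; not Kognitio's implementation.
    from multiprocessing import Pool

    def scan_partition(rows):
        # Each worker answers the query over its own slice, e.g.
        # SELECT COUNT(*), SUM(amount) WHERE amount > 100
        matching = [amount for amount in rows if amount > 100]
        return len(matching), sum(matching)

    if __name__ == "__main__":
        data = list(range(1000))                             # the full "table"
        nodes = 4
        partitions = [data[i::nodes] for i in range(nodes)]  # round-robin split

        with Pool(nodes) as pool:
            partials = pool.map(scan_partition, partitions)

        print(sum(c for c, _ in partials), sum(s for _, s in partials))
    ```

    Because each slice is scanned independently, adding nodes along with the data leaves the per-node work unchanged, which is exactly the scalability claim quoted earlier in this thread.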

    Ultimately there is no single button labeled ‘Make your database 100 times faster’. If there were, we’d all have pushed it a long time ago.

  7. Sean Jackson

    Keeping it real

    It's important to note that although index-based data warehouses have their part to play, so do those with a Massively Parallel Processing architecture. It's simply a question of what works best for the organization concerned and what the business wants to achieve. At no point can you generalize that one technology is inherently better than the other.

    Kognitio WX2 is not exactly the solution that people familiar with Whitecross Systems will remember. That was a solution based on proprietary hardware; WX2 runs on commodity x86 hardware and thus allows users to get a data warehouse appliance-type setup but at a fraction of the cost and effort associated with other solutions. All operations are parallelized and the solution can scale up to whatever size is necessary. The point is that, with massive incremental scalability, there are no limits with WX2. Indeed, one user in California is currently running WX2 on over 300 blade servers and this is growing.

    The bottom line is that we believe we offer a viable alternative to other solutions, ensuring users who are looking for high-speed data analysis from their data warehouse can get it on a cost-effective basis. To prove this, we invite organizations to engage with us and allow us to build them a data warehouse for free. It may be that Kognitio WX2 is the solution you've been looking for.
