Teradata, the big daddy of data warehousing, has finally responded to the appliance challenge by rolling out a new family of systems that cater to customers with various budgets. Rather than shipping a single, beefy system, Teradata will now sell a range of gear that starts with the 550 SMP (symmetric multiprocessing) on the low …
Let's do the math
Let's see, 1 TB from them, $67,000
1 TB internal SATA drive, less than $250
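As a rough back-of-envelope sketch, the ratio between those two figures works out as follows. It uses only the two prices quoted above, treats the "less than $250" drive price as an upper bound, and ignores everything that is not raw disk (controllers, software, support); the variable names are just illustrative.

```python
# Figures quoted in the comment above, in USD per terabyte.
appliance_price_per_tb = 67_000
sata_drive_price_per_tb = 250  # "less than $250", so this is an upper bound

# Because the drive price is an upper bound, the real ratio is at least this.
ratio = appliance_price_per_tb / sata_drive_price_per_tb
print(f"The appliance costs at least {ratio:.0f}x the bare drive per TB")
```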
Of course, I realize that we aren't exactly comparing apples to apples here, but any semi-competent engineer can build one heck of a data storage unit around inexpensive drives.
They are going to have to bring their prices down significantly before the majority of businesses can even consider using their applications and hardware.
Of course, on the plus side, that leaves quite a profit margin for some enterprising startup to leverage and fill the void.
So it's got Xeons and terabytes
but what does it actually do? Data warehousing is loads more than chips & disks. I repeat, why pay for their hardware when I can get the same for cheaper elsewhere? *what do they add?*
An SNMP agent and some Java monitoring software and pre-qualified Seagate model SE16 disks LOLz!
@E, @Let's do the math
That's not fair. There are bits other than fast disks that matter, such as a decent, fast disk subsystem and whatever else needed to get info to flow fast around the system. That's the difference between a PC and a server.
Probably a decent, multi-processor-licensed, commercial SQL server thrown in as well? dunno
But that still leaves a whole lot of money still to account for. Teradata, I'd genuinely like to know.
What do they add...
Well, try an optimiser that almost always comes up with an excellent query plan, no matter how complex the SQL...that's MUCH harder than it sounds.
Try unconditional intra-node parallelism - 10 or 12 virtual processors, each owning a virtual disk, running on a single SMP node to tackle each query in parallel.
Try automatic table space management, no matter how big the system.
Try linear scalability, certified to >1,000 SMP nodes.
Big banks, telcos and retailers have been using Teradata for decades because "it just works".
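To make the intra-node parallelism point concrete, here is a toy sketch of the general idea: every row belongs to exactly one worker, each worker scans only the slice it owns, and the partial results are combined at the end. This is an illustration under assumptions, not Teradata's implementation. The worker count, function names and modulo placement rule are all hypothetical, and Python threads only show the shape of the technique rather than real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4  # hypothetical; the comment above mentions 10 or 12 per node


def partition(rows, num_workers):
    """Assign each (key, amount) row to exactly one worker based on its key."""
    slices = [[] for _ in range(num_workers)]
    for key, amount in rows:
        slices[key % num_workers].append((key, amount))
    return slices


def local_sum(rows):
    """Work done by one worker: scan only the slice it owns."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, num_workers=NUM_WORKERS):
    """Run the same aggregate on every slice at once, then combine the partials."""
    slices = partition(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_sum, slices))
    return sum(partials)


rows = [(i, i * 10) for i in range(100)]
total = parallel_total(rows)
print(total)
```

The design point is that no worker ever touches another worker's slice, so adding workers splits the scan rather than adding contention.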
@What do they add...
I argued ("that's not fair") that you had to be adding more, hence my question. I don't think you've answered it though. Step by step:
> Well, try an optimiser that almost always comes up with an excellent query plan, no matter how complex the SQL...that's MUCH harder than it sounds.
* I know how hard it is, and in general it's totally impossible. I can give you simple SQL that I'm sure you cannot automatically optimise (I discussed the example in email with Hugh Darwen and he agreed). And if by some magic you could, I guarantee I could find you one you could not. And it would not be large either.
But from experience I know MSSQL can produce excellent query plans for some of the most horrid SQL I've ever seen. So buy MSSQL and drop it onto your stock hardware. NB. I don't work for MS.
> Try unconditional intra-node parallelism - 10 or 12 virtual processors, each owning a virtual disk, running on a single SMP node to tackle each query in parallel.
* Why not use real processors? Why virtualise the disk? All you risk getting is simultaneous reads fighting each other for access to the real disk. At that price you could use real processors each running a disk cluster. In other words, a roomful of bog-standard servers.
And MSSQL can run nicely on an SMP multiprocessor, doling out work to each core as necessary.
> Try automatic table space management, no matter how big the system.
* yeah right. Big DBs need big management. You may provide remote DBA time as part of the package, but that's not quite what you've described.
> Try linear scalability, certified to >1,000 SMP nodes.
* WTF are you doing with that much processing power? And how much would it cost? And how many big (yet bog-standard) servers could you buy and shove in a big room for the price you quote?
> Big banks, telcos and retailers have been using Teradata for decades because "it just works".
* hmmm. And again hmmmmm. Given how brilliantly bankers have recently proven they can manage trillions of pounds of assets, that says loads for their judgement.
But these days you can buy *big* stock hardware and run big DBs on it, with (what I understand to be) decent analysis tools. And analysis of data warehouses tends to be on large static snapshots, so it can be distributed freely, usually nightly after updating, and processed by multiple different boxes simultaneously. So what are you offering?
I'm afraid you haven't answered my question. And I'm not trying to be destructive, I'd really like to know.
Well, that's the enterprise hardware market for you. Large companies will joyfully spend like 10x (or more) the cost of building a system for... well, I don't know what for. Everything I've read about Teradata is good, but in general color me cynical about this stuff. It seems like a lot of these big companies will buy a pre-made enterprise product to avoid the costs of doing it from scratch, only to spend more time and effort "integrating" or "customizing" the product than they would just building from scratch. My suspicion is the way many companies are structured, the red tape simply makes it impossible to do this stuff in-house.
Enterprises do pay a lot for hardware, there is no doubt about that. Warehousing is probably of most benefit for large corporations that have large independent systems servicing different requirements of their customer base, e.g. banks with different personal loan and housing loan systems, telcos with different pre- or post-paid mobile provisioning or billing systems, etc. The result is a need to have a system that can load, store (and keep history) plus relate/match all of the data from these different systems.
My understanding is that Teradata can't be built in-house because it has a component of proprietary hardware which balances the load sharing.
I think it's a different perspective to MSSQL doling work out on an SMP machine to different CPUs. Teradata uses the proprietary hardware (and software) to break apart the query and distribute the data across all the CPUs in the machine - so that each CPU is only looking at information that is a subset of the overall join being performed at that point. This means that for whatever fraction of a second it takes, each join is run by the whole machine. Then for the next join the data is transferred or redistributed across all CPUs, and repeat.
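The redistribute-then-join idea described above can be sketched in a few lines. This is a generic hash-redistribution join under assumptions, not Teradata's actual mechanism: the number of units, the function names and the sample tables are all made up for illustration, and the units are processed in a plain loop where a real system would run them simultaneously.

```python
from collections import defaultdict

NUM_UNITS = 4  # hypothetical number of processing units


def redistribute(rows, key_index, num_units=NUM_UNITS):
    """Send each row to the unit chosen by hashing its join-key value."""
    units = [[] for _ in range(num_units)]
    for row in rows:
        units[hash(row[key_index]) % num_units].append(row)
    return units


def local_join(left_rows, left_key, right_rows, right_key):
    """Join performed by one unit, using only the rows it received."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]


def distributed_join(left, left_key, right, right_key):
    """Redistribute both tables on the join key, then join unit by unit."""
    left_units = redistribute(left, left_key)
    right_units = redistribute(right, right_key)
    result = []
    for l_rows, r_rows in zip(left_units, right_units):
        result.extend(local_join(l_rows, left_key, r_rows, right_key))
    return result


# Made-up sample data: (customer_id, name) and (loan_id, customer_id, amount).
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
loans = [(101, 1, 5000), (102, 3, 7500), (103, 1, 1200)]
joined = sorted(distributed_join(loans, 1, customers, 0))
print(joined)
```

Because both tables are hashed on the same join key, matching rows always land on the same unit, so no unit needs to look at another unit's data during the join. A later join on a different column would simply call `redistribute` again with that column.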
Because Teradata redistributes the data around the machine, you don't have to be picky with creating indexes on tables (apart from initially specifying which column(s) are used as a basis to distribute the data across the nodes). This helps if you have a big database but can't predict in what way people are going to query it, i.e. which columns they are going to join tables on, or if you have less-skilled users running analysis and writing queries that don't match expectations at design time.
Re the processors being virtualised: I believe that is something to do with allowing load sharing to be tuned, presumably between I/O and CPU.