It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form. It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either …
Big? You Sure?
Unstructured data hasn't proven a business case beyond Google. It certainly hasn't proven its worth anywhere else. It's rather disingenuous to believe the SMB sector, which makes up the bulk of most economies, would even bother.
How does 'big data' benefit anyone but the advert brokers? I'm not sure it does...
Re: Big? You Sure?
True, I am not exactly sure how using unstructured analytics to determine, for instance, what people are saying about your company on this website does much for companies. Then again, I am not sure how placing online ads does much for companies in the first place. Marketing and advertising is all about hype. The perception becomes the reality. Marketing people will want "big data" analytics because people are writing about "big data" analytics.
Why do people buy Teradata these days? Not slamming Teradata, but a serious question. It seems that IBM and, if you like lock-in, Oracle have mirrored their OLAP appliance and then some. They used to be the only game in town, but I am not sure why you would go with their costly and proprietary appliances today. What is their secret sauce?
Tim can wax a bit lengthy on what essentially are collections of sales pitches, and at the end I always wonder whether I've wasted my time or not. Unless I've stopped reading halfway through. But I digress. This time, there's a notion that big companies in yurp only count as midsize companies in the yoosah, data wise. 'merkins just munch that much more data? What gives?
The minor lightbulb moment was that the BI crowd tends to stick to spreadsheets, which merely means they haven't brushed up on basic IT skills, opening up a lucrative market for data appliances. But what if you did do your homework and asked the IT dept for an OLAP instance with enough backing store for all that data? Suddenly you can connect with data viewers, visualisers, even statistical and mathematical packages and whatnot else.
Basically FOSS versions of everything can be found around the 'net, too, so all you need then is the hardware, and you let the IT dept handle that. What's wrong with this picture? Is business in general that dysfunctional? Rather give someone else big sacks of dosh for a magic solve-all box than do your own homework? Doesn't sound very intelligent to me. What is the point?
...is apparently defined by volume (how much), variety (what types) and velocity (how fast), or some combination of all three.
The term is in vogue due to the likes of Google, Yahoo and Facebook introducing the world to new analytic paradigms based on the MapReduce framework, open source software (Linux, Hadoop etc), commodity hardware and the notion of 'noSQL'...and also because the IT industry needs new buzzwords du jour. At the moment it's the turn of 'big data' and 'cloud'.
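For anyone who hasn't met it, the MapReduce idea itself fits in a few lines of Python. This is just the programming model (map, shuffle, reduce), not Hadoop's distributed machinery; the word-count example is the canonical illustration:

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Apply the user's mapper to every record, yielding (key, value) pairs.
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    # Group all values by key - what the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    # Apply the user's reducer to each key's grouped values.
    return {key: reducer(key, values) for key, values in groups.items()}

# Classic word count: the mapper emits (word, 1), the reducer sums.
def wc_map(line):
    for word in line.split():
        yield word.lower(), 1

def wc_reduce(word, counts):
    return sum(counts)

docs = ["Big data big hype", "big iron"]
result = reduce_phase(shuffle(map_phase(docs, wc_map)), wc_reduce)
# result["big"] == 3
```

The point of the paradigm is that map and reduce are embarrassingly parallel, so the framework can scatter them across thousands of commodity boxes without the programmer thinking about it.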
In theory, 'big data' as done by the likes of Google is all about unstructured data. In reality, there's a lot of structured data still out there, and I'd argue that all data has some structure anyway, so 'semi-structured' may be a better term.
Ebay has a multi-petabyte 256 node Teradata system chock full of structured data, in addition to the large Hadoop stack for web analytics, so there's clearly life in the old structured dog yet.
There's nothing new in 'doing analytics' - a lot of companies have regarded analytics as a competitive differentiator for a long time. There are companies out there, even in the lil' ol' UK, that have been using Teradata, which only does analytics, since the 1980s. I started my career at one of them.
For the typical mid-market company, if there is such a thing, all we ever tend to see is SQL Server on top of SAN/NAS. It's cheap, feature-rich, easy to tame and works OK until data volumes increase beyond a few hundred GB or so. The pain threshold is obviously dependent on the hardware, DBA/developer skill, schema and application complexity.
All SMP-based databases suffer the same scaling issues, hence Microsoft's attempt to build an MPP version of SQL Server (Madison/PDW), Oracle's Exadata and HP's NeoView.
IBM in the BI mid-market is not something we see very often. Netezza Skimmer has never been sold as a production system, as far as I know. IBM's own web site describes it as for 'test and development'. A proprietary IBM blade-based system running Postgres on Linux is hardly a good fit for the Windows/SQL Server/SAN/NAS/COTS hardware crowd.
Having said that, we did deploy a pre-IBM Netezza system as far back as 2003 for a small telco with only 100,000 customers, but they did have several billion rows of data and complex queries to support.
@Wonderbar1 - Teradata's competitive advantage consists of several capabilities...performance, scalability, resilience, functionality, maturity, support, 3rd party tool integration (e.g. in-database SAS), ease of use, applications and data models to name a few. It’s a true ‘full service’ offering.
Teradata is the only database built from day 1 (in the 1980s) to support parallel query execution using an MPP architecture across an arbitrary number of SMP nodes, all acting in tandem as a single coherent system. That is very, very hard to do - ask Microsoft, Oracle, HP or IBM.
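For the curious, the basic trick - hash-distribute the rows across nodes, run the same query on every node at once, then sum the partial answers - can be sketched in a few lines of Python. This is a toy simulation with threads standing in for nodes, nothing like a real MPP implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def distribute(rows, n_nodes):
    # Hash-distribute rows across "nodes" (here, just plain lists).
    shards = [[] for _ in range(n_nodes)]
    for row in rows:
        shards[hash(row) % n_nodes].append(row)
    return shards

def node_count(shard):
    # Each node scans only its own slice of the data.
    return len(shard)

def parallel_count(shards):
    # The coordinator fans the query out to every node at once,
    # then sums the per-node partial counts.
    with ThreadPoolExecutor(max_workers=max(len(shards), 1)) as pool:
        return sum(pool.map(node_count, shards))

shards = distribute(range(1_000_000), n_nodes=8)
total = parallel_count(shards)  # 1000000
```

Because each node touches only its own shard and the partial results are tiny, doubling the node count roughly halves the scan time - which is the whole linear-scalability pitch.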
Overall, Teradata 'just works'. All those big name users can't be wrong.
The Teradata secret sauce for me is the scalable 'bynet' inter-node interconnect. This is used for data shipping between SMP nodes in support of join/aggregation/sort processing. The bynet is scalable and resilient and 'just works'. It also performs merge processing for final results preparation.
Other MPP systems typically have a non-scalable interconnect bandwidth consisting of a dumb bit-pipe. Even worse, those that ship intermediate results to a single node for final aggregation/sort/merge processing can hardly claim to be linearly scalable. Some Exadata clusters run tens of TBs of RAM on the master node to address this issue.
Teradata's bynet has processing capability that enables final merge operations to be executed in parallel in the bynet interconnect fabric without landing intermediate results in any single place for collation. Cool eh?
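To illustrate the difference, here's a toy Python sketch of a tree-style merge: sorted partial results from each node get merged pairwise at each level, so no single coordinator ever has to swallow all the intermediate data at once. Purely illustrative - not how the bynet actually works internally:

```python
import heapq

def tree_merge(runs):
    # Merge sorted runs pairwise, level by level, the way a tree-structured
    # interconnect would - instead of shipping every run to one master node.
    while len(runs) > 1:
        next_level = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]  # a pair of runs (or a lone straggler)
            next_level.append(list(heapq.merge(*pair)))
        runs = next_level
    return runs[0] if runs else []

# Four "nodes" each return their slice of the result, already sorted.
runs = [[1, 5, 9], [2, 6], [3, 7, 8], [4]]
merged = tree_merge(runs)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With a single-node collation step the master's memory and bandwidth become the ceiling; with a tree, each level only handles two runs at a time, which is why the single-node designs end up bolting tens of TBs of RAM onto the master.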
See here for more info: http://it.toolbox.com/wiki/index.php/BYNET
Teradata consists of OEMd Dell servers running SUSE Linux and dedicated storage from LSI or EMC. Teradata was historically regarded, quite rightly, as 'reassuringly expensive', but the launch of the new line of Teradata 'appliances' a few years ago has made Teradata price-competitive with the likes of Netezza, thus eroding Netezza's disruptive pricing model. Competition is a healthy thing etc.
Appliance adoption has been a key feature of Teradata's strong performance over the last few years, as reported several times on El Reg.
Have you ever run an Oracle query across a 20 node system running hundreds of virtual processors all working together? I did a few minutes ago - a 250m row count(*) in under 1 second with no caching, no metadata, no indexes, no tuning, no partitions and no concern for what else is running.
I can't remember when I last submitted a query to Teradata that either didn't finish or caused the system to barf. That happens a lot on Oracle/SQL Server.
The last project I worked on was a 20TB Teradata system that supports a very wide range of applications, including real-time loading of web data and several tables of over a billion rows. Total downtime for the year, including planned maintenance, is measured in single hours.
“But I could do all that with X, Y and Z”, we often hear. Off you go then. If you can get it to work, and that’s a big ‘if’, your boss won’t bet the farm on it. That’s another reason the likes of Teradata win business – it’s a safe bet for the decision makers.
Back to work…