Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Akers, who helped architect MySQL. Michael …
nice iconoclastic article marred by typos.
some I noticed:
bulk load rather than single road
big-data offering Cassandra that has it's own take on columns
Interesting article - shame it appears to have been OCR'd or copied in from a hardcopy by a very tired hack? Weird errors in the text...
The last para is very true: availability is about processes and people. Almost all the examples I know of where a large (well-designed, redundant) system went down involved a person making a mistake. Having a service delivery organisation that is continually trying to improve its processes is key, and pretty rare.
What? No mention of Illustra?
I'm curious why no mention of Stonebraker's Illustra.
Why it's important is Mike Olson.
Olson is one of the founders of Cloudera.
"This particular elephant for a while but there are problems"
How true that is, even today.
"The young guys haven't seen it before and the problem with our computer science education system is the lessons from the past seem to get lost."
And now management training is down to 'get IT to fix it' ...
So, a couple of years ago Stonebraker publishes an ACM paper saying MapReduce sucks, suddenly his new startup application implements it and he's turned round to say "MapReduce is OK"? It may be good for press coverage, but it doesn't make any friends.
If you look at what Google and Facebook want, it isn't tables, it's graphs. I don't see this product helping here.
"If you look at what Google and Facebook want, it isn't tables, it's graphs. I don't see this product helping here."
Maybe that's why he isn't trying to sell it to them ?
So many type-ohs...
There's this new thing they just invented called proof reading - if you'd like to employ me to do it I'd be happy to, if it avoids having to do it anyway when reading an article. Yeesh, you guys!
More on ACID
Also, many applications don't need high integrity levels. If some obscure website disappears from the Google Index, not much damage is done. It will be back on the next indexing run.
If your bank account disappears, it is a completely different story. If your 5000 dollar health bill disappears, your insurance company will be hurt.
Banks and similar companies need ACID. Google doesn't. Facebook doesn't. Fartmail.com doesn't. MyDiscussionForumAboutDancingMonkeys.de doesn't. That's why the crappy and lightning-fast MyISAM storage engine of MySQL was so successful. Many people simply don't need the safety of ACID.
Until such time as any of them start processing transactions... oh, hang on - Google Checkout perhaps? Facebook adverts maybe.
Even a discussion forum about dancing monkeys would be helped by a relational model - it makes (safely) deleting threads that much easier. How many forums (fora?) have you known "fall over" and lose big chunks of data because they're just sat on top of something like MyISAM - I can think of at least three that I've used in recent years.
Nowadays, if you're using MySQL, there's not _much_ of a performance hit to using the InnoDB engine which allows for relationships. However - as with most things in life - there's the "horses for courses" proviso.
If you're using your database basically as unstructured document storage, there's no great advantage to InnoDB BUT there is an advantage to using MyISAM - FullText searches. Of course, you could hold an indexing system in InnoDB whilst keeping the documents in MyISAM tables inside MySQL - then the only trick is keeping them in sync.
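For anyone wondering what "(safely) deleting threads" looks like in practice, here is a minimal sketch. It assumes SQLite from the Python standard library as a stand-in for an engine that enforces relationships (it is not MySQL or InnoDB), and the `threads`/`posts` tables are made up for illustration.

```python
import sqlite3

def delete_thread_demo():
    """Delete one thread and let the engine clean up its posts."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " thread_id INTEGER NOT NULL REFERENCES threads(id) ON DELETE CASCADE,"
        " body TEXT)"
    )
    conn.execute("INSERT INTO threads VALUES (1, 'Dancing monkeys')")
    conn.execute("INSERT INTO threads VALUES (2, 'Juggling monkeys')")
    conn.executemany(
        "INSERT INTO posts (thread_id, body) VALUES (?, ?)",
        [(1, "a"), (1, "b"), (2, "c")],
    )
    # One statement: the posts of thread 1 are removed along with it.
    conn.execute("DELETE FROM threads WHERE id = 1")
    conn.commit()
    remaining = conn.execute("SELECT thread_id, body FROM posts").fetchall()
    # A post pointing at a thread that does not exist is refused.
    try:
        conn.execute("INSERT INTO posts (thread_id, body) VALUES (99, 'orphan')")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True
    conn.close()
    return remaining, orphan_rejected
```

Without the relationship, the application has to remember to delete the posts itself, and a crash between the two deletes leaves orphans behind.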
Your app really isn't all that.
> Also, many applications don't need high integrity levels.
Most of those aren't harmed by the overhead of a proper database either.
On the other hand, a proper standardized algebra with some built in safety and recovery features comes in handy for other boundary conditions besides raw performance. It's not just about speed and most apps really don't need to be fixated on speed to the exclusion of all else.
Skimping on the data engine can cause other issues in the future.
Almost too much information for 2:40am...
...but I'm glad I stuck through to read it all. Fantastic article with much juicy brain informations. More like this!
One Size Does Not Fit All Needs
SQL Databases are mostly used by small teams of business application developers who are not systems programmers and don't have time to become that. They are under pressure to deliver one of these half-baked in-house applications. ACID is wonderful because it protects these people and their employers from their own crap code.
On the other hand, Googlers even hack the linux kernel in a bid to extract performance and efficiency. Most of them can safely be called Systems Programmers. No wonder they created something special-purpose.
As soon as you realize Google is NOT one of these small crappy corporate IT developer shops, you understand why they have different needs and use different approaches to tech strategy. Most corpo-developers need ACID and Java as they will otherwise destroy their employer's business.
I'll skip the ACID this time, thanks.
You're rambling, dear chap. It is perfectly feasible to use MySQL in the corporate / enterprise environment. It's not "crappy", just "lightning fast", and very reliable on a server with UPS.
Many enterprises, including my own, use transactionally safe databases for back office / batch processes which just don't need that safety. Not only that, they need ten times as much hardware to run at the same speed. That's a lot of money for nothing. They need DBAs to run the things since they're ten times as complex - also a lot of money for nothing. Then they break down and take ten times as long to restore.
A couple of times in my corporate life I've witnessed the pain caused by a disk failure to an Oracle database - mainly the amount of time it takes to restore the thing - all thanks to ACID. Not to mention the DoS attack masquerading as a large update.
With MySQL, if your program gets to the end of a large series of steps without error, then they all happened as written. Period. Job done: no Java, no ACID.
clever post. not.
Systems programmers do a different job than application programmers. Doesn't mean app programmers are dimwits.
There is "one" Linux, "one" Windows7 (ok, in reality, there are many variants). Writing an OS takes years and years, and yes, it better be a solid underpinning for whatever comes on top. Ditto for databases. It is rocket science, but those folks sometimes work on the same code for years as well.
Next time you get hired by a startup, why don't you wait for some years while the rocket science PhDs write up a payroll for you? Or while the same geniuses do a marketing report? Tell you what, it's just not a good use of their time. And, yes, they are clever, cleverer than me.
Second, just because you think that all database work is dumb ol' selects doesn't make it true. A lot of the ORM-mediated web 2.0 stuff is really trivial database stuff, true. But try writing a complex application, like a payroll again, on top of a RDBMS. Would you be that clever yourself? Apparently not, since you don't seem to care that much about the ACID-backed integrity of things like "the hours worked by Mr. Smith in January" not changing.
Speaking of which, I think ORMs are often a safety net for people who truly don't get SQL.
One thing that has really worked well for IT is the notion of abstraction. Yes, you can manage every nitty gritty detail of your business data low-level storage, even as you try to use that data. Or you can rely on a reliable back end to handle that storage correctly.
Should I type more slowly?
But what ...
"if your program gets to the end of a large series of steps without error, then they all happened as written"
... if it doesn't ?
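A minimal sketch of what happens when the program does not get to the end, assuming SQLite from the Python standard library as a stand-in transactional engine. The `accounts` table, the names and the amounts are all hypothetical.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500), ("bob", 0)])
    conn.commit()
    return conn

def run_batch(conn, fail_midway):
    """Move 100 from alice to bob in two steps, optionally dying in between."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'alice'")
            if fail_midway:
                raise RuntimeError("simulated failure between steps")
            conn.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'bob'")
    except RuntimeError:
        pass  # the first update has already been undone by the rollback
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With `fail_midway=True` both balances come back unchanged. On an engine without transactions the first update would stick and the 100 would simply vanish.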
re: I'll skip the ACID this time, thanks
"With MySQL, if your program gets to the end of a large series of steps without error, then they all happened as written. Period. Job done: no Java, no ACID."
Trouble is, a lot of us have to make it get to the end while other things are processing and updating the same data, users are operating the GUI and other systems are feeding back results in some erratic manner.
if you don't need it...
It might be nice to have a system which allows you to turn ACID off. Either globally, or selectively.
Basically, if you are writing up a report, you might not care about too much, once you've copied in your input data to some processing tables. You probably still need some assurance that the source data you are pulling in was consistent, but that's about it.
Turning off integrity checks at the table level (calling stub functions for example) would do it and you could just replace the whole thing with a "geez, we know we had an error just now, go clean up your tables manually and junk this batch" exception.
If it wasn't too difficult to implement, it might be a nice feature to allow more reporting outta an otherwise std sql db.
But Stonebraker has a more general point that relational databases are unwieldy beasts in some/many contexts. Self-referential parts-of-parts (i.e. modeling say a spark plug's relationship to a carb to an engine, to a car) is a standard no-go area, mostly from the querying aspect. So I welcome this kind of initiative.
However... the NoDB/Stonebraker folks could very well repeat the fragmentation of OO-databases where RDBMSs kept the crown because OO databases never really standardized. That would be a shame, despite my own pay being largely based on my sql skills.
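To make the parts-of-parts querying problem concrete, here is a rough sketch of walking such a hierarchy where recursive common table expressions are available. It assumes SQLite (3.8.3 or later) via the Python standard library; the `part_of` table and its contents are invented, following the spark plug example above. It says nothing about how well this performs or how other engines handle it.

```python
import sqlite3

def make_parts_db():
    """Self-referential table: each row says 'child is a part of parent'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE part_of (parent TEXT, child TEXT)")
    conn.executemany(
        "INSERT INTO part_of VALUES (?, ?)",
        [
            ("car", "engine"),
            ("car", "wheel"),
            ("engine", "carb"),
            ("carb", "spark plug"),
        ],
    )
    conn.commit()
    return conn

def all_subparts(conn, root):
    """Return every part below `root`, with its depth in the hierarchy."""
    return conn.execute(
        """
        WITH RECURSIVE subparts(name, depth) AS (
            SELECT child, 1 FROM part_of WHERE parent = ?
            UNION ALL
            SELECT p.child, s.depth + 1
            FROM part_of AS p JOIN subparts AS s ON p.parent = s.name
        )
        SELECT name, depth FROM subparts ORDER BY depth, name
        """,
        (root,),
    ).fetchall()
```

The query is expressible, but it is a long way from the plain selects and joins most SQL users are comfortable with, which is arguably the point being made.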
"And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge – the NoSQL movement"
Sigh me a river. The only thing the NoSQL 'movement' are on the leading edge of is the NoSQL movement.
And FFS why does it need to be a movement ? I have no problem with some bunch of kids coming up with a solution in their particular problem domain and being chuffed about it. Look at this, isn't it neat ? Sure kid, have a dime bar.
Where it starts to go horribly wrong is where the happy shine in their geeky little eyes becomes glassy and evil and they run around like peasants with torches trying to burn everything that isn't their solution and shouting "THERE CAN BE ONLY ONE PROBLEM DOMAIN!" and completely missing the point.
Let's face it, when Twitter's data store craps on itself and dies, a million asshats don't get to find out that Stephen Fry just had a dump and no one who matters gives a fuck. People doing enterprise OLAP have a fundamentally different relationship with their data and a completely different attitude regarding its value and what trade-offs they are prepared to make to achieve their goals.
But what the hell, MongoDB is web scale : http://www.xtranormal.com/watch/6995033/
Mention "relational databases" to me ...
and the few peoples' names that spring to my mind are Codd, Date and Darwen (or Warden as he also calls himself).
Google Dumps MapReduce
Bill McColl of Cloudscale on Google dumping MapReduce to go realtime
I sent in a (somewhat) lengthy piece on the lack of grammatical integrity displayed... and it's gone. Retcon much?
There's nothing new under the sun...
No mention that Postgres was a follow-on to Ingres, and was much more successful, although let's not forget that Ingres-powered DATAllegro was acquired by Microsoft... and may one day re-appear as SQL Server parallel edition.
"Stonebraker finally thinks it's no longer a "one-size-fits-all" world".
How could one size fit all? Teradata is 25 years old. It would have died at birth if a general purpose DBMS could be used for OLTP and BI. Want to see a system that loads 20 billion CDRs a day? It won't be general purpose.
Scientific data may not lend itself to row-based relational tables, and scientists may not need ACID-compliance, but that doesn't really signify the end of the DBMS or SQL now does it?
"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables - Postgres has arrays but if you they were supported by blobs so weren't first-class citizens," Stonebraker said.
Netezza is worth more than $1bn and is Postgres-derived. The 'data warehouse' market and the science market are not the same thing. Of course arrays on top of tables are slow, but that's a bad implementation strategy on behalf of Postgres, no more, no less, and Postgres is not the only game in town.
"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store".
Yes, if you want fast read-only queries. No, if you want fast bulk loading and update/delete capability. Take yer pick. BTW, some products like Greenplum offer both choices.
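The trade-off can be pictured with a toy sketch in plain Python. This is only an illustration of the two layouts, not how any of the products mentioned store data, and the call-record fields are hypothetical.

```python
# Hypothetical call-detail records; field names are illustrative only.
ROWS = [
    {"caller": "alice", "minutes": 3, "cost": 0.30},
    {"caller": "bob", "minutes": 10, "cost": 1.00},
    {"caller": "carol", "minutes": 7, "cost": 0.70},
]

def to_column_store(rows):
    """Pivot a list of records into one list per field."""
    return {field: [row[field] for row in rows] for field in rows[0]}

def total_minutes_row_store(rows):
    # Every whole record is visited even though only one field is needed.
    return sum(row["minutes"] for row in rows)

def total_minutes_column_store(columns):
    # Only the 'minutes' column is read; the other columns are never touched.
    return sum(columns["minutes"])

def append_record(columns, record):
    # A single insert has to touch every column, which is why loads and
    # updates tend to be the weak spot of this layout.
    for field, values in columns.items():
        values.append(record[field])
```

Both totals agree, but the column layout gets there by reading one list, while adding a single record means writing to all of them.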
"Stonebraker reckons the relational staples such as logging, locking, latching, and buffer management that have helped pioneer and maintain a crucial feature of databases – data integrity according to the atomicity, consistency, isolation, durability (ACID) – have also become its biggest burden".
General purpose DBMS products such as Oracle will never shake off this legacy to position themselves for high-end BI as a result. Teradata know this, Netezza know this. Most clued-up DBMS folks know this. Move on.
ACID compliance matters to some folks and not others.
"Of course, Stonebraker is more than just an MIT academic. He's part businessman. He's co-founder and chief technology officer for VoltDB, and a co-founder and a board member of Vertica. That makes this more than a battle of architectures. It's a fight for customers' dollars."
True, which is why I would always question the article's motivation. Is it commercial?
"Increasingly, their answer to high-end relational processing is to boost the software by fusing it with the underlying hardware."
True enough, but until they can genuinely scale out they'll ultimately always hit the diminishing returns associated with SMP scale-up. Chucking more tin at a general purpose DBMS in the hope it will somehow shake off 20+ years of legacy is not gonna work.