Data Warehouse customers
Aren't they supposed to be using PDW rather than SQL Server?
It's 1996 and Mission:Impossible has just arrived on the cinema screens. RAM is $10 per megabyte and falling. Against this backdrop, Microsoft has quietly commenced its own seemingly impossible mission, a complete re-write of a little known database which will be released two years into the future, and will be known as SQL …
Aren't they supposed to be using PDW rather than SQL Server?
Parallel Data Warehouse is a data-warehousing appliance based on SQL Server. There's a short, high-level overview article on TechNet at:
Well, Matt 21, let me help you here…
"Microsoft has quietly commenced its own seemingly impossible mission, a complete re-write of a little known database which will be released two years into the future, and will be known as SQL Server 7."
You mean Sybase? Little known? A failure? Did you research Sybase at all?
"Redmond's new database will compete with the entrenched kings of the database world, Oracle and DB2."
SQL Server 7 didn't in any way compete with the entrenched Oracle and DB2 installs. Only recently has it blipped onto their radar.
"It is time to re-think why data is stored on slow, unreliable disk, for anything other than recovery and archive purposes."
You mean floppies, right? Because hard drives, with their various redundancy-laden controllers (RAID, perhaps?), seem to be the most reliable long-term storage available. Certainly that is NOT the case for RAM (not long-term storage at all) or flash (not as reliable).
"One table at a time migration is practical, reducing risk without requiring expensive consultancy or new skill sets. All common data types are supported - only the less often used XML, CLR and max() types are excluded. "
But there are some glaring omissions, aren't there? Like no foreign keys or CHECK constraints. How about INT IDENTITY, is that in there as well? Nope, not yet.
Hekaton is a great technology, but I think the author is seeing MSSQL through rose-tinted glasses.
"but what's not in dispute is the fact that Microsoft has clawed out a third-place position behind IBM and Oracle (number one) who have been going at this for longer."
Erm, no. SQL Server has been #1 for about two years now (by licensed users):
The number of licenses won't give you a clear picture here. MSSQL is vastly cheaper than Oracle, so it makes a lot more sense to go with MSSQL for cost reasons. BUT you will still see large organisations plump for Oracle/DB2 for their massive databases (actually, I see mainly Oracle here).
So it would be interesting to see how many massive databases (I won't define massive; if the number of tables and rows makes you think "oh fuck!" then that's massive) actually run on MSSQL compared to Oracle (and maybe DB2).
Thanks, that's a good start!
"Furthermore, in contrast to its competitors, all tables or one table at a time can be migrated."
Nope, Sybase can already do that and has been able to for seven years or so. I'm sure it is not alone.
Perhaps I was a little harsh in my first comment but it does read to me as if it has been written by someone with little experience in the DBA world.
"32-bit Windows coped with at most 3GB of usable memory,"
On the desktop. From XP SP3 onward. Because they nobbled it.
Server versions can go up to 64GB.
"In 1996, twin core CPU systems were a rare novelty and hugely expensive. "
Hmm, multiple CPU systems were quite common though and all the sites I worked with were on 4, or 8 CPUs with a few richer companies on more. From an RDBMS point of view there's not much difference between a core and a CPU in terms of how the RDBMS is designed and coded.
" Row-versioning and time-stamping is used rather than page locks to ensure ACID compliance, now a feasible technique as each row is stored in memory rather than disk."
Yeah, and that is why for example Firebird (and its predecessor Interbase and earlier) have had MVCC/row-versioning since the 1980s...
MVCC has its tradeoffs, it's not 'better' or 'worse'. And Interbase was a flaky POS.
And while I'm here: "ACID compliance, now a feasible technique as each row is stored in memory rather than disk". Isn't the D in ACID rendered rather meaningless by that sentence? That's what I meant when I called the article contradictory.
Not interested in getting into a flame war; I've been with SQL Server since 4.21a. The article refers to SQL Server 7.0, where Microsoft completely rewrote the product except, I believe, for a bit in the parser. 6.5 was the last version with the 2K page size etc., the last version with the legacy Sybase code base in it.
Disk is expensive: the cost per IOPS, which we use to work out load requirements, is extremely high compared to flash. That is one of the points this article is making, and one of the reasons in-memory OLTP tables have been added to the product. Note that it is not a separate product, as with MySQL, Oracle and DB2; Microsoft have engineered the solution into the main product, which has touched a lot of areas and given benefits for normal tables too.
The point about INT and IDENTITY is just plain wrong; It's in RTM which came out this month - I've a demo on my laptop here.
FKs and CHECK constraints aren't in there yet, no. But can any of the other competing products choose specific tables to put in memory, or are you forced to do the entire database? And can those other competing [separate] products mix joins between normally stored tables and in-memory tables?
Basically, with the new Hekaton bits I can take an existing database, pick out an individual table I think might benefit from the in-memory bits (durable or non-durable at the table level, unlike Sybase) and put that in memory.
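As a rough sketch of what that looks like in SQL Server 2014 (the database, table and column names here are made up for illustration), a memory-optimized table is declared much like any other, and joins to ordinary disk-based tables as normal:

```sql
-- Assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup, e.g.:
-- ALTER DATABASE Sales ADD FILEGROUP imoltp CONTAINS MEMORY_OPTIMIZED_DATA;
-- ALTER DATABASE Sales ADD FILE (NAME = 'imoltp1', FILENAME = 'C:\Data\imoltp1')
--     TO FILEGROUP imoltp;

CREATE TABLE dbo.OrderHot
(
    OrderId  INT IDENTITY(1,1) NOT NULL        -- IDENTITY(1,1) is allowed in RTM
             PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    CustId   INT   NOT NULL,
    Total    MONEY NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON,
      DURABILITY = SCHEMA_AND_DATA);  -- or SCHEMA_ONLY for non-durable data

-- In-memory and ordinary disk-based tables join as normal:
SELECT c.Name, SUM(o.Total)
FROM dbo.OrderHot AS o
JOIN dbo.Customer AS c ON c.CustId = o.CustId  -- dbo.Customer is disk-based
GROUP BY c.Name;
```

The DURABILITY setting is per table, which is the "durable or non-durable at the table level" point above.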
Like you I've not the space (nor time) to go in depth into this.
In terms of the competition, SQL Server has been around in the enterprise for a long time now, but cloud is where it's making its mark, because let's face it, Oracle and IBM are [trying] to play catch-up.
Can't speak for others, but Sybase ASE's in-memory database option can mix in-memory tables with those that are not. Joins just work as normal. So yes, you can pick out one table, if that's what you want.
"Server versions can go up to 64GB."
Though that's only if you're running Enterprise or Datacenter editions, otherwise you're limited to 4GB on 32-bit standard.
That wasn't really my point. My point was that the authors seem to imply that MVCC is only now possible because of the switch from disk to memory, something which is clearly untrue.
I don't disagree with your comment on Sybase and SQL Server, but the author said "complete re-write of a little known database". Sybase was/is not little known.
Disks are not expensive, c'mon! But they do not perform as well as flash, so I agree with that. The point I was making is that I disagreed with the author stating "It is time to re-think why data is stored on slow, unreliable disk, for anything other than recovery and archive purposes." Disk is not that unreliable, especially compared with flash.
Tony: "The point about INT and IDENTITY is just plain wrong; It's in RTM which came out this month - I've a demo on my laptop here."
I wasn't aware that was the case. According to the MS documentation:
There are only a few limitations when creating memory-optimized tables, in addition to the data type limitations already listed.
No DML triggers
No FOREIGN KEY or CHECK constraints
No IDENTITY columns
No UNIQUE indexes other than for the PRIMARY KEY
A maximum of 8 indexes, including the index supporting the PRIMARY KEY
So if it has changed then I'll happily stand corrected.
Don't get me wrong, I think Hekaton is great, but it's not a simple "oh, let's just re-config the table to live in memory" and then SHAZAM, AWESOME PERFORMANCE, DONE!! There is more to it than that.
> and one of the reasons why in-memory OLTP tables has been added into the product
That's funny. MSSQL already had in-memory tables: DBCC PINTABLE in SQL Server 6.5. I think they were removed in 7.
To be more accurate, once read in the data was never paged out, but much the same thing I guess.
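For the curious, the old syntax looked roughly like the example from the Books Online of that era (the pubs sample database is used here; note that pinning only stopped pages being evicted once they'd been read, it never pre-loaded the table):

```sql
-- Historical DBCC PINTABLE usage, as documented for the 6.x-era product.
-- Pinning keeps a table's pages in cache after they are read; it does
-- not pre-load anything into memory.
DECLARE @db_id INT, @tbl_id INT;
SELECT @db_id  = DB_ID('pubs'),
       @tbl_id = OBJECT_ID('pubs..authors');
DBCC PINTABLE (@db_id, @tbl_id);

-- ...and to release it again:
DBCC UNPINTABLE (@db_id, @tbl_id);
```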
> Basically with the new Hekaton bits I can take an existing database, pick out an individual table I think might benefit from the in-memory bits (durable or non-durable at the table level unlike sybase) and put that in memory.
I'm being dumb, but can you explain how this is supposed to help? If a table access is hot, it's pulled into memory and, being hot, is kept there. If a table access is cold, it is better to be paged out by hot data than take up space while cold. I am missing something significant here.
"SQL has been #1 for about 2 years now (by licensed users):"
Licenses are such a valid way of counting actual users when the most popular database systems don't require them.
"On the desktop. From XP SP3 onward. Because they nobbled it."
Perhaps, but given some of the other addressing limitations, it's worth switching to 64bit anyway.
"In contrast to Oracle's TimesTen and IBM's SolidDB, SQL Server's in-memory database engine is fully integrated into the core product, and at no additional cost."
Why are you comparing in-memory functionality to TimesTen and SolidDB? IBM and Oracle both have in-memory functions within DB2 and Oracle itself: DB2 BLU and the 12c In-Memory option. TimesTen and SolidDB are more comparable to HANA than to SQL Server, in that they are entirely in-memory.
"Erm, No. SQL has been #1 for about 2 years now (by licensed users):"
Oracle and IBM do not license by users. Access would probably be number one if you wanted to use a "user licence" count, as everyone with Office has it. Oracle and DB2 are larger by core count, queries performed, dollars, or any other way you want to measure it. He is talking about market share, which means dollars.
It is kind of like saying that Windows has a million % higher market share than Unix by server count. Lots of little Windows servers vs fewer large Unix servers.
Why is it, do you suppose, that Oracle became the primary choice over DB2? I know why MS SQL was not chosen for enterprise DBs, but DB2 doesn't seem to be missing anything that Oracle has and it is generally less costly. People usually, at least for the past decade, ran both Oracle and DB2 primarily on pSeries (Unix share leader), which you would think would be an advantage for IBM. IBM invented relational databases and was first to market. Not being sarcastic, seriously wondering. What was or is the silver bullet that Oracle has which DB2 doesn't have?
Thank you for your rather more constructive response.
You may be surprised to find I agree with at least some of your points. I completely agree, there are some glaring omissions, like no foreign keys or check constraints, although I think you rightly nailed it with "not yet". This is very much the v1.0 of Hekaton and I'd expect it to evolve and improve as time progresses, just as the other features in SQL Server have done. I'd not claim it was by any means perfect. However, given the purposes for which non-persistent data is more likely used (e.g. staging and analytics), and the lower requirement for relational features to achieve those aims, I guess that for a first cut Microsoft decided these features were of lesser importance.
I completely agree with you: Hekaton is a great technology, though yet to reach maturity. Likewise, I do view MSSQL through rose-tinted glasses. Having worked with it from the time when it was not an enterprise-class database platform to where it is now as a world leader, I firmly believe it has taken its place as best in class, and it will continue to wipe the floor with other relational platforms in terms of features and cost, if Microsoft don't mess it up with the licensing.
Criticism has been with us since Aristotle penned his immortal, unforgettable Poetics. Boethius (The Consolation of Philosophy) awoke us to the idea that criticism is an art form, but it was without question Francis Bacon who transformed criticism into the finely honed science that it is today. His great gift was to convince humanity that criticism is not merely opinion, not mere statement; it has of necessity to be based on evidence. This is now so much a fact of critical life that it seems unremarkable. Of course criticism has to be supported by evidence; how could we think otherwise?
From a modern standpoint it is impossible to understand how revolutionary Bacon’s ideas were in his day, but that they were right is beyond contestation. And the world was clearly hungry for his revolutionary ideas. As soon as ‘The Advancement of Learning’ was published, the concept of evidence-based criticism spread across Renaissance Europe like wildfire. Even his tutor, the redoubtable John Whitgift (Archbishop of Canterbury), was forced to adopt the convention that criticism has to advance facts and be based on reasoned argument.
It was not just ‘a’ but ‘the’ turning point for criticism. And of course Bacon’s gift lives on to this very day, largely unappreciated because it has become so much a part of the fabric of our lives that we notice it no more than the air we breathe.
The great critics of today still faithfully follow in Bacon’s footsteps. Take, for example, one of the greatest Register critics, Matt 21. His most recent (but, as always, superbly apposite) piece begins, as Bacon’s often did, with apparent appeasement. 21’s careful “I don't wish to appear rude but” would surely have evoked a wry smile and a nod of approval from the master.
Then, as we know he will, 21 changes gear with his ironic and iconic “I don't know where to start” – we aren’t fooled, we know that he knows where to start. He’s already done it.
His sublimely eloquent “I'm lost for words......” surely tells us so much. We know his work of old. We know that a flood of facts and reasoned argument is about to almost overwhelm our senses. We hang on his every word as he continues with “Oh, I give up there's so much wrong here that I'd need to publish a comment the size of the original article.”
What can I say? It is like drinking from the fire hose of reason. In that single sentence the facts are marshalled, the arguments levelled and, one after another, the bullets of logic are fired with deadly accuracy. It’s like a machine gun; no, that implies inaccuracy; it is like a sniper’s rifle on auto fire in the hands of a champion marksman. And after the burst is finished, there is only the smell of cordite and silence. Silence because there is nothing more to say. Those who already agreed with 21 settle back in their armchairs knowing the battle (nay, the war) has just been won. But our thoughts must surely go out to those on the opposing side. They must, at one time, have enjoyed hope. Surely they once treasured the belief that perhaps the public could be swayed to appreciate their point of view. But after that critique of pure reason there is no more hope, no flag around which to gather, there is nothing. There is no riposte for such eloquently reasoned logic.
So does that mean the desktop versions of this (the old SQL Express thingies) will be called Waggleton P. Tallylicker?
We use Sybase in a performance environment where every database access is counted (and yes, we've employed solutions where the most accessed data is on flash), so Microsoft's improvements look good. Problems? It would mean migrating a lot of legacy software from Solaris to Windows, I think (suspect having the database software on a separate box would cause issues of its own). I think Windows is the single biggest barrier to Microsoft increasing its market share.
I'm not surprised Microsoft seems to have dumped replication. Sybase replication struggles at the transactions per second we require, and we often face problems external to us when dealing with geographic redundancy anyway (over African infrastructure). Of course, this is Telecoms, where Windows Server is only ever accepted for serving media (and Linux is eating into that). What we do is rarely elegant/often expedient. Still, if Sybase were to match what Microsoft have done, we'd find a use for these features.
You should look at Sybase In Memory Database option which launched some years ago. It's at least as good as what MS have done.
You might also want to look again at Rep Server, as it normally does very well with high transaction loads. Have you tried replicating the SQL statements instead of the traditional conversion of log records into inserts/updates/deletes? Have you looked at the option to use bcp when it sees lots of inserts?
I believe we've looked at the various options. On the Sybase side, the insert rate is relatively low compared to the update rate, so we've never had to consider bulk loading. Ultimately, for us I suspect the best mix of risk and reward comes from constant monitoring and massaging - problems are often fixed before the customer realises they need to call us.
This article is clearly written by someone very enthusiastic about SQL Server, which is fine, but when summarising new stuff that hasn’t yet been used in anger much, a lot of important detail and caveats get lost.
A few specific comments:
MVCC has been implemented in Oracle for over a decade and so I don’t see how it’s a technique that’s only been enabled by storing data in memory instead of disk. Unless SQL Server’s implementation is different, it doesn’t eliminate locks: it just means that readers are not blocked by writers. Also the comment about blocking and locking being a major bane for DBAs is something that’s only caused significant pain for me when working with SQL Server. On Oracle, Informix and my limited experience of DB2 this is generally not the case. Others’ experience may vary of course.
The in-memory database sounds severely restricted, and I can’t see how many existing applications would be suitable for a migration to it, even partly, unless things like constraints don’t matter to you. In most organisations DBAs, at least production support DBAs, are unlikely to drive any change to this feature.
Column stores will always be better suited to warehouses. While having them in a primarily OLTP database can be advantageous, it is usually report-style queries that see the significant speed-ups.
Using SSD as a secondary buffer cache is nice, but many SANs effectively offer this already. Is this feature needed at the RDBMS level? The author is also in danger of giving the impression that in a typical OLTP system most reads go to disk (I am sure he must know this isn’t generally true). It is usually possible to serve over 99% of reads from the database buffer cache.
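For what it's worth, that last claim is easy to check on any running instance; a rough sketch using the performance-counter DMV (the object_name prefix varies with the instance name, so this is indicative rather than copy-paste ready):

```sql
-- Approximate buffer cache hit ratio from the performance counters DMV.
-- object_name is 'SQLServer:Buffer Manager' on a default instance,
-- 'MSSQL$<instance>:Buffer Manager' on a named one.
SELECT CAST(a.cntr_value AS FLOAT) * 100.0 / NULLIF(b.cntr_value, 0)
           AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters AS a
JOIN sys.dm_os_performance_counters AS b
  ON a.object_name = b.object_name
WHERE a.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base';
```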
About the author’s summary, all the database vendors are looking at what the others are doing and integrating similar features into their products. There isn’t a feature listed, save perhaps the secondary buffer cache, which is not available from another vendor today in some form or other.
Multi-Version Concurrency Control (MVCC) is indeed implemented without any locking; they use compare-and-swap (CAS) operations and time-based versioning. Versions remain in memory for as long as an existing transaction (in snapshot isolation) requires them.
SQL Server's implementation is indeed different - research BwTree.
The thing you are missing is that SQL Server is moving to the cloud, and the cloud (AWS and Azure certainly) is not built on SANs. It's built on commodity servers with commodity storage, with software doing the data replication for fault tolerance and distribution.
In the cloud space buffer pool extensions and in-memory OLTP can be a real help in mitigating the latencies with commodity spindle storage.
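For readers who haven't seen it, the buffer pool extension mentioned above is a simple server-level setting in SQL Server 2014 (the file path and size here are purely illustrative):

```sql
-- Point the buffer pool extension at an SSD-backed file
-- (path and size are illustrative, not a recommendation).
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\SSDCache\sqlbuffer.bpe', SIZE = 32 GB);

-- Verify it took effect:
SELECT path, state_description, current_size_in_kb
FROM sys.dm_os_buffer_pool_extension_configuration;

-- And to turn it off again:
-- ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;
```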
The industry is moving away from SANs, certainly in the SQL Server space, and that move is only going to accelerate in the years to come. Look at what Violin have done: the embedded version of SQL Server runs as an appliance with their flash solution.
Thank you for your reply. Having read (most of) a Microsoft research article on BwTree and re-read the article here, I've realised that the authors are talking about the underlying implementation of MVCC which was not immediately obvious to me before.
Yes, it's worth reading about; the implementation is that rare mix of intelligent simplicity. Again, I'd point interested readers to Tony Rogerson's presentation on Hekaton if you're looking for a more in-depth understanding of how things are implemented. It's fascinating and answers a lot of the questions everyone has when they first encounter this technology, including the authors.
SQL Server revenue market share is growing. It is also fastest large database (check TPC benchmarks).
"SQL Server revenue market share is growing. It is also fastest large database (check TPC benchmarks)."
SQL Server's market share is I believe now at over 50% of corporate database platforms purchases, though I can't recall if this is by volume or dollar sales.
With regards to the TPC benchmarks, I think these are a good indicator of performance, though it's essential to look at those particular tests that match the workload that you intend to present to the platform. Ultimately, all of the major platforms are good, relatively stable pieces of software that have exceptional performance. However, a purchasing decision, even on a greenfield project, will be influenced by factors such as available skill set, legacy investment in other supporting technologies and the requirement for supporting specific applications or development requirements.
Although I pinned my colours to SQL Server's mast many moons ago, I think it's important to recognise it may not be the database of choice, even if it is the fastest, for all shops, no more than Oracle or DB2 may be. But from a production and operational cost point of view, I'd consider any one of these platforms in preference to a less mature platform - the licensing costs are often dwarfed by the installation, ongoing maintenance and overtime costs incurred in a less reliable product, a criticism I would have equally applied to SQL Server back in 1996.
In my opinion, where Microsoft definitely score bonus points is in the security of their product (not praise it could once have been considered worthy of), the tightly integrated development tools, and of course the ETL, OLAP and reporting products that ship with it out of the box, which can often be a costly additional purchase with other platforms. But equally, I do think Microsoft have to watch their licensing policy, as it's causing much consternation and consolidation out here in the real world, and it's causing it right now.
-> “SQL Server's in-memory database engine is fully integrated into the core product, and at no additional cost”
WRONG. It costs a lot more, unless you need the most expensive edition of SQL Server for some other reason.
Point conceded, I personally deal with mostly Enterprise Edition installations, and we should have pointed that out. Thank you for your correction, noted.
"Data retrieval latency is orders of magnitude slower than memory. We're talking milliseconds compared to nanoseconds, a million-fold difference."
Good luck getting nanosecond latency out of the terabyte-sized memory mentioned in an earlier paragraph.
On a CPU running at a few GHz, you'll get nanosecond latency out of your L1 cache. By the time you are hitting DRAM or flash, the latency is more like a microsecond. You've lost at least two of those orders of magnitude, maybe three. On the other hand... that still knocks seven kinds of shit out of a disc. Back on the first hand, a decent disk cache subsystem will have delivered most of that performance already, even on DBs that are slightly too large to live entirely in memory.
So it will be interesting to see if this actually makes any measurable difference.
This surprised me. What is the justification for believing that the migration wizard will be improved?
PS: I won't say that no one had ever heard of Sybase before MS bought the product, just that they were a minority.
Perhaps the main reason for SQL Server's growth was its affordability.
Now, the same brain trust that forced TIFKAM onto a server operating system (Windows Server 2012) feels they can go head-to-head with Oracle on price. Many firms using SQL Server are in for an eye-watering surprise when they receive their next bill. Not a small increase, not a modest increase, but a jaw-dropping price increase. At least one shop I know of is working on replacing SQL Server with PostgreSQL; they simply can't afford to continue using SQL Server.
Sadly, these same geniuses (at Microsoft) didn't add value corresponding to the price increase - Hekaton, etc. are nice, albeit limited improvements, but are incremental improvements and are far from justifying the new (core-based) licensing model pricing.
Expect more defections as the new reality sinks in.
I have passed a kidney stone... the 2014 price increases brought similar pain.
DJ, agreed. I've seen clients suffer sharp price increases in their SQL Server estate, and they will do so over the next 12-24 months as they upgrade. It's still cheaper than the competition by a long way, but I'm very concerned that Microsoft will kill the goose that laid the golden egg. I'm equally critical of their pricing policy, and of the feature set that is gradually being eroded or stagnating in SQL Server Standard Edition.