* Posts by Tony Rogerson

35 posts • joined 5 Aug 2011

Inside the Hekaton: SQL Server 2014's database engine deconstructed

Tony Rogerson

Re: A pinch of salt

Multi-Version Concurrency Control (MVCC) is indeed implemented without any locking; they use the CAS (compare-and-swap) operator and time-based versioning - versions remain in memory so long as an existing transaction (in Snapshot isolation) requires them.
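To make the time-based versioning idea concrete, here is a minimal single-threaded sketch. It is not how Hekaton is implemented; the `Version` and `Row` names, the integer timestamps and the garbage-collection rule are all assumptions made for illustration, and the atomic compare-and-swap a real engine would use is only noted in a comment.

```python
from dataclasses import dataclass
from typing import List, Optional

INFINITY = float("inf")


@dataclass
class Version:
    value: str
    begin_ts: float              # commit timestamp of the writer that created this version
    end_ts: float = INFINITY     # commit timestamp of the writer that superseded it


class Row:
    """A row kept as a chain of timestamped versions, oldest first (hypothetical)."""

    def __init__(self) -> None:
        self.versions: List[Version] = []

    def write(self, value: str, commit_ts: float) -> None:
        # Close the current version and append the new one. A real lock-free
        # engine would install the new version with an atomic compare-and-swap;
        # this single-threaded sketch simply appends.
        if self.versions:
            self.versions[-1].end_ts = commit_ts
        self.versions.append(Version(value, commit_ts))

    def read(self, snapshot_ts: float) -> Optional[str]:
        # A snapshot reader sees the version whose lifetime covers its timestamp.
        for v in self.versions:
            if v.begin_ts <= snapshot_ts < v.end_ts:
                return v.value
        return None

    def garbage_collect(self, oldest_active_snapshot_ts: float) -> None:
        # Versions that ended at or before the oldest active snapshot can no
        # longer be seen by any running transaction, so they can be dropped.
        self.versions = [v for v in self.versions
                         if v.end_ts > oldest_active_snapshot_ts]


row = Row()
row.write("a", 10)
row.write("b", 20)
print(row.read(15), row.read(25))
```

A reader with snapshot timestamp 15 still sees "a" after "b" has been committed at 20, which is why the old version has to stay in memory until no active snapshot is older than 20.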

SQL Server's implementation is indeed different - research BwTree.

The thing you are missing is that SQL Server is moving to the cloud, and the cloud (AWS and Azure certainly) is not built on SAN; it's built on commodity servers with commodity storage, with software doing the data replication for fault tolerance and distribution.

In the cloud space buffer pool extensions and in-memory OLTP can be a real help in mitigating the latencies with commodity spindle storage.

The industry is moving away from SANs, certainly in the SQL Server space, and that move is only going to accelerate in the years to come; look at what Violin have done - the embedded version of SQL Server runs as an appliance with their flash solution.

3
2
Tony Rogerson

Re: I don't know where to start

@ByeLaw101:

Not interested in getting into a flame war. I've been with SQL Server since 4.21a; the article refers to SQL Server 7.0, where Microsoft completely re-wrote the product (except, I believe, for a bit in the parser). 6.5 was the last version with the 2K page size etc - the last version with the legacy Sybase code base in it.

Disk is expensive: the cost per IOps, which we use to work out load requirements, is extremely high when compared to flash - that is one of the points this article is making and one of the reasons why in-memory OLTP tables have been added to the product. Note: not as a different product like MySQL, Oracle and DB2; they have engineered a solution into the main product - that has touched a lot of areas and also given benefits for normal tables too.

The point about INT and IDENTITY is just plain wrong; it's in RTM, which came out this month - I've a demo on my laptop here.

FKs and CHECK constraints aren't in there yet, no, but can any of the competing products choose specific tables to put in-memory (or are you forced to do the entire database)? Also, can the competing [separate] products mix joins between normally stored tables and in-memory tables?

Basically with the new Hekaton bits I can take an existing database, pick out an individual table I think might benefit from the in-memory bits (durable or non-durable at the table level, unlike Sybase) and put that in memory.

Like you I've not the space (nor time) to go in depth into this.

In terms of the competition, SQL Server has been around in the enterprise for a long time now, but cloud is where it's making its mark because, let's face it, Oracle and IBM are [trying] to play catch up.

3
1

Fusion-io: Ah, Microsoft. I see there's in-memory in SQL Server 2014... **GERONIMO!**

Tony Rogerson

Lot of nonsense being said on here

Not surprised the "I love my product" trolls are coming out.

Anyway, some facts around this: SQL Server has, and always has had, a buffer pool; data is read off storage into the buffer pool (in RAM). Buffer pool extension is simply a way of tiering data access performance within the confines of the server box, i.e. you can add an SSD (yes, it doesn't need to be a PCI flash card - I can demo it on my laptop), and it works in Standard edition of SQL Server as well.

The "in-memory" comes in two flavours - OLAP (column store indexing) and OLTP, which is the new stuff implemented with hash and range indexing. SQL Server is playing catch up; however, unlike others, it's built into the one product, so I can, in one product and in one SQL statement, join a normal storage-based table with an in-memory table (e.g. select count(*) from mynormaltable a inner join myinmemtable b on b.key = a.key). Also, the in-memory is not a separate database; it sits within the normal databases we have today but as a separate file group.

We aren't limited to "integer" keys, you can have an index on varchar columns with the new in-memory stuff - the only restriction is that it needs to be a specific collation.

With SQL Server 2014, I can go round with my ACER V5 laptop (come and see me present in Bristol next Wed :)) that cost £500 and demo buffer pool extensions and the in-memory stuff; that differs entirely from other vendors where their in-memory databases are entirely separate products with some only working on specialist (expensive) hardware.

0
0

Violin array comes with new SQL sauce

Tony Rogerson

SQL Server 2014 Embedded

Probably worth updating the article to point to this link Chris: http://blogs.msdn.com/b/windows-embedded/archive/2014/03/18/microsoft-sql-server-2014-now-available-to-direct-oems.aspx

The SQL Server community in the UK is massive, with SQL Relay, SQL Bits, SQL Saturday and dozens of regional user groups. The reason I mention it: it's nothing new, as Fusion-IO, Violin and other storage vendors have for the past couple of years been showing off these capabilities through talks, sponsorship etc.

The real story here is that SQL Server 2014 embedded is available for OEM. We've had SQL Server appliances for SQL Server 2012 for a while now, but built around DASD made up of spindles; the flash move is something new. The Buffer Pool extensions to SQL Server have been introduced as a cost-effective method of circumventing the significant cost of producing the IOps required from a SAN architecture.

Roll on the day hey - we are all on commodity storage in the cloud or embedded SQL Server in an appliance so we no longer have to move our data across 8 or 16Gbit connections from the SAN into the server (move the program to the data rather than the data to the program).

0
0

You... (Sigh). You store our financials in a 'Clowds4U' account?

Tony Rogerson

Age old comment - Business is NOT there for IT, it's the other way round - a lot of people in IT forget that.

Business cannot do without IT, but, it can do without internal IT if that internal IT wraps itself up in empire building, politics, procrastination because of self-preservation in specific technology areas.

Ever wondered why money isn't the only reason to outsource IT?

Lol - expecting a lot of flame for the above, but I'll not bite.

7
5

Google goes back to the future with SQL F1 database

Tony Rogerson

Yep. Consistency is indeed done via time which requires extreme accuracy. Atomic clocks are involved but it does depend on what you are doing.

The other thing is that it's semi-hierarchical rather than truly relational, that helps because they can group data thus helping synch issues on updates.

2
0

Don't let the SAN go down on me: Is the storage array on its way OUT?

Tony Rogerson

Re: DIY SAN

Absolutely spot on; if you look at what the main cloud providers - Microsoft and Amazon - do in terms of their storage, that is exactly the approach they have taken.

Microsoft Azure is not SAN based, nor is AWS.

Hadoop is successful not just because it does unstructured data via MR but because you can chop the data up, distribute it and have the processing work where the data is (locally), which is the exact opposite of the SAN approach where you move the data off the SAN to the server.

It's going to take another few years, but the need SANs met can be more easily and cost-effectively met with software and commodity kit.

I guess once we are all on the cloud then the SAN v DAS argument will be moot anyway - because the major cloud vendors ain't SAN based! :)

T

0
1
Tony Rogerson

Yes - SAN has had its place, and certainly within the database area has not always been successful because of the black box approach that frustrates the hell out of folk trying to manage performance of said database - the DBAs. You don't see a SAN in the cloud thankfully, just commodity kit and DAS - at last!

Software has evolved to form reliable and performance-rich distributed data processing, e.g. Hadoop, Cassandra and VoltDB to mention just 3.

Hardware has evolved - it's great to see even PCI based SSD being threatened with the true end goal of persistent memory - we are now able to put TBs of SSD directly into memory sockets.

It's going to be an interesting few years to come; and thankfully and hopefully that will be without SANs!

1
7

Banged-up Brit hacker hacks into his OWN PRISON'S 'MAINFRAME'

Tony Rogerson
WTF?

You just couldn't make it up!

It's not funny but it is; or is it just bemusement?

4
0

Hey, PCIe flash makers. Look behind you - it's Samsung

Tony Rogerson
Thumb Up

Hopefully they'll be in the enterprise space.

I know they've commodity SSD SATA 3 drives (just ordered one this morning); but we need them in the enterprise space to drive those ridiculously high SSD drive prices down. Hopefully they'll pick up where OCZ hasn't really succeeded - the PCIe flash enterprise space.

1
0

Canadian man: I solved WWII WAR HERO pigeon code!

Tony Rogerson
Thumb Up

Resources

Thankfully it's good to see that GCHQ have the common sense to put resources behind what is going on now, and that should always be the case!

1
10

Ready for ANOTHER patent war? Apple 'invents' wireless charging

Tony Rogerson
Thumb Down

Appl$

And people thought Microsoft were bad!

They need to pull their finger out and innovate rather than litigate.

20
2

Transport Dept dishes out £1.9m rail database deal to Capita

Tony Rogerson
FAIL

Another rip off

£1.9m - what on earth are they designing - another eBay?

Sickens me, there aren't even that many TCO's and what complexity is there!

A small software house could knock that up at a fraction of the cost.

4
0

WD bigshots spin superfast disk roadmap

Tony Rogerson

It's not space, it's IOPS (databases anyway!)

In the database space IOPs is key; to get IOPs I physically need many many more hard drives than I need SSD for decent semi-random IO at a decent latency (< 5ms).

So, for my 1TB database I only need to buy a 1TB PCIe card; to get the same level of IOPS from hard disks I'd need dozens.

So, comparing storage shipped between SSD and Hard drives is just wrong, it should be a mix.

T

0
0

Nimbus boots EVA out of Mitsubishi

Tony Rogerson

Re: What was the point of this article??

Basically they replaced many many many disks required for IOps with significantly fewer!

I'm a SQL Server guy; it has a significantly different access profile than a file server.

Databases do not work well on file server ideology!

T

0
0

Secret's out: Small 15K disk drive market is 'growing'

Tony Rogerson

Re: Not missing the point

Not sure you are agreeing with me, but my point is this - why go to the trouble of a SAN when you can easily and more cheaply achieve the goal with PCIe connected flash?

You don't need to worry about switches and multiple HBA, controllers to get your throughput up.

You don't need to worry about latency nor IOPS.

Redundancy is easy.

Remember, as I said, I write within the context of the database space. We need raw throughput because in BI we may be processing over hundreds of GiB of data in a single query - that data is spread across the storage, something where disk geometry starts to have an effect on latency per IO.

I don't believe the answer is SSD (SAS or FC connected); it's flash-based PCIe. However, I don't believe that will easily become a reality until commoditised distributed database platforms take hold - something that we are already seeing with Hadoop to help deal with "Big Data".

T

0
0
Tony Rogerson

SSD is cheaper per GB when...

As a database professional I need to quantify two things - how much data I need to store and how many IOps the IO subsystem needs to be capable of at a realistic latency of say < 5ms per IO.

As a raw comparison, say I want a 900GiB database and I want 10K IOps at < 5ms per IO; I can get a single PCIe card to do that with flash (easily and for less than £5K), and if I want redundancy I buy another (£10K in total).

What about hard drives? For 2.5" 15K the max is 300GiB per drive, realistically around 300 IOps per drive - how many drives do I need to buy in order to get the IOps and redundancy I need? A significant number; so many, in fact, that I'd need an external array for a start - more cost, more controllers etc.

In the real world this rubbish about SSDs being more expensive per GB is just wrong; we require IOps as well as space, the two go together. Remember I need 900GiB of storage space, but I'd probably need 30 drives for 10K IOps with RAID 1+0 at a latency of < 5ms per IO - that is about £9K just for the disks themselves, without the two additional storage arrays to hold them, the dual controllers required and then the ongoing power and cooling....
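The sizing argument can be checked with a few lines of arithmetic. This is only a rough sketch using the figures quoted above (900GiB, 10K IOps, 300GiB and 300 IOps per 15K drive); the function name is made up, and it assumes every spindle contributes its full IOps, which flatters the disks on a write-heavy RAID 1+0 workload.

```python
import math


def drives_needed(capacity_gib, target_iops, drive_gib, drive_iops, mirrored=True):
    """Rough count of drives needed to meet both the capacity and the IOps target."""
    for_capacity = math.ceil(capacity_gib / drive_gib)
    if mirrored:
        for_capacity *= 2              # RAID 1+0 stores every block twice
    for_iops = math.ceil(target_iops / drive_iops)
    if mirrored and for_iops % 2:
        for_iops += 1                  # mirrored pairs need an even drive count
    return max(for_capacity, for_iops)


# 900GiB database, 10K IOps, 300GiB / 300 IOps per 15K drive
print(drives_needed(900, 10_000, 300, 300))  # → 34
```

Capacity alone would need only 6 mirrored drives; it is the IOps target that pushes the count into the thirties, in the same range as the 30 or so drives estimated above.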

T

0
0
Tony Rogerson

Missing the point a lot of the time

A lot of the time people just completely miss the point about PCIe connected flash - a single card can easily operate at 1.9GiBytes/second (ref: the OCZ revo3 maxiops that I've got in this machine I'm writing on). SAS 600 can only cope with 500MiB/sec per channel, and the "interface" is dramatically less, so if you want to achieve 1.9GiBytes per second from a SAN - how do you go about it? With complexity, cost and the hope that the latency will be realistic.
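As a back-of-envelope illustration of that question, the sketch below counts how many SAS channels would have to be aggregated to match the card. It uses only the two figures quoted above and assumes, optimistically, that each channel sustains its full 500MiB/sec; the function name is hypothetical.

```python
import math


def sas_lanes_needed(target_gib_per_sec, lane_mib_per_sec=500):
    """Minimum number of SAS channels whose combined bandwidth reaches the target."""
    return math.ceil(target_gib_per_sec * 1024 / lane_mib_per_sec)


print(sas_lanes_needed(1.9))  # → 4
```

Four channels is the best case; anything that shares or throttles a channel between the array and the server only raises the number.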

[written with the context of database storage]

Anyway, I wonder if this WD guy was talking about growth in the past 6 months, which I'd expect because their factories were flooded, so it only stands to reason that now they are manufacturing again there is a shortfall to fill :)

T

0
0

Intel penetrates PCIe flash biz with long-lasting hardness

Tony Rogerson
Thumb Down

Storage - always behind?

Just like their 520, way behind the competition.

Reminds me of the processor war in the commodity space when AMD slam dunked Intel on performance until Intel caught up with their i7 stuff, now the opposite is true.

T

0
0

Violin Memory flashes $50m wad from SAP

Tony Rogerson

Re: Wtf?

You are partly right; but in terms of infrastructure the principle is that the servers in the cluster are fully redundant, so shared nothing, hence DASD. Connecting the machines in the cluster to a SAN, which is basically what this article is saying, goes completely against scale out.

Commodity storage in the cluster nodes can be achieved with commodity SSD, SATA or PCIe based, to get the local IOps and still be cost-effectively disposable.

There is definitely an upsell into this space, but I'd expect that because, let's face it, we have a lot of folk with vested interests in keeping SAN technology.

0
0

London fire brigade outsources 999 control centre to Capita

Tony Rogerson
Childcatcher

Offshore'd?

I think a law should be put in place so that emergency service "call centres" remain on the mainland and are staffed by English-speaking individuals you can actually understand, thereby not following the model used by a lot of banks, ISPs, utility companies etc.

My two cents.

17
0

PCIe flash riddle of next-gen HP ProLiant record smasher

Tony Rogerson
Thumb Down

SQL 2005?

Interesting they are using what effectively is now an old release, we've since had 2008, 2008 R2 and now 2012.

http://c970058.r58.cf2.rackcdn.com/individual_results/HP/HP_ProLiant_DL385G7_TPCC_111114_01_es.pdf

I'd be interested to know the reasoning.

T

0
0

Flash DOOMED to drive itself off a cliff - boffins

Tony Rogerson
WTF?

All technologies hit a wall - look at 15Krpm disks

15Krpm disks have been around for over a decade and we are still only on capacities of 300GB on a 2.5" drive and 900GB on a 3.5" drive; you need at least a dozen 15Krpm drives to compete on a 50/50 random read/write work load with a typical SSD; even then the disks just can't get the data off quick enough.

Phase Change Memory is my bet.

T

1
0

OCZ out-flashes EMC's Lightning

Tony Rogerson
Thumb Up

Good - another move away from SAN

Certainly within the database space IOPS at low latency are more and more important, and for typical SAN based disk architecture it costs more and more because of the inherently poor latency performance of a random workload on a 15Krpm drive.

Flash hooked into PCIe is the way to go, with resilience through software block replication to remote servers.

Roll on another 5 years!

T

0
0

Laser boffins blast bits onto hard drive at 200Gb/sec

Tony Rogerson
Meh

@ what latency though?

It's great writing at those speeds and I see lots of applications for that in my realm - the transaction log of a database for instance, web logs etc.

However, what about a semi or fully random workload - will there be latency like we already have on spinning media? Has anybody read the paper yet? http://www.nature.com/ncomms/journal/v3/n2/full/ncomms1666.html

It will be great if they can make the technology so we don't have spinning platters and head movements, basically make it like flash :)

T

0
0

Shrunken Intel process boosts SSD performance

Tony Rogerson
Thumb Down

Another no clue SE?

OK, you got me - 19 years of MS SQL Server experience and 13 MVP awards aside, I've tried googling the spec of the Intel SSD you've specified and it doesn't appear to exist, even on Intel's site itself.

A quick check on the P410i specs and you'll see that SAS connected drives operate at up to 6Gbits/sec (per channel, or drive in English); however, SATA connected drives, which is probably what you are testing with (please show me the spec otherwise), are connected at 3Gbit/sec, which is significantly short (max 264MiB/sec) of what a standard SSD is capable of - the majority are now SATA 3 (6 Gbits/sec), hence the capable 530MiBytes/sec throughput (per drive).

I hope you enjoyed my paper!

Anyway, if you would like to show us all just how almighty and experienced you are then why not blog the perfmon results and a proper spec including the correct model numbers of the drives you are using!

Tony.

0
0
Tony Rogerson
WTF?

SQL Server? Your times don't sound right

@Philip Lewis: I've done a ton of testing on the OCZ Agility and IBIS drives and your numbers just don't stack up. Some figures can be found on my paper at http://www.reportingbrick.com.

I'd be interested in you posting the hardware specifications and how you are attaching the SAS disks and the SSD - note, the SSDs will likely be SATA connected and on, say, an HP controller will be limited to 1.5Gbits/second because of the SATA revision they use, so the overall transfer speed will be significantly less than the SSD can achieve.

Back to the article, those figures are way behind what OCZ have had out for a couple of years now.

Tony.

0
0

SSDs choked by crummy disk interfaces

Tony Rogerson
Pint

HP's P410i controller takes SATA and SAS together

I did an interesting experiment on a DL360; the disk slots are SATA (regardless of protocol). Anyway, I've 4 x 10K 300GB SAS disks in RAID 0; in the spare slot I put an OCZ Agility 3 60GB drive in a caddy, slotted it in the server and created it as a logical disk etc. OK, it's only SATA 1 so 1.5Gbits/second, but I still put the RAID array to shame on physical IOps.

What I'm saying is that we have the facility of using and getting the perf from SSDs now using existing protocols; random IOPS with a sub-millisecond latency is what I'm after as a database specialist.

Is it a conspiracy that the controller is limited to 1.5Gbits for SATA drives? If it was SATA 2 I'd have 300MBytes per second per drive with IOPS sub-millisecond, SATA 3 600MBytes per second.... all on one drive! Just negates the need for so many 15K disks, so the vendors start losing money; ever wondered why enterprise SSD drives to go in your kit are like 5x+ the price of commodity ones that actually outperform them?
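The MBytes-per-second figures quoted for each SATA generation follow from the line rate once 8b/10b encoding is taken into account (10 bits on the wire for every 8 bits of data). A small sketch of that conversion, with a made-up function name, ignoring protocol overhead above the encoding layer:

```python
def sata_payload_mb_per_sec(line_rate_gbit):
    """Usable payload bandwidth in MB/s for a SATA link using 8b/10b encoding."""
    bits_per_sec = line_rate_gbit * 1_000_000_000
    payload_bits = bits_per_sec * 8 / 10      # 8 data bits per 10 line bits
    return payload_bits / 8 / 1_000_000       # bits -> bytes -> MB


for name, rate in [("SATA 1", 1.5), ("SATA 2", 3.0), ("SATA 3", 6.0)]:
    print(name, sata_payload_mb_per_sec(rate), "MB/s")
```

This gives 150, 300 and 600 MB/s respectively, so a drive stuck on a SATA 1 link tops out at roughly a quarter of what SATA 3 would allow.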

T

0
0
Tony Rogerson
Unhappy

SATA already has hot plug written into the protocol

It's a great article by the way - very informative.

It bemuses me why we aren't moving away from SAS and to SATA 3; a lot of issues with SAS have already been addressed.

T

0
0

Freak lightning strike sends app, storage servers back in time

Tony Rogerson

I thought SANs were a dying tech?

Certainly within the database space (Microsoft SQL Server) I'm seeing more and more businesses using PCI based NAND cards like FusionIO and OCZ VeloDrive; cost per IO is substantially smaller than the SAN delivered equivalent (IOs per second at a latency realistic for database use, i.e. <= 3ms per IO).

Sounds like the storage vendors are fighting back and simply turning their smart "storage" arrays into "a server with DASD" :)

T

0
0

SMART unveils smarter, faster, fatter SSDs

Tony Rogerson
Happy

Performance gone - knackered in less than a fortnight

Had a sudden drop from 1ms per IO to 11ms per IO (60MB/sec to 5.5MB/sec); another 24 hours later and it's 15ms per IO (4MB/sec).
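Latency and throughput here are tied together by Little's law: throughput is the data in flight divided by the time each IO takes. The sketch below assumes these figures come from the IOMeter burn test described in the earlier post on this page (8KB writes, 8 outstanding IOs); that link is an assumption, as is the function name.

```python
def throughput_mb_per_sec(outstanding_ios, block_kb, latency_ms):
    """Little's law: throughput = requests in flight * request size / latency."""
    bytes_in_flight = outstanding_ios * block_kb * 1024
    return bytes_in_flight / (latency_ms / 1000) / 1_000_000


for latency in (1, 11, 15):
    print(latency, "ms ->", round(throughput_mb_per_sec(8, 8, latency), 1), "MB/s")
```

Under those assumptions the three latencies work out to roughly 65, 6 and 4.4 MB/sec, close to the measured 60, 5.5 and 4, which suggests the throughput drop is fully explained by the rise in per-IO latency.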

Going to keep it running. It lasted longer than I thought, but it proves that in a database setting with lots of write activity wear rate is an issue. Anyway, I'll blog next week sometime once I've all the data collated.

Tony.

0
0
Tony Rogerson

@Ammaross: stats

Using RAID 1 isn't going to double the life, RAID 1 is mirroring so both drives will wear (sorry ware :)) out equally, perhaps you were thinking RAID 0 but with that there is no redundancy.

I've a test running now that I've stopped the drive timeouts I was getting; I'll leave it for a few weeks and see what the results are. Am capturing all the logical disk counters through perfmon so should see any degradation.

Also - it was two entirely different systems I was looking at, neither enterprise level writes which was my point!

0
0
Tony Rogerson
Angel

Burn test

I've kicked off a test using IOMeter for a 54GB file, 8KB 100% sequential write with 8 outstanding IOs on an OCZ Agility 3 60GB; it's on Windows 2008 R2 and I'm logging the logical disk counters each minute so I can see any degradation over time. I'll leave it running and come back next weekend and let you know the results.

Yer yer, my spelling is crap, but I more than make up technically for that in SQL Server ;)

Does anybody have any figures in this area? I'm actually doing a project on using commodity SSDs for my masters BI project.

T

1
0
Tony Rogerson
Happy

Write ware

It would be good if that is the case.

Just checked one of my client's servers, a call centre, just a medium sized business, and one of their databases has done 4TB of writes since middle of April (5 months). Another 14.5TB since middle of May (3 months); those aren't enterprise installations either. Another 17TB since end of June....

SO, my point is that in a database environment ware is a consideration; the biggest worry I have is that there are absolutely no tests publicised in this area to see how long they live.

T

0
0

OCZ jumps harder on enterprise PCIe bandwagon

Tony Rogerson
Pint

RAID 0 on card is fine; also, IBIS are fine

What's wrong with the 4 or 8 way RAID 0? If you're doing this properly you'd buy two cards and use software mirroring.

I've also done a lot of benchmarking on 2 x 240GB IBIS cards in that configuration and they work brilliantly; on my simple database workload I can very easily overload the cores so they can't keep up (see my blog on it and the other tests I've done so far on them: http://tinyurl.com/3d8ygeu).

I think the majority of IBIS problems come from motherboard incompatibility caused by the lack of BIOS memory available.

T

0
0