0

Happy New Year 2015

I know it’s late but I’ve been away, I hope you all had a good Christmas and New Year.

I try and regularly review my skills as a DBA but more so at the start of a new year. You know the type of questions:-

  • Have I improved my overall skillset over the last year?
  • What areas have I worked specifically on?
  • How do my skills in those areas compare to last year?
  • Are there any areas in which I need to improve upon?
  • What can I do over the next week/month/year in order to keep improving?

2014 was a pretty exciting year for me. I moved to Dublin to work and looking back, it was the completely correct decision. I’ve been involved in a major project over the last few months and all in all (with few exceptions) everything has gone well. Highlights for me include setting up a brand new production environment, migrating our old databases (goodby SQL 2005!) and then learning how to configure our brand new SAN. SAN administration is completely new to me and it’s been fun. Challenging but good fun.

As DBAs we have to constantly keep learning in order to not be left behind. 2014 was no exception, a new version of SQL came out with a bunch of brand new features (think buffer pool extensions and In-memory OLTP). I don’t know about you but I’ve been reading about and experimenting with SQL 2014 like crazy.

So what am I looking forward to doing in the new year? Well, here’s the top three things I want to get done this year:-

  • Implement Always-On availability groups for diaster recovery
  • Further work with In-Memory OLTP (get a lab up and running and do some benchmarking)
  • Completely review replication (I haven’t used it in a production environment, so I have to keep going back over it)

There’s no way that that list will stay as is but it’s good to have something to start with. I bet you’ve all got similar lists, either written down or just in your heads. So here’s to 2015! Let’s hope it’s as exciting and challenging as 2014!

2

Book Review – SQL Server Internals: In-Memory OLTP by Kalen Delaney

The new SQL Server 2014 feature In-Memory OLTP (code-named “Hekaton”) has been attracting a lot of interest since its release, promising to deliver (if you believe the rhetoric) an increase of up to 100 times in performance.

If you’re like me, you’ve got a copy of SQL Server 2014 Developer Edition and have played around, maybe created some memory-optimised objects but not really done much more.

But what is the real power of this feature? Why is it causing such a stir among the SQL Server community? I wanted to learn more, see what exactly In-Memory OLTP is and what it can do.

So when I saw that Red Gate were sending out copies of Kalen Delaney’s book SQL Server Internals: In-Memory OLTP for review, I jumped at the chance. I was hoping for a complete overview of the feature, specifically:-

What is In-Memory OLTP?
How does it provide the promised performance boost?
How does it differ from the existing SQL Server architecture?
How can it be applied in the “real world”?

Does the book deliver? I’ll run through each chapter, detailing what the content is and then summarise if it met my expectations.

(N.B. – The book comes with code examples, all of which can be downloaded from here. I would highly recommend that anyone reading through this book also get the code and follow along in their own instance of SQL)

The introduction to the book explains how the existing SQL Server Engine is optimised for disk based storage and why the need for In-Memory OLTP arose (aka the limitations of IO subsystems). It details the goals that the team at Microsoft were set and that In-Memory OLTP meets all of them. It then sets expectations for the book, explaining what each chapter covers and what it relates to.

I liked the way that the introduction talks about specific performance bottlenecks within SQL Server, explaining what they are, how In-Memory OLTP resolves them and in which chapter that particular feature is covered.

DBCC PINTABLE was before my time as a DBA but I have heard it being mentioned in the same breath as In-memory OLTP. The first thing this chapter 1 does is point out that the two are in no way related. In-memory OTLP is an entirely feature within the SQL Server engine, with its behaviour very different that than of the existing disk based system.

The chapter then covers objects within the new features detailing tables, indexes, stored procedures, how the promised no locking/latching access works and how In-Memory objects deal with the recovery and durability required for objects within SQL Server. Each time detailing the differences against the existing objects within SQL Server.

Each of the areas of In-Memory OLTP is broadly covered in this chapter, giving a good overview of what In-Memory OLTP is, what it can do and how it does it. If you are approaching In-Memory OLTP with no prior knowledge, this will give you a good grounding.

Chapter 2 covers creating databases that will contain memory-optimised tables, detailing the new syntax required and what limitations/restrictions there are.

Of particular interest to me was that In-Memory OLTP tables will only support indexes on character columns with a binary collation (currently). The chapter mentions setting this option at either the column or database level, the book recommends setting this at the column level.
(N.B. – This is something that I really liked about the book. All the way through it provides recommendations on the usage of In-Memory OLTP, giving sound reasoning for them).

We then get to use the first of the provided scripts, creating a database that is configured to contain memory-optimised tables and then creating the tables themselves.

The final part of the chapter talks about how to access the in-memory objects, either by standard t-sql or the new feature, natively compiled stored procedures. We are told that Chapter 7 gives a more in-depth look at natively compiled procedures but for now we are just given a brief listing some of the restrictions when querying memory optimised tables, either via t-sql or the new natively compiled stored procedures.

Chapter 3 explains how memory optimised objects are structured and how they are accessed. We are introduced to the optimistic multi-version concurrency control (MVCC) model, which uses a form of row versioning for accessing/modifying rows in memory optimised tables. This can be a little hard to follow (well it was for me at first) but the book walks us through INSERT, UPDATE and DELETE operations, step by step, detailing the work that SQL performs in the background.

This chapter shows just how differently SQL treats memory optimised tables compared to disk based tables. The book goes through these differences, explaining how SQL avoids read-write/write-write conflicts without acquiring any locks, using simple examples for clarification.

Probably the most technical chapter, chapter 4 explains the new hash and range indexes that are available for memory optimised tables.

We are taken through the new internal structures of both types of indexes, the buckets in the hash indexes and the “Bw-tree” in the range indexes. Then the book goes through how SQL performs internal index operations: consolidation, splitting and merging of pages.

The most useful section of this chapter is where it details when each type of index should be used. Hash indexes being useful for unique data being queried with equality predicates whereas range indexes used for searching for a range of values. I am intrigued with how hash indexes will perform; the book mentions that the correct number of “buckets” must be set for optimum performance but I am confused as to how this figure is to be calculated (I’m guessing trial and error).

Chapter 5 goes through how SQL Server preserves the ACID properties of every transaction against memory optimised tables. It goes through the internal processes that SQL performs for every transaction, detailing the differences in the difference isolation levels.

Of particular interest is the compatibility of the existing isolation levels with memory optimised objects. For instance READ UNCOMMITTED is not supported. What is of real benefit is the advice on when the different isolation levels that are available can be used for cross container queries (queries referencing tables on disk and in memory).

I suspect that when I come to use memory optimised tables “in anger”, I will stick to the default isolation level but having this extra information for reference is definitely useful.

There are more code examples here, allowing us to run queries in different isolation levels and observe how SQL reacts (i.e. – the errors that are generated!).

Chapter 6 explains how SQL Server supports recovery of data in memory optimised tables. It explains how SQL Server writes to the transaction log and details the concept of “reduced logging”. We are given a code example to highlight this. I really like delving into the internals of SQL Server so I thought this was great.

We are introduced to “Checkpoint File Pairs”; the book provides in-depth detail on how they are structured and how they are managed. Code is again provided so that we can observe the processes themselves, querying the relevant system objects to see what SQL is doing in the background. This is something that I’m going to explore further on my own, but it’s really helpful to have the code from the book to give me a start.

Finally the chapter details what SQL does during the recovery process, going through the checkpoint file load process and then replaying the tail of the transaction log.

Another completely new feature is the native compilation of stored procedures that can be used to access memory optimised tables. Chapter 7 explains that stored procedures can be converted to processor instructions that the server’s CPU(s) will execute directly without further compilation. This is where the performance benefit comes from; the CPU having to execute fewer instructions reduces processing time.

It also reveals that memory optimised tables are also natively compiled, so both tables and stored procedures are stored as DLLs, loaded into memory and linked to SQL Server.

The chapter then gives code examples on how to create the objects, highlighting the differences in syntax and then detailing the restrictions. The final part of the chapter, the part that I found most interesting, was focussed on performance comparisons. First we are shown details of the performance comparisons that Microsoft carried out. Then we are shown example code, allowing us to perform our own testing. I really like this, viewing performance data is one thing but being able to run scripts and see the results really drives the point home.

To complete the book, chapter 8 details how In-Memory OLTP is supported and can be managed.
We are taken through feature support, managing the memory allocations required, new reports, additions to catalog views and new dynamic management objects, best practices, current uses and finally the migration process.

The listing of the additions to the catalog views and the new dynamic management objects will be extremely useful. There is a brief summary of each but I would recommend copying out each one, finding the MSDN article and reading that in full. It’s good to know what information is available via these system objects, and the information here will allow you to build your own scripts to analyse the In-Memory OLTP architecture.

Another section that I found helpful was the section detailing current uses of In-Memory OLTP. I came away from the section already thinking about how I could apply the feature to the existing systems I manage in my day job. A little premature maybe, but for me it took In-Memory OLTP from theoretical to something that I could actually implement.

So did the book meet my expectations? Well, I started out with the questions:-

What is In-Memory OLTP?
How does it provide the promised performance boost?
How does it differ from the existing SQL Server architecture?
How can it be applied in the “real world”?

Chapter 1 gives a good overview of what In-Memory OLTP is, giving a good overview of each of the features which sets the reader up nicely for the rest of the book. How In-Memory OLTP delivers its promised performance boost and how it differs from the existing disk based architecture is mentioned throughout the book. As each feature is discussed we are told how it works and how it is different from the existing SQL Server architecture. Finally, the section in chapter 8 which talks about existing applications of In-Memory OLTP answered my final question. This was a nice touch and as I said earlier, got me thinking about how I could implement In-Memory OLTP in the system I manage.

So I have to say, yes, the book definitely met my expectations. By using an informal style (the diagrams are almost cartoonish) the book presents what could have been a very dry subject clearly, relating back to the existing architecture within SQL Server to highlight the differences and advantages of the new In-Memory OLTP features. It provides recommendations on how to implement In-Memory OLTP, providing examples of when it would and would not be useful.

The code used in each of the examples is available for download, allowing the reader to follow along and try out each of the new features. Not only does this code help explain the book subject matter but it will allow analysis of In-Memory OLTP objects when deployed in a Production environment (or at least provide a foundation for the reader to build their own custom scripts).

I would recommend this book to anyone looking to learn/implement SQL Server’s In-Memory OLTP feature.

The book can be purchased from Amazon. There will be a PDF version available in 2015 from http://www.red-gate.com/community/books/.

1

Video – Identifying large queries using Extended Events

A couple of weeks ago I was contacted by Webucator (an online training company) asking if they could use my blog post “Identifying Large Queries using Extended Events” for a video training session in their free series called SQL Server Solutions from the Web.

The main reason I write this blog is to give back to the SQL Server community, so I was more than happy for Webucator to make this video. I hope it’s of benefit to someone out there.

The video’s great. The instructor, Bruce Gordon, goes through my post explaining what each piece of code does and then shows how to use the GUI (introduced in SQL Server 2012) to re-configure an existing Extended Events session.

Here’s the video:-

Webucator provide many more SQL Server training videos here

4

Blogging for a year!

I’m actually quite proud of the fact that I’ve now been doing this for a year now (and that I’m still going).

So one year on, what have I learnt?

Syndicate
Get your blog syndicated with other, larger websites. The amount of views my blog got went through the roof after I syndicated with SQLServerCentral.com and ToadWorld.com. These websites will allow your blog to have a greater audience, otherwise how will people know your blog exists? Mine certainly isn’t showing up on the first page of Google results (yet).

Code
Everyone’s got tonnes of SQL scripts, however I bet you didn’t write them all. When you’re using code in your posts you must make sure you’ve written it, otherwise you MUST reference the person who did. I’ve written a couple of blogs were I went to use a script from my library but had no idea if I’d written it or not. In that situation it’s better to use a script from the net and reference the author.

Don’t over commit
Set a realistic goal for the amount of posts you will write in a given time period. I started out trying to write a minimum of one blog per month. I think this is a good target for myself, I’m not an accomplished writer so I didn’t want to set an unrealistic goal. I’ve managed to (mostly) keep to this target, only missing last December as I was moving to Dublin. It’s also allowed me time to think about topics that I want to write about and then spend a good amount of time writing and then editing.

Say, yes!
I’ve been contacted a couple of times over the last year by people wanting to either reference my blog or write a post for them. I’ve said yes each time and why not? I started this blog to give back to the SQL community and am really happy that people are reading it, and I’m absolutely ecstatic that someone out there has found it good enough to reference in their own work. There is no downside to saying yes to these requests. Your blog will get more exposure and you’ll get better as a writer.

It’s been done before
If you’re writing a technical blog, you’ll be very aware of the fact that the topics you are writing about have been written about many (perhaps many) times before. You need to remember that the point of your blog is to bring your perspective to the topic you’re writing about. What challenges in a particular topic have you faced? How did you overcome them?

Enjoy it
This is a hobby of mine, it’s not a job. I enjoy the fact that I can write about SQL and that my posts can help people out there. I never want to get to a stage where I’m treating this as a job, it’ll always be a hobby for me.

Thanks for reading!

0

Cardinality Estimation in SQL Server

Each time you view a seek/scan operator in an execution plan, you may have noticed that there’s a value for the estimated number of rows and a value for the actual number of rows. Sometimes these values can be fairly similar and sometimes they can be very different.

The estimated number of rows (or cardinality estimate) is very important when SQL is generating a plan to use for your query. It can mean the difference of your query running in milliseconds or minutes.

Quoting from Books Online:-

The query optimizer in SQL Server is cost-based. This means that it selects query plans that have the lowest estimated processing cost to execute. The query optimizer determines the cost of executing a query plan based on two main factors:

• The total number of rows processed at each level of a query plan, referred to as the cardinality of the plan.
• The cost model of the algorithm dictated by the operators used in the query.

The first factor, cardinality, is used as an input parameter of the second factor, the cost model. Therefore, improved cardinality leads to better estimated costs and, in turn, faster execution plans.

So, having an accurate value for the cardinality will allow SQL to generate a more efficient query plan which in turn will improve the performance of the query when executed. But how does the optimizer calculate the cardinality? Knowing how this value is calculated will allow you to write/tune queries more effectively.

I’ve got a few examples that will show the different values calculated in different scenarios. First, create a demo database:-

USE [master];
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE name = 'EstimatedRowsDemo')
    DROP DATABASE [EstimatedRowsDemo];
GO

IF NOT EXISTS(SELECT 1 FROM sys.databases WHERE name = 'EstimatedRowsDemo')
    CREATE DATABASE [EstimatedRowsDemo];
GO

Then create a table and insert some data:-

USE [EstimatedRowsDemo];
GO

IF EXISTS(SELECT 1 FROM sys.objects WHERE name = 'DemoTable')
DROP TABLE dbo.DemoTable;

IF NOT EXISTS(SELECT 1 FROM sys.objects WHERE name = 'DemoTable')
    CREATE TABLE dbo.DemoTable
	   (ColA INT,
	    ColB SYSNAME);
GO

INSERT INTO dbo.DemoTable
(ColA, ColB)
VALUES
(100 , 'VALUE1')
GO 200

INSERT INTO dbo.DemoTable
(ColA, ColB)
VALUES
(200 , 'VALUE2')
GO 500

INSERT INTO dbo.DemoTable
(ColA, ColB)
VALUES
(300 , 'VALUE2')
GO 400

Then create an index on the table:-

CREATE NONCLUSTERED INDEX [IX_DemoTable_ColA] on dbo.[DemoTable](ColA);
GO

We can now look at the statistics on that index:-

DBCC SHOW_STATISTICS ('dbo.DemoTable','IX_DemoTable_ColA');
GO

Statistics

I’m not going to go through what each of the values means, the following article on Books Online explains each value:-
http://msdn.microsoft.com/en-us/library/hh510179.aspx

Now we can start running queries against the table and view the number of estimated rows value that the optimiser generates.

Quick note – All the queries are being run in SQL Server 2014 and will give the same execution plan:-
Query Plan
Remember, it’s the estimated number of rows generated for each one that we are interested in. Also, I’m running the following before each SELECT statement:-

DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

Don’t forget to click the ‘Include Actual Execution Plan’ button in SSMS (or hit Ctrl + M) as well.

OK, let’s go.

Query A

SELECT ColA FROM dbo.[DemoTable] WHERE ColA = 200;
GO

The value 200 corresponds to a RANGE_HI_KEY value in the stats for the index so the value for the EQ_ROWS is used, giving the estimated number of rows as 500.

Query A

Query B

SELECT ColA FROM dbo.[DemoTable] WHERE ColA = 201;
GO

The value 201 is not one of the RANGE_HI_KEY values in the stats so the value for the AVG_RANGE_ROWS in the histogram step where the value would fall is used. This gives an estimated number of rows as 1, a significant difference from Query A.

Query B

Query C

DECLARE @ParamA INT = 201;
SELECT ColA FROM dbo.[DemoTable] WHERE ColA = @ParamA;
GO

As a parameter is used in this query, the optimiser does not know the value set so the estimated number of rows is calculated as Density * Total Number of Rows (0.3333333 * 1100). This shows that using parameters in queries can drastically affect the estimated number of rows generated.

Query C

Query D

DECLARE @ParamA INT = 200;
SELECT ColA FROM dbo.[DemoTable] WHERE ColA > @ParamA;
GO

This query uses a parameter value but with an inequality operator (>). In this situation the estimated number of rows is 30% of the total number of rows in the table.

Query D

Query E

DECLARE @ParamA INT = 200;
SELECT ColA FROM dbo.[DemoTable] WHERE ColA > @ParamA OPTION(RECOMPILE);
GO

This is the same query as Query D but has OPTION(RECOMPILE) specified. This forces the plan to be compiled with the current value of the parameter, meaning that as the value is a RANGE_HI_KEY, the EQ_ROWS value for that histogram step is used. This gives a value of 400 for the estimated number of rows.

Query E

Query F

DECLARE @ParamA INT = 200;
DECLARE @ParamB INT = 300;
SELECT ColA FROM dbo.[DemoTable] WHERE ColA > @ParamA AND ColA < @ParamB
OPTION (QUERYTRACEON 9481);
GO

This query has two inequality operators. I specified OPTION (QUERYTRACEON 9481) to force SQL 2014 to use the old cardinality estimator, which will estimate the number of rows as 9% of the total number of rows in the able when two inequality operators are used.

Query F - 1

When this query is run in SQL 2014 (using the new cardinality estimator) a different result for the estimated rows is generated. I’m unsure as to how this value is calculated, something that I will investigate in the future but for now it is just something to be aware of as the value is significantly different (180 vs 99).

Query F - 2

EDIT

    Kevan Riley (@KevRiley) contacted me about the new cardinality estimator for SQL 2014 and how it differs from the estimator in 2012.

    For multiple predicates the estimator in 2012 used the formula:-

    ((Estimated number of rows for first predicate) *
    (Estimated number of rows for second predicate)) /
    Total number of rows

    So for this query in 2012: (330 * 330)/1100 = 99

    The new estimator in 2014 uses a formula with a process known as exponential backoff:

    Selectivity of most selective predicate *
    Square root of (selectivity of second most selective predicate) *
    Total number of rows

    So for this query in 2014: (330/1100) * (SQRT(330/1100)) * 1100 = ~180

    Thank you very much Kevan

    Query G

    DECLARE @ParamA INT = 200;
    DECLARE @ParamB INT = 300;
    SELECT ColA FROM dbo.[DemoTable] WHERE ColA > @ParamA AND ColA < @ParamB OPTION(RECOMPILE);
    GO
    

    Finally, running the same query as Query F but using OPTION(RECOMPILE). This has the same effect as the result in Query E, the plan is compiled using the current values of the parameters. As there are two values, the optimiser takes a “slice” of the AVG_RANGE_ROWS values from the histogram steps that the parameter values fall across (1 in this simple example).

    Query G

    I hope these examples have shown you how the optimiser calculates the estimated number of rows in different scenarios. Please let me know if you have any feedback in the comments section.

    Further information

    This post is designed to give a brief introduction into cardinality estimation. There are literally hundreds on articles on the web about this subject, even more so now with the changes made in SQL 2014. Below are articles/posts I referenced to write this post.

    Gail Shaw has a good presentation on this subject (which I referenced to write this post), it can be found here:-
    http://www.sqlpass.org/24hours/2014/summitpreview/Sessions/SessionDetails.aspx?sid=7279

    Paul White on multiple predicates for cardinality estimates:-
    http://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates

    Fabiano Amorim on getting the statistics used in a cached query plan:-
    http://blogfabiano.com/2012/07/03/statistics-used-in-a-cached-query-plan/