2

The Art of Performance Tuning

Performance tuning often gets called an art as people feel that a certain knack or innate talent comes into play. And whilst I don’t disagree that a certain level of knowledge is involved, I completely disagree that only certain people can performance tune.

Given the correct approach, anyone should be able to learn to effectively performance tune.

But how should performance tuning be approached? I’d like to take a step back from delving into specifics and define the process of performance tuning, a set of guidelines that can be used in any circumstance.

I believe that performance tuning should follow the scientific method, which is defined as:-

“A set of principles and procedures for the systematic pursuit of knowledge involving the recognition and formulation of a problem, the collection of data through observation and experiment, and the formulation and testing of hypotheses”

In practice, this can be broken down into the following steps:-

1) Observation
2) Declaration of the end goal or issue
3) Forming a hypothesis
4) Testing the hypothesis
5) Analysis of results
6) Conclusion
7) Further research

This way of approaching performance tuning comes into its own particularly when investigation is required in order to respond to issues in a live environment. The trick is to follow the method and not to go with “gut” feelings or guesses. Stick to what you know.

For example, your production server has started responding slowly, with queries taking longer than expected to complete. This could be due to memory pressure, but you are not sure. It would be better to start off by declaring “Performance of the server is poor”, which you know to be 100% correct, rather than declaring “Performance of the server is poor due to memory pressure”.

So the steps to follow would be:-

1. Observation
-Queries on production running for longer than expected
2. Declaration
-Performance of the server is poor
3. Hypothesis
-The poor performance of the server is due to memory pressure
4. Testing
-Running an extended events session on the server to catch incoming queries
5. Analysis
-Several queries performing scans of large tables
6. Conclusion
-Creation of a covering non-clustered index to prevent the table scans
7. Further Research
-Are there any other queries being executed that are performing table scans?

These steps may seem obvious but using the scientific method will prevent you from being lead down blind alleys and potentially missing the actual problem. What if the analysis in the example above indicated that memory pressure was not the cause of the poor performance of the server? If you had originally declared “Performance of the server is poor due to memory pressure” you could end up wasting valuable time and effort looking for other indications to back that statement up.

The scientific method provides a structure for the investigation of any performance issues you encounter. It’s very easy to feel under pressure when investigating performance issues, which can lead to mistakes being made or obvious problems being missed. Following the process of problem declaration, forming a hypothesis and then testing gives you a rigid structure to stick to when you’re feeling under the cosh, something to rely on as being a tried and tested method of investigation.

4

Partitioning Basics – Part 3 – Switching Data

This post follows on from Partitioning Basics – Part 2

In this final part, I want to go through how partitions can be used to archive data from a table. Firstly a table to archive the data from the primary table needs to be created:-

CREATE TABLE dbo.[DemoPartitionedTable_Archive]
(DemoID			INT IDENTITY(1,1),
 SomeData		SYSNAME,
 CaptureDate	DATE,

 CONSTRAINT [PK_DemoPartitionedTable_Archive] PRIMARY KEY CLUSTERED 
	(DemoID ASC, CaptureDate ASC) 

) ON DEMO;

I haven’t created this table on the partition scheme but it could be done because, according to Microsoft:-

“When a table and its indexes are aligned, then SQL Server can move partitions in and out of partitioned tables more effectively, because all related data and indexes are divided with the same algorithm.”

Reference:- Partitioned Tables and Indexes in SQL Server 2005

So if the archive table was on the partition scheme it would be “Aligned”. This means that moving data in and out would be more efficient. The only reason I haven’t done this here is because this is a basic demo.

Before the switch is performed the data in the partitions needs to be checked:-

SELECT 
	t.name AS TableName, i.name AS IndexName, p.partition_number, 
	r.value AS BoundaryValue, p.rows
FROM 
	sys.tables AS t
INNER JOIN
	sys.indexes AS i ON t.object_id = i.object_id
INNER JOIN
	sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id 
INNER JOIN
	sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
INNER JOIN
	sys.partition_functions AS f ON s.function_id = f.function_id
LEFT OUTER JOIN 
	sys.partition_range_values AS r ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE 
	t.name = 'DemoPartitionedTable'
AND 
	i.type <= 1
ORDER BY p.partition_number;

Here’s the data in the partitions:-

Partitions4

In order to switch the data from partition 1 in the primary table to the archive table, the following script needs to be run:-

ALTER TABLE dbo.[DemoPartitionedTable]
SWITCH PARTITION 1
TO dbo.[DemoPartitionedTable_Archive];
GO

Which makes the partitions look like this:-

Partitions6

Partition 1 has 0 rows and could be merged. The following script will merge the partition:-

ALTER PARTITION FUNCTION DemoPartitionFunction()
    MERGE RANGE ('2014-05-29');
GO

So now the partitions are:-

Partitions Final

The partition has had its data switched out to the archive table and then was merged into the above partition. Using the scripts in these three post will allow you to effectively manage partitioning in SQL Server.

I hope this series of posts has been helpful as an introduction into partitioning. Please let me know if you have any comments or questions.

3

Partitioning Basics – Part 2 – Splitting/Merging Partitions

This post follows on from the previous post Partitioning Basics – Part 1

Let’s have a look at the partitions setup in part 1, the following script will show the partition information:-

SELECT 
	t.name AS TableName, i.name AS IndexName, p.partition_number, 
	r.value AS BoundaryValue, p.rows
FROM 
	sys.tables AS t
INNER JOIN
	sys.indexes AS i ON t.object_id = i.object_id
INNER JOIN
	sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id 
INNER JOIN
	sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
INNER JOIN
	sys.partition_functions AS f ON s.function_id = f.function_id
LEFT OUTER JOIN 
	sys.partition_range_values AS r ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE 
	t.name = 'DemoPartitionedTable'
AND 
	i.type <= 1
ORDER BY p.partition_number;

Partitions1

These partitions will hold data until the 11th of June, after which all records in the table will be put into the end partition (number 16). This means that the end partition will end up holding a lot of records. In order to prevent this, a new partition needs to be created.

To do this, firstly the partition scheme needs to be told which filegroup the next partition will use:-

ALTER PARTITION SCHEME DemoPartitionScheme
	NEXT USED [Demo];
GO

Then a SPLIT command is run against the partition function:-

ALTER PARTITION FUNCTION DemoPartitionFunction()
	SPLIT RANGE ('2014-06-12');
GO

And now when the data checking script is run, the results are:-

Partitions2

Partition 16 now has a boundary value of 2014-06-12 and a new partition (17) has been created.

But how can a partition be removed? There’s a really simple MERGE command that performs this operation:-

ALTER PARTITION FUNCTION DemoPartitionFunction()
	MERGE RANGE ('2014-05-28');
GO

And when the data checking script is run now, the results are:-

Partitions3

So the data in the partition with the boundary value 2014-05-28 has been merged into the partition with the boundary of 2014-05-29. The problem now is that the last partition has data from the partition that was just merged into it as well as the existing data that was already there. In order to prevent this from happening, the data from the partition that is about to be merged can be switched into another table before the MERGE command is executed.

I will cover this in the next part of this blog series.

6

Partitioning Basics – Part 1 – Creating Partitions

Partitioned tables can be a quick and efficient way to (amongst other things) archive data. In the next couple of posts I will run through the basics of partitioning. Starting with setting up a partitioned table and loading some data.

Let’s create a demo database:-

USE [master];
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE name = 'DEMO')
	BEGIN
		DROP DATABASE [Demo];
	END

CREATE DATABASE [Demo]
	ON PRIMARY 
(NAME     = 'Demo', 
 FILENAME = 'C:\SQLServer\Data\Demo.MDF'), 
	FILEGROUP [DEMO] DEFAULT 
(NAME     = 'Demo_Data', 
 FILENAME = 'C:\SQLServer\Data\Demo_Data.NDF')
	LOG ON 
(NAME    = 'Demo_Log', 
FILENAME = 'C:\SQLServer\Logs\Demo_log.ldf')
GO

OK, so the first thing to do is create a partition function. This function defines the number of partitions we will intiailly have. It also specifies the boundaries of each partition.

USE [Demo];
GO

IF NOT EXISTS(SELECT 1 FROM sys.partition_functions WHERE name = 'DemoPartitionFunction')
CREATE PARTITION FUNCTION DemoPartitionFunction (DATE)
AS RANGE RIGHT 
FOR VALUES (DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 7),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 6),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 5),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 4),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 3),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 2),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 1),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), 0),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -1),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -2),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -3),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -4),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -5),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -6),
			DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -7));
GO

So this statement has said that my partition function is going to be using dates as boundaries. What I’ve done is setup a bunch of partitions for one week in the past and one week in the future. The function sets the right “side” of the partition as the boundary, basically specifying which side of the boundary value the partition extends. The best way of thinking about it is like this…..

Today’s date is 2014-06-04. So the boundary values created by running the above script are:-

1. 2014-05-28
2. 2014-05-29
3. 2014-05-30
4. 2014-05-31
5. 2014-06-01
6. 2014-06-02
7. 2014-06-03
8. 2014-06-04
9. 2014-06-05
10. 2014-06-06
11. 2014-06-07
12. 2014-06-08
13. 2014-06-09
14. 2014-06-10
15. 2014-06-11

Say x is a value in the table then the partitions would evaluate as:-

1. 2014-05-28 > x
2. 2014-05-28 <= x < 2014-05-29
3. 2014-05-29 <= x < 2014-05-30
4. 2014-05-30 <= x < 2014-05-31
5. 2014-05-31 <= x < 2014-06-01
6. 2014-06-01 <= x < 2014-06-02
7. 2014-06-02 <= x < 2014-06-03
8. 2014-06-03 <= x < 2014-06-04
9. 2014-06-04 <= x < 2014-06-05
10. 2014-06-05 <= x < 2014-06-06
11. 2014-06-06 <= x < 2014-06-07
12. 2014-06-07 <= x < 2014-06-08
13. 2014-06-08 <= x < 2014-06-09
14. 2014-06-09 <= x < 2014-06-10
15. 2014-06-10 <= x < 2014-06-11
16. 2014-06-11 <= x

So in these partitions, the data in the partition is always less than the value of that partition’s boundary.

After the partition function, a partition scheme needs to be created:-

IF NOT EXISTS(SELECT 1 FROM sys.partition_schemes WHERE name = 'DemoPartitionScheme')
CREATE PARTITION SCHEME DemoPartitionScheme
AS PARTITION DemoPartitionFunction
ALL TO (DEMO);

The partition scheme maps the data to a partition function and specifies where the data is going to be stored. The [ALL TO (DEMO)] part of the statement says that all partitions are going to be mapped to the DEMO filegroup.

Now that the function and scheme have been created we can create a table on the partitions:-

CREATE TABLE dbo.[DemoPartitionedTable]
(DemoID			INT IDENTITY(1,1),
 SomeData		SYSNAME,
 CaptureDate	DATE,

 CONSTRAINT [PK_DemoPartitionedTable] PRIMARY KEY CLUSTERED 
	(DemoID ASC, CaptureDate ASC)

) ON DemoPartitionScheme(CaptureDate);

Instead of specifying that the table be placed on a filegroup, the statement has [ON DemoPartitionScheme(CaptureDate)] at the end. This means that the table is on the partition scheme, using the column CaptureDate to determine which rows go on which partition.

Let’s insert some data into the table:-

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -8));
GO 457

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -7));
GO 493

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -6));
GO 486

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -5));
GO 413

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -4));
GO 473

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -3));
GO 461

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -2));
GO 422

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()), -1));
GO 461

INSERT INTO dbo.[DemoPartitionedTable]
(SomeData,CaptureDate)
VALUES
('Demo',DATEADD(dd, DATEDIFF(dd, 0, GETUTCDATE()),0));
GO 273

This will insert data into today’s partition and the partitions for the last 8 days. This can be checked by running the following script:-

SELECT 
	t.name AS TableName, i.name AS IndexName, p.partition_number, 
	r.value AS BoundaryValue, p.rows
FROM 
	sys.tables AS t
INNER JOIN
	sys.indexes AS i ON t.object_id = i.object_id
INNER JOIN
	sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id 
INNER JOIN
	sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
INNER JOIN
	sys.partition_functions AS f ON s.function_id = f.function_id
LEFT OUTER JOIN 
	sys.partition_range_values AS r ON f.function_id = r.function_id AND r.boundary_id = p.partition_number
WHERE 
	t.name = 'DemoPartitionedTable'
AND 
	i.type <= 1
ORDER BY p.partition_number;

This will show each partition and the number of rows in it:-

Partitions1

Remember that the boundary values are the upper limit, no value in the partitions will match that value due to the partition function being specified as RANGE RIGHT.

I’ll continue this post later in the month, when I’ll talk about creating new partitions and merging partitions.