0

Partitioning and filegroup restores

I’ve been playing around with partitioning quite a lot recently and wanted to write a quick post about how it can help you out in a DR situation.

Partitioning is mainly for increasing the manageability of your data but it also has other benefits, one of them being giving you the ability to split a single table across multiple filegroups. This will allow you to keep your current data in one filegroup and, let’s call it historical data, in another. In a DR situation, if you need to bring your current data online quickly and worry about the rest later, this can really help you out.

So let’s run through a quick example.First, create a database:-

CREATE DATABASE [PartitioningDemo]
 ON PRIMARY 
(NAME = N'PartitionDemo', FILENAME = N'C:\SQLServer\SQLData\PartitionDemo.mdf', SIZE = 51200KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB), 
 FILEGROUP [DATA] 
(NAME = N'DATA', FILENAME = N'C:\SQLServer\SQLData\DATA.ndf', SIZE = 51200KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
 FILEGROUP [ARCHIVE] 
(NAME = N'ARCHIVE', FILENAME = N'C:\SQLServer\SQLData\ARCHIVE.NDF', SIZE = 51200KB, MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
(NAME = N'PartitionDemo_log', FILENAME = N'C:\SQLServer\SQLLog\PartitionDemo_log.ldf', SIZE = 20480KB, MAXSIZE = 2048GB, FILEGROWTH = 512KB)
GO

This database has three filgroups. The PRIMARY (as always), DATA and an ARCHIVE filegroup. What this demo is going to show you is how to bring the PRIMARY and DATA filegroups online first and then bring the ARCHIVE filegroup online afterwards.

So now let’s create a partition scheme and function:-

USE [PartitioningDemo];
GO

CREATE PARTITION FUNCTION PF_PartitionedTable(DATE)
	   AS RANGE RIGHT 
    FOR VALUES ('2014-01-01','2015-01-01','2016-01-01');
GO


CREATE PARTITION SCHEME PS_PartitionedTable
    AS PARTITION PF_PartitionedTable
TO ([ARCHIVE],[ARCHIVE],[DATA],[DATA]);
GO

The table we’re going to build will be partitioned by year, two partitions on the ARCHIVE group and two on the DATA filegroup.

So let’s create the table (and its clustered index): –

CREATE TABLE dbo.PartitionedTable
(PKID INT IDENTITY(1,1),
 ColA VARCHAR(10),
 ColB VARCHAR(10),
 CreatedDate DATE)
 ON PS_PartitionedTable(CreatedDate);


CREATE UNIQUE CLUSTERED INDEX [IX_CreatedDate_PartitionedTable] ON dbo.PartitionedTable
 (CreatedDate,PKID) 
ON PS_PartitionedTable(CreatedDate);
GO

Now insert some data: –

SET NOCOUNT ON;

INSERT INTO dbo.PartitionedTable
(ColA,ColB,CreatedDate)
VALUES
(REPLICATE('A',10),REPLICATE('A',10),'2013-02-01');
GO 1000

INSERT INTO dbo.PartitionedTable
(ColA,ColB,CreatedDate)
VALUES
(REPLICATE('A',10),REPLICATE('A',10),'2014-02-01');
GO 1000

INSERT INTO dbo.PartitionedTable
(ColA,ColB,CreatedDate)
VALUES
(REPLICATE('A',10),REPLICATE('A',10),'2015-02-01');
GO 1000

INSERT INTO dbo.PartitionedTable
(ColA,ColB,CreatedDate)
VALUES
(REPLICATE('A',10),REPLICATE('A',10),'2016-02-01');
GO 1000

Let’s quickly check the data in the partitions:-

SELECT 
	t.name AS TableName, i.name AS IndexName, p.partition_number, p.partition_id, 
	--i.data_space_id, f.function_id, f.type_desc, 
	fg.name AS [filegroup], 
	r.boundary_id, r.value AS BoundaryValue, p.rows
	--,r.*
FROM 
	sys.tables AS t
INNER JOIN
	sys.indexes AS i ON t.object_id = i.object_id
INNER JOIN
	sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id 
INNER JOIN 
    sys.allocation_units a ON a.container_id = p.hobt_id 
INNER JOIN 
    sys.filegroups fg ON fg.data_space_id = a.data_space_id 
INNER JOIN
	sys.partition_schemes AS s ON i.data_space_id = s.data_space_id
INNER JOIN
	sys.partition_functions AS f ON s.function_id = f.function_id
LEFT OUTER JOIN 
	sys.partition_range_values AS r ON f.function_id = r.function_id 
									AND r.boundary_id = p.partition_number
WHERE 
	t.name = 'PartitionedTable'
AND 
	i.type <= 1
AND
    a.type = 1 --in row data only
ORDER BY p.partition_number DESC;

DataInPartitionedTable1

So both filegroups have 2000 rows in them. Now let’s perform a filegroup restore, bringing the PRIMARY & DATA filegroups online first.

Take a full and log backup of the database:-

USE [master];
GO

--FULL DATABASE BACKUP
BACKUP DATABASE [PartitioningDemo]
   TO DISK = 'C:\SQLServer\Backups\PartitioningDemoFullBackup.bak'
   WITH INIT
GO


--LOG BACKUP
BACKUP LOG [PartitioningDemo]
	TO DISK = N'C:\SQLServer\Backups\PartitioningDemoLogBackup.trn'
	WITH NO_TRUNCATE, INIT
GO

OK, now we’re simulating a problem, first take a tail log backup:-

BACKUP LOG [PartitioningDemo]
	TO DISK = N'C:\SQLServer\Backups\PartitioningDemoTailLogBackup.trn'
	WITH INIT, NORECOVERY
GO

And now we’re going to perform a filegroup restore of the PRIMARY and DATA filegroups:-

--PRIMARY filegroup
RESTORE DATABASE [PartitioningDemo] 
   FILEGROUP = 'PRIMARY'
   FROM DISK = 'C:\SQLServer\Backups\PartitioningDemoFullBackup.bak'
   WITH 
   REPLACE, PARTIAL, NORECOVERY;
GO


--DATA filegroup
RESTORE DATABASE [PartitioningDemo] 
   FILEGROUP = 'DATA'
   FROM DISK = 'C:\SQLServer\Backups\PartitioningDemoFullBackup.bak'
   WITH 
   REPLACE, PARTIAL, NORECOVERY;
GO


--Restore transaction log & tail log backups
RESTORE LOG [PartitioningDemo] FROM DISK = N'C:\SQLServer\Backups\PartitioningDemoLogBackup.trn' WITH NORECOVERY;
RESTORE LOG [PartitioningDemo] FROM DISK = N'C:\SQLServer\Backups\PartitioningDemoTailLogBackup.trn' WITH RECOVERY;
GO

Now we can query the table:-

SELECT COUNT(*) FROM [PartitioningDemo].dbo.[PartitionedTable]
WHERE CreatedDate > CONVERT(DATE,'2015-01-01')
GO


--Check access to archive data
SELECT COUNT(*) FROM [PartitioningDemo].dbo.[PartitionedTable]
WHERE CreatedDate < CONVERT(DATE,'2015-01-01')
GO

First query will run fine but the second will generate an error:-
QueryError1

So we still have to restore the ARCHIVE filegroup:-

--Restore ARCHIVE filegroup
RESTORE DATABASE [PartitioningDemo] 
   FILEGROUP = 'ARCHIVE'
   FROM DISK = 'C:\SQLServer\Backups\PartitioningDemoFullBackup.bak'
   WITH 
   NORECOVERY;
GO


--Restore transaction log & tail log backups
RESTORE LOG [PartitioningDemo] FROM DISK = N'C:\SQLServer\Backups\PartitioningDemoLogBackup.trn' WITH NORECOVERY;
RESTORE LOG [PartitioningDemo] FROM DISK = N'C:\SQLServer\Backups\PartitioningDemoTailLogBackup.trn' WITH NORECOVERY;
GO

Bring the database fully online:-

RESTORE DATABASE [PartitioningDemo] WITH RECOVERY;
GO

And re-run the queries against the table:-

SELECT COUNT(*) FROM [PartitioningDemo].dbo.[PartitionedTable]
WHERE CreatedDate > CONVERT(DATE,'2015-01-01')
GO


--Check access to archive data
SELECT COUNT(*) FROM [PartitioningDemo].dbo.[PartitionedTable]
WHERE CreatedDate < CONVERT(DATE,'2015-01-01')
GO

Now both queries will return results:-
QueryResults1

Neat huh? You can see that in a DR situation, if you have a correct partitioning and filegroup strategy in place, you can reduce the amount of time it will take to bring your current data online.

More about filegroup restores: –
https://msdn.microsoft.com/en-ie/library/aa337540.aspx

2

First foray into presenting

It’s been a while since I’ve posted as the run up to Xmas last year and this January have been particularly busy. February doesn’t seem to be going to be any quieter but at least I’m being kept on my toes.

Anyway, waaaay back in December the SQL Server Ireland User Group (website) ran a Xmas “extravaganza” in which they held a series of lightening talks. Presenting is something that I’ve wanted to get into for a while so with a bit of pushing from one of the guys on the panel, I signed myself up. The talk only needed to be 5 minutes long but this was ideal for me considering I’ve never presented before.

The talk I did was on how I implemented partitioning on a set of existing tables within my company’s OLTP databases. First time presenting? Stick with something you know. I also felt pretty safe from awkward questions (or people pointing out how I could have implemented the solution more effectively!) as I wanted the session to be very specific to the environment I was working with and the limitations/restrictions I had to deal with.

So how did I do? Well considering I was quite convinced that I was going to stammer, freeze up and then flee the room (honestly, I thought this WAS what was going to happen)…quite well really. I did have a minor technical glitch…I was duplicating my desktop onto the projected screen and it started reverting back to extend (as I have it in the office), so the screen on the projector went blank. Not a problem at first as this happened when I was setting up so I quickly went into settings and set it back to duplicate.

However, I did this too quickly and forgot to click the “Keep Changes” option so after 10 seconds it reverted and the screen went blank again. So I went back into the settings, fixed it, forgot to click “Keep Changes” and ended up in this loop until one of the other presenters helped me out (thankfully). In the end I managed to laugh it off, continue and ended up winning a bottle of champagne for “Best New Speaker” so I’m not going to obsess about that too much (I promise).

In the end I really enjoyed the whole experience…OK…looking back with rose tinted glasses I enjoyed it, at the time I was too nervous. I’m not going to turn this post into another “Best pieces of advice for first time presenters” (although I can’t promise I won’t do one of those posts in the future) but one thing I will say is know, really know your presentation. Go over it over and over, practice continuing when making mistakes and don’t stick to a rigid script. By this I mean, if you go off track slightly, you’ll be able to make it through without stumbling over or losing track.

So what now? Well I’ve been accepted to present another lightening talk at SQL Saturday Exeter. It’s only 10 minutes I know but it’s double what I’ve done previously and the SQL Saturday events are much bigger. I’ll post about how it went afterwards but for now I’m looking forward to it, until the panic sets in!

0

Pass 2015 – Overall Thoughts

I know I only posted one other blog about Pass but I’ve been way too busy over the last couple of weeks to write any more. So here’s my thoughts on what was my first (and hopefully not last) Pass Summit (would have posted this earlier but I’ve been enjoying a bit of jet lag).

Firstly, I had a whale of a time. I met a whole bunch of people, including guys from UK & Dublin chapters and the Midnight DBAs; went to some great sessions and picked up some free stuff from a load of vendors. Secondly, how cool were the evening events? The welcome reception was very slick and the party at EMP Museum (what a place) was brilliant. They could probably tone down the karaoke a tad mind, did it really need to start so early?

The only grumble I have is that I didn’t really enjoy the first timers “speed networking” session. For those who don’t know, you’re lined up in rows facing each other and are given three minutes each way to introduce yourself and talk about a certain subject (biggest challenge at work, programs that you use most frequently etc.). At the end of the six minutes you move down one seat to your left and the process kicks off again. For me, this went on a little bit too long…after about the third round I was starting to get a little weary. The room was very loud so you practically had to shout for three minutes. I left with a bit of parched throat but thankfully a beer was waiting for me at the welcome reception.

The keynotes. I enjoyed them although I have to admit, when I saw the length of them I was a bit put off. However both were delivered excellently. The first keynote by Joseph Sirosh & Shawn Bice going through the new features of SQL Server 2016 was very interesting. I’d seen most of what they talked about before but I hadn’t heard about native support for R within SQL Server – that’s going to be huge!

The second keynote by David DeWitt & Rimma Nehme talking about the progression of IoT was less my cup of tea (I’m more of a demo guy) but they still talked about a lot of interesting concepts. One of which was fog computing (google it) which led to the expression, “What the fog?”. Ha ha.

Now, the sessions. I attended more of the “out there” sessions if I’m honest. I’ve been to a few conferences at this point in my career so didn’t what to go and see the same old subjects (e.g. – basic stuff on the SQL database engine; indexes, partitioning, query tuning etc.), I wanted to see something new. Highlights included a session on Hadoop, a really cool workshop on u-sql and a session on technical writing.

The best session that I attended though was the Speaker Idol final at 4pm on the last day. 4 great speakers all competing for a grand prize of a full 75 minute session at Pass Summit 2016 hosted by the indefatigable Denny Cherry (b|t). All week everyone I talked to mentioned the sense of community at Pass and in my opinion, the Speaker Idol final was where it was most prevalent. Each of the speakers performed brilliantly and congratulations to David Maxwell who won (see his blog post on the Summit here.

Now to see how to get to the Summit 2016…

0

Pass 2015 – First Impressions

It’s Pass 2015 this week and I’m attending (seriously I’m unbelievably chuffed)! As they’ll be a shedload of blogs about it this week I thought I’d shamelessly jump on the bandwagon.

I flew out to Seattle from Dublin on Friday (god that was a long day) so I’ve had a couple of days knocking around town doing the usual touristy things. Went up the Space Needle and did the underground tour on Saturday, both of which were pretty cool. Fantastic views from the Needle and the underground tour was a really interesting delve into the history of the area. Then yesterday I went to Pike Street Market and then went wandering around. Ended up by the Seattle stadium, saw that there was a Sounders game on so ended up going to that. Cracking day out.

But I’m not here to do sight seeing really, I’m here for SQL Pass 2015 and it kicked off this morning with the first set of pre-con sessions. Have to say, first impression is how big is this place?? The Washington State Convention Centre is absolutely huge! It completely blows any other conference I’ve attended out of the water. I’ve signed up for the first timers reception and the welcome sessions but today was Allan Hirt’s (b|t) session on high availability.

I really enjoyed the session, Allan is a great presenter with a seemingly limitless knowledge of windows clustering. He talked us through many different configuration options and what we should be considering when setting up a windows failover cluster. My favourite parts of the session though was when Allan talked about what’s up and coming in SQL Server and Windows 2016, lots of new functionality which I’ll be playing with in my lab at home. He also showed us a cluster he’d setup with one node in Amazon AWS and one node in Azure! I dread to think how long that took to configure and setup…

Anyway not going to give too much information about the pre-con (I don’t think that would be appreciated). What I will do is post some notes I took. They’re just a bunch of links for subjects that I want to do further research in, hopefully a couple of blogs posts in there somewhere:-

SQL and Remote Direct Memory Access (RDMA)
https://channel9.msdn.com/Blogs/Regular-IT-Guy/Behind-the-Scenes-with-Storage-Replica-and-RDMA
Converged and Hyper-converged networking
http://www.techrepublic.com/article/hyper-converged-systems-what-you-need-to-know-about-this-hot-virtualization-topic/
Scale out file server (SOFS) for application data
https://technet.microsoft.com/en-us/library/hh831349.aspx
http://blogs.technet.com/b/josebda/
Microsoft Storage Spaces
http://social.technet.microsoft.com/wiki/contents/articles/15198.storage-spaces-overview.aspx
Install SQL Server with SMB fileshares
https://msdn.microsoft.com/en-us/library/hh759341.aspx
Windows 2016 Site Aware Clustering
http://blogs.msdn.com/b/clustering/archive/2015/08/19/10636304.aspx
Windows 2016 Cloud Witness
http://blogs.msdn.com/b/clustering/archive/2014/11/14/10572766.aspx

I’m really going to enjoy the rest of this week.

0

Where is my data stored?

Tables within SQL Server can contain a maximum of 8060 bytes per row. However even though columns are limited to a maximum of 8000 bytes in size, the combined size of all the columns in a table can exceed the 8060 limit. But what happens when this limit is exceeded?

Let’s create a database and have a look:-

USE [master];
GO

CREATE DATABASE [StorageTest];
GO

Now create a table to test with:-

USE [StorageTest];
GO

CREATE TABLE dbo.[Test1]
( PKID		 INT,
  FirstName	 CHAR(24),
  MiddleName CHAR(24),
  LastName	 CHAR(24));
GO

The length of each row within this table will be:-

Column Data Type Size
PKID INT 4 bytes
FirstName CHAR(24) 24 bytes
MiddleName CHAR(24) 24 bytes
LastName CHAR(24) 24 bytes
Total 76 bytes

More info on data types and their sizes

Let’s insert one row and have a look at the size:-

INSERT INTO dbo.[Test1]
(PKID, FirstName, MiddleName, LastName)
VALUES
(1, 'Andrew', 'Philip', 'Pruski');
GO
EXEC sp_spaceused 'Test1';
GO

InitialTableSize1

We can see that 1 row is in the table which is 8KB in size. This is because SQL stores data on 8KB pages so 8KB is the minimum size for a table which has data in it! We can see this by running the undocumented DBCC command, DBCC IND:-

DBCC IND('StorageTest','Test1',1);
GO

TableAllocations1

Not going to go into what each of the columns mean but you can see that this table has two rows. Both classified as “in-row data”, the first row is page type 10 which is an IAM page and the second row is page type 1 (a data page), which contains the row we inserted.

More info on DBCC IND and page types

“In-row data” is exactly what it sounds like, data in the rows on the page.

What happens when we add a column that will take the size of the row over the 8060 limit? I commonly see large columns added to tables with the reason of “we don’t know how big the data will be so we’re making this column as big as SQL will allow” *sigh*. For example:-

ALTER TABLE dbo.[Test1]
ADD [Address] VARCHAR(8000);
GO

Let’s populate that column with some data:-

UPDATE dbo.[Test1]
SET [Address] = 'zz'
WHERE PKID = 1;
GO

What’s the size of the table now?

EXEC sp_spaceused 'Test1';
GO

InitialTableSize1

Still the same, good, but if this column is completely filled to its maximum so the size of a row will be over the limit, what will happen to the size of the table and where will the data be stored?

Let’s fill the column up for the row that is already in the table:-

UPDATE dbo.[Test1]
SET [Address] = REPLICATE('zz',4000)
WHERE PKID = 1;
GO

Now we have a row that has exceeded the maximum size allowed, what has that done to the size of the table?

EXEC sp_spaceused 'Test1';
GO

TableSize2

Huh? The size of the table has tripled from 8KB to 24KB! What’s going? Let’s run DBCC IND again and have a look:-

DBCC IND('StorageTest','Test1',1);
GO

TableAllocations2

There are now two new pages allocated to the table, classified as “Row-overflow data”. Because we’ve gone over the row limit, SQL has allocated a new IAM page and a new page to store the data that has gone over the limit. This means that just by adding that column and populating it to its maximum limit the table has noticeably increased in size. If there were thousands/millions of rows in this table there would be a big impact on the performance of queries SELECTing from it due to the extra amount of page reads that would have to be performed.

I know this is a very contrived example but it’s good to see how SQL reacts to columns being added to tables that can make a row exceed its maximum size.

I have to admit though, I rarely see VARCHAR(4000) added to a table. What’s far more common is VARCHAR(MAX). How does SQL react to having a column with this datatype added to a table? Well, let’s find out.

First reset our table (dropping the column and rebuilding to bring it back down to its original size):-

ALTER TABLE [Test1] DROP COLUMN [Address];

ALTER TABLE [Test1] REBUILD;
GO

Add the column:-

ALTER TABLE dbo.[Test1]
ADD [Address] VARCHAR(MAX);
GO

Populate the row with data:-

UPDATE dbo.[Test1]
SET [Address] = REPLICATE('zz',4000)
WHERE PKID = 1;
GO

Check the size of the table:-

EXEC sp_spaceused 'Test1';
GO

TableSize3

Again the table has tripled in size (ignore the reserved size, it’s larger as we’ve been messing about). If we look at the page allocations we can see why:-

DBCC IND('StorageTest','Test1',1);
GO

TableAllocations3

SQL has allocated two new pages to the table, classified as”LOB data”. This time SQL has pushed out the data in the NVARCHAR(MAX) column to a special page type designed for large data types. Other data types that this applies to are nvarchar(max) and varbinary(max) (also text, ntext and image in previous versions).

So in summary, most data within SQL Server is stored on pages classified as “In-row data”. If the combined size of columns in a row exceeds 8060 bytes, then SQL will push data onto pages classified as “Row-overflow data”. Finally if certain LOB data types are used, then SQL stores that data on pages classifed as “LOB data”.