Friday, February 23, 2018

My first experience with SQL Server Database Partitioning

At my work I have inherited a database where the partitioning was not done correctly.  This database holds millions and millions of records for each year.  Hence the need for partitioning.

So I started working to correct/improve the same in a dev environment.  The current table (Table1) that is partitioned is based on filegroup.
After a lot of experiments,  I sought feedback from Partitioning expert Martin Cairney

He suggested me to correct this situation in the below steps.

Step 1. Create a new partition function (yearly_pf) and partition scheme (yearly_ps) that reflects the correct partitions.
The code I used for this is as below.


CREATE PARTITION FUNCTION [yearly_pf](datetime) AS RANGE RIGHT FOR VALUES 
(N'2007-12-31T23:59:59.997', 
N'2008-12-31T23:59:59.997',
N'2009-12-31T23:59:59.997',
N'2010-12-31T23:59:59.997',
N'2011-12-31T23:59:59.997',
N'2012-12-31T23:59:59.997',
N'2013-12-31T23:59:59.997',
N'2014-12-31T23:59:59.997',
N'2015-12-31T23:59:59.997',
N'2016-12-31T23:59:59.997',
N'2017-12-31T23:59:59.997',
N'2018-12-31T23:59:59.997')
GO




/****** Object:  Create Partition scheme      ******/
CREATE PARTITION SCHEME [yearly_ps] AS PARTITION [yearly_pf] 
TO ([FG_2007],
[FG_2008],
[FG_2009],
[FG_2010],
[FG_2011],
[FG_2012],
[FG_2013],
[FG_2014],
[FG_2015],
[FG_2016],
[FG_2017],
[FG_2018],
 [PRIMARY])
GO


Step 2. Create another table (Table 3) with the same schema as the existing table Table1.  But base this on the partition scheme rather than on filegroup.
For this I scripted the create table script from Table 1 as this will all the constraints of the existing table.  But I changed the On filegroup statement to On yearly_ps

Step 3.  Create a staging table (Table 2) to insert data from Table 1 on a year by year basis.
For this I used the create table script from Table 1

Step 4.  Insert the data from the staging table with a where clause that includes yearly data into Table 3.

Step 5.  After all the data is inserted into Table 3, rename Table 1 an rename Table 3 to Table 1.  When you are satisfied with everything, then drop Table 1.

I did this process from 2007 to 2012.
After that I started wondering do we really need the staging table. 
Since this is a dev environment, I thought I will try without the staging table.

So I repeated the steps 1 to 2 above after dropping Table 3.  Then I started to Insert data straight into Table 3 on a yearly basis.  I checked every time whether the yearly data is going to the correct partition or not.
The script I used for this is as follows.


SELECT  
    OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName
  , OBJECT_NAME(p.object_id) AS ObjectName
  , i.name AS IndexName
  , p.index_id AS IndexID
  , ds.name AS PartitionScheme
  , p.partition_number AS PartitionNumber
  , COALESCE(fg.name, fg2.name) AS FileGroupName
  , prv_left.value AS LowerBoundaryValue
  , prv_right.value AS UpperBoundaryValue
  , CASE ISNULL(CAST(pf.boundary_value_on_right as SMALLINT),-1)
      WHEN 1 THEN 'RIGHT'
      WHEN -1 THEN 'N/A'
      ELSE 'LEFT'
    END AS PartitionFunctionRange
  , p.rows AS Rows
FROM    sys.partitions AS p
    INNER JOIN sys.indexes AS i 
        ON i.object_id = p.object_id
           AND i.index_id = p.index_id
    INNER JOIN sys.data_spaces AS ds 
        ON ds.data_space_id = i.data_space_id
    LEFT JOIN sys.partition_schemes AS ps 
        ON ps.data_space_id = ds.data_space_id
    LEFT JOIN sys.partition_functions AS pf 
        ON pf.function_id = ps.function_id
    LEFT JOIN sys.destination_data_spaces AS dds2 
        ON dds2.partition_scheme_id = ps.data_space_id
           AND dds2.destination_id = p.partition_number
    LEFT JOIN sys.data_spaces dsp 
     ON ds.data_space_id = dsp.data_space_id
    LEFT JOIN sys.filegroups AS fg 
     ON fg.data_space_id = dds2.data_space_id
    LEFT JOIN sys.filegroups AS fg2 
        ON fg2.data_space_id = dsp.data_space_id
    LEFT JOIN sys.partition_range_values AS prv_left 
        ON ps.function_id = prv_left.function_id
           AND prv_left.boundary_id = p.partition_number - 1
    LEFT JOIN sys.partition_range_values AS prv_right 
        ON ps.function_id = prv_right.function_id
AND prv_right.boundary_id = p.partition_number
where OBJECT_NAME(p.object_id)  = 'Table3'
)
ORDER BY ObjectName, IndexId, PartitionNUmber ;

The process was smooth.

I would like to implement this in Production.  I would like some feedback of the viewers of this blog as to whether I am doing any blunders here.




No comments:

Power BI Did You Know -- Series 1 (Contact Us button)

Did you know that you can add a contact us button in your Power BI reports ? Here is how you add a contact us button in your power bi rep...