My Tech Learning: February 2018

Friday, February 23, 2018

My first experience with SQL Server Database Partitioning

At my work I have inherited a database where the partitioning was not done correctly. This database holds millions and millions of records for each year. Hence the need for partitioning.

So I started working to correct/improve the same in a dev environment. The current table (Table1) that is partitioned is based on filegroup.

After a lot of experiments, I sought feedback from Partitioning expert Martin Cairney

He suggested me to correct this situation in the below steps.

Step 1. Create a new partition function (yearly_pf) and partition scheme (yearly_ps) that reflects the correct partitions.

The code I used for this is as below.

CREATE PARTITION FUNCTION [yearly_pf](datetime) AS RANGE RIGHT FOR VALUES 
(N'2007-12-31T23:59:59.997', 
N'2008-12-31T23:59:59.997',
N'2009-12-31T23:59:59.997',
N'2010-12-31T23:59:59.997',
N'2011-12-31T23:59:59.997',
N'2012-12-31T23:59:59.997',
N'2013-12-31T23:59:59.997',
N'2014-12-31T23:59:59.997',
N'2015-12-31T23:59:59.997',
N'2016-12-31T23:59:59.997',
N'2017-12-31T23:59:59.997',
N'2018-12-31T23:59:59.997')
GO




/****** Object:  Create Partition scheme      ******/
CREATE PARTITION SCHEME [yearly_ps] AS PARTITION [yearly_pf] 
TO ([FG_2007],
[FG_2008],
[FG_2009],
[FG_2010],
[FG_2011],
[FG_2012],
[FG_2013],
[FG_2014],
[FG_2015],
[FG_2016],
[FG_2017],
[FG_2018],
 [PRIMARY])
GO

Step 2. Create another table (Table 3) with the same schema as the existing table Table1. But base this on the partition scheme rather than on filegroup.

For this I scripted the create table script from Table 1 as this will all the constraints of the existing table. But I changed the On filegroup statement to On yearly_ps

Step 3. Create a staging table (Table 2) to insert data from Table 1 on a year by year basis.

For this I used the create table script from Table 1

Step 4. Insert the data from the staging table with a where clause that includes yearly data into Table 3.

Step 5. After all the data is inserted into Table 3, rename Table 1 an rename Table 3 to Table 1. When you are satisfied with everything, then drop Table 1.

I did this process from 2007 to 2012.

After that I started wondering do we really need the staging table.

Since this is a dev environment, I thought I will try without the staging table.

So I repeated the steps 1 to 2 above after dropping Table 3. Then I started to Insert data straight into Table 3 on a yearly basis. I checked every time whether the yearly data is going to the correct partition or not.

The script I used for this is as follows.

SELECT  
    OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName
  , OBJECT_NAME(p.object_id) AS ObjectName
  , i.name AS IndexName
  , p.index_id AS IndexID
  , ds.name AS PartitionScheme
  , p.partition_number AS PartitionNumber
  , COALESCE(fg.name, fg2.name) AS FileGroupName
  , prv_left.value AS LowerBoundaryValue
  , prv_right.value AS UpperBoundaryValue
  , CASE ISNULL(CAST(pf.boundary_value_on_right as SMALLINT),-1)
      WHEN 1 THEN 'RIGHT'
      WHEN -1 THEN 'N/A'
      ELSE 'LEFT'
    END AS PartitionFunctionRange
  , p.rows AS Rows
FROM    sys.partitions AS p
    INNER JOIN sys.indexes AS i 
        ON i.object_id = p.object_id
           AND i.index_id = p.index_id
    INNER JOIN sys.data_spaces AS ds 
        ON ds.data_space_id = i.data_space_id
    LEFT JOIN sys.partition_schemes AS ps 
        ON ps.data_space_id = ds.data_space_id
    LEFT JOIN sys.partition_functions AS pf 
        ON pf.function_id = ps.function_id
    LEFT JOIN sys.destination_data_spaces AS dds2 
        ON dds2.partition_scheme_id = ps.data_space_id
           AND dds2.destination_id = p.partition_number
    LEFT JOIN sys.data_spaces dsp 
     ON ds.data_space_id = dsp.data_space_id
    LEFT JOIN sys.filegroups AS fg 
     ON fg.data_space_id = dds2.data_space_id
    LEFT JOIN sys.filegroups AS fg2 
        ON fg2.data_space_id = dsp.data_space_id
    LEFT JOIN sys.partition_range_values AS prv_left 
        ON ps.function_id = prv_left.function_id
           AND prv_left.boundary_id = p.partition_number - 1
    LEFT JOIN sys.partition_range_values AS prv_right 
        ON ps.function_id = prv_right.function_id
AND prv_right.boundary_id = p.partition_number
where OBJECT_NAME(p.object_id)  = 'Table3'
)
ORDER BY ObjectName, IndexId, PartitionNUmber ;

The process was smooth.

I would like to implement this in Production. I would like some feedback of the viewers of this blog as to whether I am doing any blunders here.

My Tech Learning

Friday, February 23, 2018

My first experience with SQL Server Database Partitioning

Saturday, February 17, 2018

My Presentation at SQLSaturday Wellington

How to download files from Microsoft Fabric Lakehouse ?