SQL Server

Dynamic daily table partitioning

  • August 15, 2017

I have a SQL Server database containing two tables: Acks and Logs.

The two tables are logically related, but not in the relational-database sense. Basically, every incoming message is saved to the Logs table, and if our server acknowledges it, that ack is stored in the Acks table.

We store roughly 5 million Acks and 3 million Logs per day. I am trying to partition both tables on daily boundaries so that we can easily drop old partitions from the tables and also improve query performance.

I have never done table partitioning before, so I have been reading some online tutorials, but I am stuck on one thing. Every tutorial I have followed seems to add the filegroups and the boundaries manually.

I want SQL Server to do this every day somehow, and that is where my problem lies. I need it to create the new filegroup for the next day at, say, 22:00 each evening. Then at 24:00 inserts should start filling the new day's partition.

Can anyone point me in the right direction on how to achieve this? A comprehensive tutorial or some good old-fashioned advice would be fine too.

My second question: can I somehow apply the same partition function to two different tables?

They both have a datetime2 column that I want to partition on, and the same rules would apply to both.

And how does that fit with my filegroups? Do I need one filegroup per day? Does each table get its own file within that filegroup, or are both tables stored in the same file in the filegroup?

Do I have to create an .mdf and an .ldf for every filegroup, or is there still just one log file for the whole database?

15,000 partitions are supported from SQL Server 2008 SP2 and SQL Server 2008 R2 SP1 onwards, so honestly you do not really need to do much of this dynamically. Rather than having a complicated daily process (dynamically adding filegroups and boundaries) with the potential to fail, simply pre-create the partitions from now until 2020; you will be comfortably within the limit and nicely future-proofed.

You can assign all partitions to one filegroup (ok, not necessarily a great pattern) or round-robin across a limited number of them. In other words, there is no technical requirement for one filegroup per day, e.g.

-- Assign all to one filegroup; ok not necessarily great
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1] )

-- Or round-robin
CREATE PARTITION SCHEME ps_test AS PARTITION pf_test TO ( [FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG1] ... etc )

Obviously use Excel or some tool to generate the scripts for you - there is no need to type them out :)
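If you would rather stay in T-SQL, the boundary list is easy to generate there too. The following is a minimal sketch (the start date, end date and the name pf_daily are purely illustrative); the full demo script later in this answer builds its boundaries the same way:

DECLARE @boundaries NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX)

;WITH dates AS (
   SELECT CAST( '2017-08-16' AS DATE ) d        -- first boundary (example value)
   UNION ALL
   SELECT DATEADD( day, 1, d ) FROM dates
   WHERE d < '2020-12-31'                       -- pre-create through 2020
)
SELECT @boundaries += ', ' + QUOTENAME( CONVERT( CHAR(8), d, 112 ), '''' )
FROM dates
OPTION ( MAXRECURSION 2000 )

SET @sql = N'CREATE PARTITION FUNCTION pf_daily (DATE) AS RANGE RIGHT FOR VALUES ( '
         + STUFF( @boundaries, 1, 2, '' ) + N' )'

SELECT @sql AS generatedScript   -- inspect first
-- EXEC ( @sql )                 -- then run it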

Use the DMV sys.partition_range_values and the metadata function $PARTITION to work out which data lives where. Create a daily job that switches out and truncates the oldest partition. I think that is lower risk than daily adds.
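As a quick illustration (a sketch that assumes the pf_test function and dbo.yourLog table created in the demo script further down):

-- All boundary values for the function, oldest first
SELECT prv.boundary_id, prv.value
FROM sys.partition_range_values prv
JOIN sys.partition_functions pf ON pf.function_id = prv.function_id
WHERE pf.name = 'pf_test'
ORDER BY prv.boundary_id

-- Which partition does a given date land in?
SELECT $PARTITION.pf_test( '20170815' ) AS partitionNumber

-- Row counts per partition for a table partitioned on that function
SELECT $PARTITION.pf_test( partitionDate ) AS partitionNumber, COUNT(*) AS recordCount
FROM dbo.yourLog
GROUP BY $PARTITION.pf_test( partitionDate )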

**Warning!!** Read the whitepaper carefully, because this has to be enabled and there are some issues with the approach (for example, creating and rebuilding non-aligned indexes on a table with more than 1,000 partitions is not supported). If you are feeling risk-averse, the standard limit of 1,000 partitions still lets you pre-allocate a little under 3 years.
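On SQL Server 2008 SP2 / 2008 R2 SP1 the increased limit is a per-database opt-in; from SQL Server 2012 onwards 15,000 partitions are available without any switch. If I remember the whitepaper correctly, the opt-in looks like the following - treat the procedure call as an assumption and verify it against the whitepaper before relying on it:

-- Assumed syntax from the 15,000-partition whitepaper; verify before use
EXEC sp_db_increased_partitions @dbname = N'yourDatabase', @increased_partitions = 'ON'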

Since you really want to partition by DATE rather than DATETIME2, consider a computed column. I would want to performance-test that first, though.
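A minimal sketch of that idea (the same pattern the demo script below uses; ps_test is assumed to be a daily partition scheme on a DATE key):

CREATE TABLE dbo.exampleLog (
   logId       INT IDENTITY,
   someDate    DATETIME2 NOT NULL,

   -- Persisted so it can be used in the partitioning key and the primary key
   partitionDate AS CAST( someDate AS DATE ) PERSISTED,

   CONSTRAINT pk_exampleLog PRIMARY KEY ( logId, partitionDate )

   ) ON ps_test( partitionDate )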

There is also a tool on CodePlex (SQL Server Partition Management) which I have not used, but it might be worth a look.

To answer your other questions: the database should have only one log file. Add additional data files as .ndf, not .mdf. To use the same partition scheme (not just the same function) for both tables, simply create both of them on that scheme and they will partition their data into the files under the filegroups, e.g.

CREATE TABLE dbo.yourTable (    ...

   CONSTRAINT PK_yourTable PRIMARY KEY ( rowId, someDate )

   ) ON ps_test(someDate)

OK, this is going to be a long answer, but I have put together a demo showing how something like this works. It also reminded me that the nice thing about partition switching is that, being a metadata-only operation, it is instant. To be clear, this is a demo of the principle with some example "how-to" code; it is not production quality. Read through it carefully and make sure you understand it before running it in a dev or test environment. You will need about 200MB of space.

------------------------------------------------------------------------------------------------
-- Setup START
-- Demo runs on my laptop in < 1 minute (ok on SSD)
-- You'll need 200MB space
------------------------------------------------------------------------------------------------

USE master
GO

IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
   ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
   DROP DATABASE tooManyPartitionsTest 
GO
CREATE DATABASE tooManyPartitionsTest
GO

ALTER DATABASE tooManyPartitionsTest SET RECOVERY SIMPLE
GO

-- Add 7 filegroups with 4 files each (28 files in total)
DECLARE @fg INT = 0, @f INT = 0, @sql NVARCHAR(MAX)

WHILE @fg < 7
BEGIN

   SET @fg += 1
   SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILEGROUP tooManyPartitionsTestFg' + CAST( @fg AS VARCHAR(5) )

   -- Add the filegroup
   PRINT @sql
   EXEC(@sql)


   -- Initialise
   SET @f = 0

   WHILE @f < 4
   BEGIN

       SET @f += 1
       --!!WARNING!! DON'T USE THESE SETTINGS IN PRODUCTION.  3MB starting size and 1MB filegrowth are just for demo - would be extremely painful for live data
       SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILE ( NAME = N''tooManyPartitionsTestFile@f_@fg'', FILENAME = N''s:\temp\tooManyPartitionsTestFile@f_@fg.ndf'', SIZE = 3MB, FILEGROWTH = 1MB ) TO FILEGROUP [tooManyPartitionsTestFg@fg]'
        SET @sql = REPLACE ( @sql, '@fg', CAST( @fg AS VARCHAR(5) ) )
        SET @sql = REPLACE ( @sql, '@f', CAST( @f AS VARCHAR(5) ) )

       -- Add the file
       PRINT @sql
       EXEC(@sql)

   END

END
GO


USE tooManyPartitionsTest
GO

SELECT * FROM sys.filegroups
SELECT * FROM sys.database_files 
GO

-- Generate partition function with daily boundaries from 30 Apr 2014 to 31 Dec 2016 (data earlier than the first boundary lands in partition 1).
DECLARE @bigString NVARCHAR(MAX) = ''

;WITH cte AS (
SELECT CAST( '30 Apr 2014' AS DATE ) testDate
UNION ALL
SELECT DATEADD( day, 1, testDate )
FROM cte
WHERE testDate < '31 Dec 2016'
)
SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' )
FROM cte
OPTION ( MAXRECURSION 1100 )

SELECT @bigString = 'CREATE PARTITION FUNCTION pf_test (DATE) AS RANGE RIGHT FOR VALUES ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
SELECT @bigString bs

-- Create the partition function
PRINT @bigString
EXEC ( @bigString )
GO

/*
-- Look at the boundaries
SELECT *
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
GO
*/

DECLARE @bigString NVARCHAR(MAX) = ''

;WITH cte AS (
SELECT ROW_NUMBER() OVER( ORDER BY boundary_id ) rn
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
UNION ALL 
SELECT 1    -- additional row required for fg
)
SELECT @bigString += ',' + '[tooManyPartitionsTestFg' + CAST( ( rn % 7 ) + 1 AS VARCHAR(5) ) + ']'
FROM cte
OPTION ( MAXRECURSION 1100 )

SELECT @bigString = 'CREATE PARTITION SCHEME ps_test AS PARTITION pf_test TO ( ' + STUFF( @bigString, 1, 1, '' ) + ' )'
PRINT @bigString
EXEC ( @bigString )
GO




IF OBJECT_ID('dbo.yourLog') IS NULL
CREATE TABLE dbo.yourLog ( 
   logId       INT IDENTITY,
   someDate    DATETIME2 NOT NULL,
   someData    UNIQUEIDENTIFIER DEFAULT NEWID(),
   dateAdded   DATETIME DEFAULT GETDATE(), 
   addedBy     VARCHAR(30) DEFAULT SUSER_NAME(), 

   -- Computed column for partitioning?
   partitionDate AS CAST( someDate AS DATE ) PERSISTED,

   CONSTRAINT pk_yourLog PRIMARY KEY ( logId, partitionDate )  -- << !!TODO try other way round

   ) ON [ps_test]( partitionDate )
GO


IF OBJECT_ID('dbo.yourAcks') IS NULL
CREATE TABLE dbo.yourAcks ( 
   ackId           INT IDENTITY(100000,1),
   logId           INT NOT NULL,
   partitionDate   DATE NOT NULL

   CONSTRAINT pk_yourAcks PRIMARY KEY ( ackId, logId, partitionDate )  

   ) ON [ps_test]( partitionDate )
GO



IF OBJECT_ID('dbo.yourLogSwitch') IS NULL
CREATE TABLE dbo.yourLogSwitch ( 
   logId       INT IDENTITY,
   someDate    DATETIME2 NOT NULL,
   someData    UNIQUEIDENTIFIER DEFAULT NEWID(),
   dateAdded   DATETIME DEFAULT GETDATE(), 
   addedBy     VARCHAR(30) DEFAULT SUSER_NAME(), 

   -- Computed column for partitioning?
   partitionDate AS CAST( someDate AS DATE ) PERSISTED,

   CONSTRAINT pk_yourLogSwitch PRIMARY KEY ( logId, partitionDate )

   ) ON [ps_test]( partitionDate )
GO
-- Setup END
------------------------------------------------------------------------------------------------



------------------------------------------------------------------------------------------------
-- Data START
------------------------------------------------------------------------------------------------

-- OK load up data for Jan 2014 to today.
DECLARE @startDate DATETIME = '1 Jan 2014', @rand INT 

WHILE @startDate < GETDATE()
BEGIN

   -- Add between 1 and 10,000 rows to dbo.yourLog for today
   SET @rand = RAND() * 10000

   ;WITH cte AS (
   SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
   FROM master.sys.columns c1
       CROSS JOIN master.sys.columns c2
       CROSS JOIN master.sys.columns c3
   )
   INSERT INTO dbo.yourLog (someDate)
   SELECT TOP(@rand) DATEADD( second, rn % 30000, @startDate )
   FROM cte

   -- Add most of the Acks
   INSERT INTO dbo.yourAcks ( logId, partitionDate )
   SELECT TOP 70 PERCENT logId, partitionDate
   FROM dbo.yourLog
   WHERE partitionDate = @startDate

   SET @startDate = DATEADD( day, 1, @startDate )

   CHECKPOINT

END
GO

-- Have a look at the data we've loaded
SELECT 'before yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog 
SELECT 'before yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks

-- You'll see how pre-May data is initially clumped together
SELECT 'before $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK) 
GROUP BY $PARTITION.pf_test( partitionDate ) 
ORDER BY xMinDate
GO

-- Data END
------------------------------------------------------------------------------------------------


------------------------------------------------------------------------------------------------
-- Maintenance START
------------------------------------------------------------------------------------------------

-- Oh man, we're behind with our switching and truncation.
-- Create a job that sweeps up.  Do we get blocking?

-- ALTER TABLE dbo.yourLog SWITCH PARTITION 1 TO dbo.yourLogSwitch PARTITION 1
-- TRUNCATE TABLE dbo.yourLogSwitch

-- Let's pretend we only want to maintain up to 30 days ago
DECLARE @testDate DATE
SET @testDate = DATEADD( day, -30, GETDATE() )

-- Create local fast_forward ( forward-only, read-only ) cursor 
DECLARE partitions_cursor CURSOR FAST_FORWARD LOCAL FOR 
SELECT boundary_id, CAST( value AS DATE )
FROM sys.partition_range_values
WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' )
 AND value < @testDate

-- Cursor variables
DECLARE @boundary_id INT, @value DATE, @sql NVARCHAR(MAX)

OPEN partitions_cursor

FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
WHILE @@fetch_status = 0
BEGIN

   -- Switch out and truncate old partition
   SET @sql = 'ALTER TABLE dbo.yourLog SWITCH PARTITION ' + CAST( @boundary_id AS VARCHAR(5) ) + ' TO dbo.yourLogSwitch PARTITION ' + CAST( @boundary_id AS VARCHAR(5) )

   PRINT @sql
   EXEC(@sql)

   -- You could move the data elsewhere from here or just empty it out
   TRUNCATE TABLE dbo.yourLogSwitch

   --!!TODO yourAcks table

   FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value
END

CLOSE partitions_cursor
DEALLOCATE partitions_cursor
GO

-- Maintenance END
------------------------------------------------------------------------------------------------



-- Have a look at the data we've maintained
SELECT 'after yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog 
SELECT 'after yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks

-- You'll see how pre-May data is initially clumped together
SELECT 'after $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records
FROM dbo.yourLog WITH(NOLOCK) 
GROUP BY $PARTITION.pf_test( partitionDate ) 
ORDER BY xMinDate



-- Remember, date must always be part of query now to get partition elimination
SELECT *
FROM dbo.yourLog
WHERE partitionDate = '1 August 2014'
GO


-- Cleanup
USE master
GO

IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
   ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' )
   DROP DATABASE tooManyPartitionsTest 
GO

To add new partitions, use SPLIT RANGE. Suppose you have the following partitions:

CREATE PARTITION FUNCTION pfTest(int) AS RANGE LEFT FOR VALUES (10, 20, 30);
CREATE PARTITION SCHEME psTest AS PARTITION pfTest TO ([GRP1], [GRP2], [GRP3], [GRP4]);

..you can add a new partition by "splitting" the last range, turning (30 to infinity) into (31 to 40) and (41 to infinity). Here is the syntax (note that the scheme's NEXT USED filegroup has to be designated before the split):

ALTER PARTITION SCHEME psTest NEXT USED [GRP4];
ALTER PARTITION FUNCTION pfTest() SPLIT RANGE (40);

Apart from generating dynamic SQL and running it on a schedule (for example in a SQL Server Agent job), I am not aware of any other way to automate this.
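If you do go down that route, here is a minimal sketch of what a nightly Agent job step could build and run (pf_daily, ps_daily and the DailyFg filegroup are illustrative names, not taken from the example above):

DECLARE @nextBoundary DATE = DATEADD( day, 1, CAST( GETDATE() AS DATE ) ),
        @sql NVARCHAR(MAX)

-- Designate the filegroup for the new partition, then split the open-ended range
SET @sql = N'ALTER PARTITION SCHEME ps_daily NEXT USED [DailyFg]; '
         + N'ALTER PARTITION FUNCTION pf_daily() SPLIT RANGE ( '''
         + CONVERT( CHAR(8), @nextBoundary, 112 ) + N''' );'

PRINT @sql   -- inspect before scheduling
EXEC ( @sql )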

A partition function can be applied to any number of tables. Each table is placed on a partition scheme, which in turn is tied to a partition function.

--- Create the partition function:
CREATE PARTITION FUNCTION fn_part_left(int)
AS RANGE LEFT FOR VALUES (100, 110, 120, 130);

--- Create the partition scheme:
CREATE PARTITION SCHEME ps_part_left AS
PARTITION fn_part_left TO
   ([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]);

--- Create the table
CREATE TABLE myTable (
   someColumn int NOT NULL,
   ....
) ON [ps_part_left](someColumn);

I used int as the data type in the example, but datetime2 works just as well.
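For example, an equivalent function on a datetime2 key (the boundary values here are just examples):

CREATE PARTITION FUNCTION fn_part_dt2 (datetime2(0))
AS RANGE RIGHT FOR VALUES ('2017-08-14', '2017-08-15', '2017-08-16');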

You can place multiple partitions in the same filegroup if you want. In that case you have to do some planning around how the load is spread across the different partitions, so that you do not end up with all of the I/O load on a single filegroup.
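You can check how the partitions of a scheme map onto filegroups with the catalog views (a sketch; substitute your own scheme name for ps_part_left):

SELECT ps.name             AS partitionScheme,
       dds.destination_id  AS partitionNumber,
       fg.name             AS filegroupName
FROM sys.partition_schemes ps
JOIN sys.destination_data_spaces dds ON dds.partition_scheme_id = ps.data_space_id
JOIN sys.filegroups fg               ON fg.data_space_id = dds.data_space_id
WHERE ps.name = 'ps_part_left'
ORDER BY dds.destination_id;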

A filegroup can contain one or more data files (the primary .mdf and/or additional .ndf files).

A database can have one or more .ldf log files.

Quoted from: https://dba.stackexchange.com/questions/75176