每日動態表分區
我有一個 SQL Server 數據庫,其中包含兩個表 -
Acks
和Logs
.這兩個表在邏輯上是相關的,但不是關係數據庫的方式。基本上,傳入的每條消息都會保存在
Log
表中,如果我們的伺服器確認它,那麼該 ack 就會儲存在Ack
表中。我們每天儲存大約 500 萬個 Acks 和 300 萬個日誌。我試圖在每日邊界上對這兩個表進行分區,以便我們可以輕鬆地從表中刪除舊分區,並提高查詢性能。
我以前沒有做過表分區,所以我一直在閱讀一些線上教程,但是我被困在一件事上。我遵循的所有教程似乎都是手動添加文件組和手動添加邊界。
我希望 SQL Server 以某種方式每天都這樣做,這就是我的問題所在。我需要它來為第二天創建新的文件組,比如每天 22:00。然後在 24:00 插入應該開始填滿新一天的分區。
誰能指出我如何實現這一目標的正確方向?一個全面的教程或一些好的舊建議也可以。
我的第二個問題:我可以以某種方式將相同的分區函式應用於兩個不同的表嗎?
他們都有一個
datetime(2)
我想要分區的列,並且將適用相同的規則。那如何適應我的文件組?我一天需要一個文件組嗎?每個表是否在該文件組中都有一個文件,還是兩個表都保存到文件組中的同一個文件中?
我必須為每個文件組製作一個
.mdf
and嗎?.ldf
還是整個數據庫還有一個日誌文件?
從 SQL Server 2008 SP2 和 SQL Server 2008 R2 SP1 開始,支持15,000 個分區,所以說實話,您實際上並不需要動態地做那麼多。與其讓複雜的日常流程(動態添加文件組和邊界)有可能失敗,不如簡單地從現在到 2020 年預先創建分區,並且您完全在您的限制範圍內並且非常適合未來。
您可以將所有分區分配給一個文件組(確定不一定是一個很好的模式),或者在有限數量之間進行循環。換句話說,沒有技術需要每天有一個文件組,例如
-- Assign all to one filegroup; ok not necessarily great CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1] ) -- Or round-robin CREATE PARTITION SCHEME ps_test AS PARTITION pf_test ALL TO ( [FG1], [FG2], [FG3], [FG4], [FG5], [FG6], [FG7], [FG1] ... etc )
顯然使用 Excel 或某些工具為您生成腳本 - 無需輸入它們:)
使用 DMV sys.partition_range_values和元數據函式$PARTITION來計算有關哪些數據在哪裡的資訊。創建一個日常作業來切換和截斷最舊的分區。我認為這比每日添加的風險低。
**警告!!**仔細閱讀白皮書,因為這需要啟用,並且這種方法存在一些問題(例如,不支持在具有超過 1,000 個分區的表上創建和重建非對齊索引)。如果您覺得規避風險,1,000 個分區的標準限制仍然允許您預分配不到 3 年。
由於您確實想按 DATE 而不是 DATETIME2 進行分區,因此請考慮計算列。不過,我可能想先對此進行性能測試。
還有一個關於 codeplex 的工具(SQL Server Partition Management),雖然我沒有使用過,但可能值得一看。
要回答您的其他問題,數據庫應該只有一個日誌文件。將其他文件添加為 .ndf 而不是 .mdf。要在同一張表上使用相同的分區方案(不是函式),只需使用相同的方案創建它們,它們會將數據劃分到文件組下的文件中,例如
CREATE TABLE dbo.yourTable ( ... CONSTRAINT PK_yourTable PRIMARY KEY ( rowId, someDate ) ) ON ps_test(someDate)
好的,這將是一個很長的答案,但我已經完成了一個展示,說明這樣的事情是如何工作的。它確實提醒我,分區切換的好處在於,作為僅元數據的操作,它是即時的。需要明確的是,這是一個展示原理和一些範例“操作方法”程式碼的展示,它不是生產質量。在開發或測試環境中執行之前,請仔細閱讀並確保您理解它。您將需要大約 200MB 的空間。
------------------------------------------------------------------------------------------------ -- Setup START -- Demo runs on my laptop in < 1 minute (ok on SSD) -- You'll need 200MB space ------------------------------------------------------------------------------------------------ USE master GO IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' ) ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' ) DROP DATABASE tooManyPartitionsTest GO CREATE DATABASE tooManyPartitionsTest GO ALTER DATABASE tooManyPartitionsTest SET RECOVERY SIMPLE GO -- Add 7 filegroups with 4 files each -- Add 365 files and filegroup DECLARE @fg INT = 0, @f INT = 0, @sql NVARCHAR(MAX) WHILE @fg < 7 BEGIN SET @fg += 1 SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILEGROUP tooManyPartitionsTestFg' + CAST( @fg AS VARCHAR(5) ) -- Add the filegroup PRINT @sql EXEC(@sql) -- Initialise SET @f = 0 WHILE @f < 4 BEGIN SET @f += 1 --!!WARNING!! DON'T USE THESE SETTINGS IN PRODUCTION. 3MB starting size and 1MB filegrowth are just for demo - would be extremely painful for live data SET @sql = 'ALTER DATABASE tooManyPartitionsTest ADD FILE ( NAME = N''tooManyPartitionsTestFile@f_@fg'', FILENAME = N''s:\temp\tooManyPartitionsTestFile@f_@fg.ndf'', SIZE = 3MB, FILEGROWTH = 1MB ) TO FILEGROUP [tooManyPartitionsTestFg@fg]' SET @sql = REPLACE ( @sql, '@fg', @fg ) SET @sql = REPLACE ( @sql, '@f', @f ) -- Add the file PRINT @sql EXEC(@sql) END END GO USE tooManyPartitionsTest GO SELECT * FROM sys.filegroups SELECT * FROM sys.database_files GO -- Generate partition function with ~3 years worth of daily partitions from 1 Jan 2014. DECLARE @bigString NVARCHAR(MAX) = '' ;WITH cte AS ( SELECT CAST( '30 Apr 2014' AS DATE ) testDate UNION ALL SELECT DATEADD( day, 1, testDate ) FROM cte WHERE testDate < '31 Dec 2016' ) SELECT @bigString += ',' + QUOTENAME( CONVERT ( VARCHAR, testDate, 106 ), '''' ) FROM cte OPTION ( MAXRECURSION 1100 ) SELECT @bigString = 'CREATE PARTITION FUNCTION pf_test (DATE) AS RANGE RIGHT FOR VALUES ( ' + STUFF( @bigString, 1, 1, '' ) + ' )' SELECT @bigString bs -- Create the partition function PRINT @bigString EXEC ( @bigString ) GO /* -- Look at the boundaries SELECT * FROM sys.partition_range_values WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' ) GO */ DECLARE @bigString NVARCHAR(MAX) = '' ;WITH cte AS ( SELECT ROW_NUMBER() OVER( ORDER BY boundary_id ) rn FROM sys.partition_range_values WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' ) UNION ALL SELECT 1 -- additional row required for fg ) SELECT @bigString += ',' + '[tooManyPartitionsTestFg' + CAST( ( rn % 7 ) + 1 AS VARCHAR(5) ) + ']' FROM cte OPTION ( MAXRECURSION 1100 ) SELECT @bigString = 'CREATE PARTITION SCHEME ps_test AS PARTITION pf_test TO ( ' + STUFF( @bigString, 1, 1, '' ) + ' )' PRINT @bigString EXEC ( @bigString ) GO IF OBJECT_ID('dbo.yourLog') IS NULL CREATE TABLE dbo.yourLog ( logId INT IDENTITY, someDate DATETIME2 NOT NULL, someData UNIQUEIDENTIFIER DEFAULT NEWID(), dateAdded DATETIME DEFAULT GETDATE(), addedBy VARCHAR(30) DEFAULT SUSER_NAME(), -- Computed column for partitioning? partitionDate AS CAST( someDate AS DATE ) PERSISTED, CONSTRAINT pk_yourLog PRIMARY KEY ( logId, partitionDate ) -- << !!TODO try other way round ) ON [ps_test]( partitionDate ) GO IF OBJECT_ID('dbo.yourAcks') IS NULL CREATE TABLE dbo.yourAcks ( ackId INT IDENTITY(100000,1), logId INT NOT NULL, partitionDate DATE NOT NULL CONSTRAINT pk_yourAcks PRIMARY KEY ( ackId, logId, partitionDate ) ) ON [ps_test]( partitionDate ) GO IF OBJECT_ID('dbo.yourLogSwitch') IS NULL CREATE TABLE dbo.yourLogSwitch ( logId INT IDENTITY, someDate DATETIME2 NOT NULL, someData UNIQUEIDENTIFIER DEFAULT NEWID(), dateAdded DATETIME DEFAULT GETDATE(), addedBy VARCHAR(30) DEFAULT SUSER_NAME(), -- Computed column for partitioning? partitionDate AS CAST( someDate AS DATE ) PERSISTED, CONSTRAINT pk_yourLogSwitch PRIMARY KEY ( logId, partitionDate ) ) ON [ps_test]( partitionDate ) GO -- Setup END ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -- Data START ------------------------------------------------------------------------------------------------ -- OK load up data for Jan 2014 to today. DECLARE @startDate DATETIME = '1 Jan 2014', @rand INT WHILE @startDate < GETDATE() BEGIN -- Add between 1 and 10,000 rows to dbo.yourLog for today SET @rand = RAND() * 10000 ;WITH cte AS ( SELECT TOP 10000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn FROM master.sys.columns c1 CROSS JOIN master.sys.columns c2 CROSS JOIN master.sys.columns c3 ) INSERT INTO dbo.yourLog (someDate) SELECT TOP(@rand) DATEADD( second, rn % 30000, @startDate ) FROM cte -- Add most of the Acks INSERT INTO dbo.yourAcks ( logId, partitionDate ) SELECT TOP 70 PERCENT logId, partitionDate FROM dbo.yourLog WHERE partitionDate = @startDate SET @startDate = DATEADD( day, 1, @startDate ) CHECKPOINT END GO -- Have a look at the data we've loaded SELECT 'before yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog SELECT 'before yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks -- You'll see how pre-May data is initially clumped together SELECT 'before $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records FROM dbo.yourLog WITH(NOLOCK) GROUP BY $PARTITION.pf_test( partitionDate ) ORDER BY xMinDate GO -- Data END ------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------ -- Maintenance START ------------------------------------------------------------------------------------------------ -- Oh man, we're behind with our switching and truncation. -- Create a job that sweeps up. Do we get blocking? -- ALTER TABLE dbo.yourLog SWITCH PARTITION 1 TO dbo.yourLogSwitch PARTITION 1 -- TRUNCATE TABLE dbo.yourLogSwitch -- Let's pretend we only want to maintain up to 30 days ago DECLARE @testDate DATE SET @testDate = DATEADD( day, -30, GETDATE() ) -- Create local fast_forward ( forward-only, read-only ) cursor DECLARE partitions_cursor CURSOR FAST_FORWARD LOCAL FOR SELECT boundary_id, CAST( value AS DATE ) FROM sys.partition_range_values WHERE function_id = ( SELECT function_id FROM sys.partition_functions WHERE name = 'pf_test' ) AND value < @testDate -- Cursor variables DECLARE @boundary_id INT, @value DATE, @sql NVARCHAR(MAX) OPEN partitions_cursor FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value WHILE @@fetch_status = 0 BEGIN -- Switch out and truncate old partition SET @sql = 'ALTER TABLE dbo.yourLog SWITCH PARTITION ' + CAST( @boundary_id AS VARCHAR(5) ) + ' TO dbo.yourLogSwitch PARTITION ' + CAST( @boundary_id AS VARCHAR(5) ) PRINT @sql EXEC(@sql) -- You could move the data elsewhere from here or just empty it out TRUNCATE TABLE dbo.yourLogSwitch --!!TODO yourAcks table FETCH NEXT FROM partitions_cursor INTO @boundary_id, @value END CLOSE partitions_cursor DEALLOCATE partitions_cursor GO -- Maintenance END ------------------------------------------------------------------------------------------------ -- Have a look at the data we've maintained SELECT 'after yourLog' s, COUNT(*) records, MIN(someDate) minDate, MAX(someDate) maxDate FROM dbo.yourLog SELECT 'after yourAcks' s, COUNT(*) records, MIN(partitionDate) minDate, MAX(partitionDate) maxDate FROM dbo.yourAcks -- You'll see how pre-May data is initially clumped together SELECT 'after $partition' s, $PARTITION.pf_test( partitionDate ) p, MIN(partitionDate) xMinDate, MAX(partitionDate) xMaxDate, COUNT(*) AS records FROM dbo.yourLog WITH(NOLOCK) GROUP BY $PARTITION.pf_test( partitionDate ) ORDER BY xMinDate -- Remember, date must always be part of query now to get partition elimination SELECT * FROM dbo.yourLog WHERE partitionDate = '1 August 2014' GO -- Cleanup USE master GO IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' ) ALTER DATABASE tooManyPartitionsTest SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO IF EXISTS ( SELECT * FROM sys.databases WHERE name = 'tooManyPartitionsTest' ) DROP DATABASE tooManyPartitionsTest GO
要添加新分區,請使用
SPLIT RANGE
. 假設您有以下分區:CREATE PARTITION FUNCTION pfTest(int) AS RANGE LEFT FOR VALUES (10, 20, 30); CREATE PARTITION SCHEME psTest AS PARTITION pfTest TO ([GRP1], [GRP2], [GRP3]);
..您可以通過“拆分”最後一個範圍從(30到無窮大)到(30-39)和(40到無窮大)來添加新分區。這是語法:
ALTER PARTITION FUNCTION pfTest() SPLIT RANGE (40); ALTER PARTITION SCHEME psTest NEXT USED [GRP4];
除了生成動態 SQL 並按計劃執行(例如在 SQL Server 代理作業中)之外,我不知道有任何其他方法可以自動執行此操作。
分區函式可以應用於任意數量的表。每個表都可以放在一個分區方案上,該方案又連接到一個分區函式。
--- Create the partition function: CREATE PARTITION FUNCTION fn_part_left(int) AS RANGE LEFT FOR VALUES (100, 110, 120, 130); --- Create the partition scheme: CREATE PARTITION SCHEME ps_part_left AS PARTITION fn_part_left TO ([GROUP_A], [GROUP_B], [GROUP_C], [GROUP_A], [GROUP_B]); --- Create the table CREATE TABLE myTable ( someColumn int NOT NULL, .... ) ON [ps_part_left](someColumn);
我在範例中使用“int”作為數據類型,但 datetime2 也可以。
如果需要,您可以在同一個文件組中放置多個分區。在這裡,您必須對負載在不同分區之間的分佈方式進行一些規劃,因此您不會將所有 I/O 負載放在單個文件組上。
一個文件組可以有一個或多個 .mdf 文件。
一個數據庫可以有一個或多個 .ldf 文件。