Sql-Server
我可以優化這個 MERGE 語句嗎?
我正在嘗試在兩個表之間進行單列合併。第一個表 (
VisitorSession
) 有 40,000,000 行。第二個 (ShoppingCart
) 有 9,000,000 行。在我的開發環境中,查詢只需不到 8 分鐘。但是在生產環境中,它應該花費更少(更強大的機器)。但是,我預計查詢至少需要 2 分鐘才能在生產中執行。我知道這個查詢導致開發環境中的其他開發人員超時,這意味著它很容易導致客戶超時。是否有更安全和/或更快的方式來執行此查詢?
declare @dt datetime = cast(dateadd(month, -6, getdate()) as date); merge ShoppingCart as TargetTable -- 07:55 to complete in Dev using ( select * from -- 04:55 to run select, resulting in 12,727,927 rows in Dev ( select visitorid -- int, not null, foreign key ,useripaddress -- varchar(55), null ,row_number() over (partition by visitorid order by createdate desc) as [row] from VisitorSession (nolock) where UserIPAddress is not null and CreateDate > @dt -- createdate is a datetime, not null ) as subTbl where subTbl.[row] = 1 ) as SourceTable on (TargetTable.VisitorID = SourceTable.VisitorID) -- visitorid is not a primary key when matched then update set TargetTable.UserIpAddress = SourceTable.UserIpAddress;
就個人而言,我不喜歡
MERGE
,因為有很多未解決的錯誤:
- 這篇博文(向下滾動到“其他
MERGE
問題”)- 亞歷克斯庫茲涅佐夫的這個答案
- 這些文章在Paul White 的部落格上
- 我在
MERGE
此處發布了一個警示性提示,包括對幾個尚未解決的錯誤的引用。- 而且這個列表甚至可能並不詳盡——考慮到這一小部分使用者發現的錯誤數量,我不得不想知道還有多少尚未被發現?
MERGE
在樂觀並發和競爭條件方面也給人一種錯誤的安全感。有關詳細資訊,請參閱Dan Guzman 的部落格文章。我不想在這裡成為一個恐懼販子。但我也發現語法不直覺且令人生畏。所以我只會在實際需要它的情況下使用它,並且我可以證明我不受上述*任何問題的影響。*我不知道將它用於只能以某種方式結束的操作會獲得什麼
UPDATE
。所以這就是我將如何使用我更熟悉的語法來代替它:
;WITH s AS ( SELECT VisitorID, UserIpAddress FROM ( SELECT VisitorID, UserIpAddress, rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC) FROM dbo.VisitorSession WHERE UserIpAddress IS NOT NULL AND CreateDate > @dt ) AS x WHERE rn = 1 ) UPDATE c SET c.UserIpAddress = s.UserIpAddress FROM dbo.ShoppingCart AS c INNER JOIN s ON c.VisitorID = s.VisitorID;
您還可以將此操作分成多個塊,以減少對事務日誌的影響,這反過來可能會減少整體持續時間。我在這裡寫了一篇部落格。
以下是我將如何處理這種方法:
DECLARE @dt DATE = DATEADD(MONTH, -6, SYSDATETIME()), @rc INT = 1; WHILE @rc > 0 BEGIN BEGIN TRANSACTION; ;WITH s AS ( SELECT TOP (100000) VisitorID, UserIpAddress FROM ( SELECT VisitorID, UserIpAddress, rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC) FROM dbo.VisitorSession AS s WHERE UserIpAddress IS NOT NULL AND CreateDate > @dt AND EXISTS ( SELECT 1 FROM dbo.ShoppingCart AS c WHERE c.VisitorID = s.VisitorID AND (c.UserIpAddress <> s.UserIpAddress OR c.UserIpAddress IS NULL) ) ) AS x WHERE rn = 1 ) UPDATE c SET c.UserIpAddress = s.UserIpAddress FROM dbo.ShoppingCart AS c INNER JOIN s ON c.VisitorID = s.VisitorID; SET @rc = @@ROWCOUNT; COMMIT TRANSACTION; END
當然,正如部落格文章所說明的,您可以通過確保您的日誌足夠大來處理整個事務而不必增長來獲得盡可能多的時間 - 大部分延遲可能來自許多適應您的自動增長操作大筆交易。可悲的是,在您完成操作之前,很難嘗試猜測您需要多少事務日誌……