Sql-Server

我可以優化這個 MERGE 語句嗎?

  • May 8, 2020

我正在嘗試在兩個表之間進行單列合併。第一個表 ( VisitorSession) 有 40,000,000 行。第二個 ( ShoppingCart) 有 9,000,000 行。

在我的開發環境中,查詢只需不到 8 分鐘。但是在生產環境中,它應該花費更少(更強大的機器)。但是,我預計查詢至少需要 2 分鐘才能在生產中執行。我知道這個查詢導致開發環境中的其他開發人員超時,這意味著它很容易導致客戶超時。是否有更安全和/或更快的方式來執行此查詢?

declare @dt datetime = cast(dateadd(month, -6, getdate()) as date);

merge ShoppingCart as TargetTable  -- 07:55 to complete in Dev
using 
(
 select * from -- 04:55 to run select, resulting in 12,727,927 rows in Dev
 (
   select
     visitorid  -- int, not null, foreign key
     ,useripaddress  -- varchar(55), null
     ,row_number() over 
     (partition by visitorid order by createdate desc) as [row]
   from VisitorSession (nolock)
   where UserIPAddress is not null
   and CreateDate > @dt   -- createdate is a datetime, not null
 ) as subTbl
 where subTbl.[row] = 1
) as SourceTable
on (TargetTable.VisitorID = SourceTable.VisitorID)  -- visitorid is not a primary key
when matched
 then update set
 TargetTable.UserIpAddress = SourceTable.UserIpAddress;

就個人而言,我不喜歡MERGE,因為有很多未解決的錯誤:

MERGE在樂觀並發和競爭條件方面也給人一種錯誤的安全感。有關詳細資訊,請參閱Dan Guzman 的部落格文章

我不想在這裡成為一個恐懼販子。但我也發現語法不直覺且令人生畏。所以我只會在實際需要它的情況下使用它,並且我可以證明我不受上述*任何問題的影響。*我不知道將它用於只能以某種方式結束的操作會獲得什麼UPDATE

所以這就是我將如何使用我更熟悉的語法來代替它:

;WITH s AS 
(
 SELECT VisitorID, UserIpAddress FROM 
 (
   SELECT 
     VisitorID,
     UserIpAddress,
     rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
   FROM dbo.VisitorSession
   WHERE UserIpAddress IS NOT NULL
   AND CreateDate > @dt
 ) AS x
 WHERE rn = 1
)
UPDATE c
 SET c.UserIpAddress = s.UserIpAddress
 FROM dbo.ShoppingCart AS c
 INNER JOIN s
 ON c.VisitorID = s.VisitorID;

您還可以將此操作分成多個塊,以減少對事務日誌的影響,這反過來可能會減少整體持續時間。我在這裡寫了一篇部落格

以下是我將如何處理這種方法:

DECLARE 
 @dt DATE = DATEADD(MONTH, -6, SYSDATETIME()), 
 @rc INT = 1;

WHILE @rc > 0
BEGIN

 BEGIN TRANSACTION;

 ;WITH s AS 
 (
   SELECT TOP (100000) VisitorID, UserIpAddress FROM
   (
     SELECT 
       VisitorID,
       UserIpAddress,
       rn = ROW_NUMBER() OVER (PARTITION BY VisitorID ORDER BY CreateDate DESC)
     FROM dbo.VisitorSession AS s
     WHERE UserIpAddress IS NOT NULL
     AND CreateDate > @dt
     AND EXISTS
     ( 
       SELECT 1 FROM dbo.ShoppingCart AS c
         WHERE c.VisitorID = s.VisitorID
         AND (c.UserIpAddress <> s.UserIpAddress
         OR c.UserIpAddress IS NULL)
     )
   ) AS x
   WHERE rn = 1
 )
 UPDATE c
   SET c.UserIpAddress = s.UserIpAddress
   FROM dbo.ShoppingCart AS c
   INNER JOIN s
   ON c.VisitorID = s.VisitorID;

 SET @rc = @@ROWCOUNT;

 COMMIT TRANSACTION;
END

當然,正如部落格文章所說明的,您可以通過確保您的日誌足夠大來處理整個事務而不必增長來獲得盡可能多的時間 - 大部分延遲可能來自許多適應您的自動增長操作大筆交易。可悲的是,在您完成操作之前,很難嘗試猜測您需要多少事務日誌……

引用自:https://dba.stackexchange.com/questions/44067