Sql-Server

Help performance tuning a master/detail (email-like inbox) SQL query

  • December 17, 2019

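Over the past few days I have been searching and watching videos, and I think I have fumbled my way as far as I can on my own. I'm looking for more specific direction, given the example below.

I have two tables that I'm working with: MessageThreads (400k records) and Messages (1M records). Their schemas are shown below.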

Messages table [schema image]

MessageThreads indexes

https://gist.github.com/timgabrhel/0a9ff88160ebc9e40559e1e10ecc7ee4

Messages indexes

https://gist.github.com/timgabrhel/d649074cbe82016e8a90f918c58c4764

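I'm trying to improve the performance of our main "inbox" query. Think of your email provider's inbox: you see a list of threads, some new and some read, sorted by date, along with a preview of the most recent message, whether it was sent to you or by you. Finally, the query has a paging element. By default we request 11 items: 10 for the page being displayed, plus 1 to tell whether there is a next page.

Some of our long-time users can have as many as 40K messages.

Over the past few days this query has taken many different forms, but this is where I've ended up. I tried OUTER APPLY, but saw worse execution times and statistics.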

SET STATISTICS IO ON; /* And turn on the Actual Execution Plan */

declare @UserId bigint
set @UserId = 9999

; WITH cte AS (
   SELECT
       ROW_NUMBER() OVER (ORDER BY SendDate DESC) AS RowNum, 
       MT.MessageThreadId, 
       MT.FromUserHasArchived, 
       MT.ToUserHasArchived, 
       MT.Created, 
       MT.ThreadStartedBy, 
       MT.ThreadSentTo, 
       MT.[Subject], 
       MT.CanReply, 
       MT.FromUserDeleted, 
       MT.ToUserDeleted,              
       LM.MessageId, 
       LM.Deleted, 
       LM.FromUserId, 
       LM.ToUserId, 
       LM.[Message], 
       LM.SendDate, 
       LM.ReadDate
   FROM MessageThreads MT 
   -- join the most recent non-deleted message where this user is the sender or receiver
   LEFT OUTER JOIN 
   (
       SELECT RANK() OVER (PARTITION BY MessageThreadId ORDER BY SendDate DESC) r, * 
       FROM [Messages] 
       WHERE (FromUserId=@UserId OR ToUserId=@UserId) 
       AND (Deleted=0)
   ) LM ON (LM.MessageThreadId = MT.MessageThreadId AND LM.r = 1) 
   --WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId   
)
SELECT
   cte.*,
   UserFrom.FirstName AS UserFromFirstName, 
   UserFrom.LastName AS UserFromLastName, 
   UserFrom.Email AS UserFromEmail,                  
   UserTo.FirstName AS UserToFirstName, 
   UserTo.LastName AS UserToLastName, 
   UserTo.Email AS UserToEmail  
FROM cte
LEFT OUTER JOIN Users AS UserFrom ON cte.FromUserId=UserFrom.UserId 
LEFT OUTER JOIN Users AS UserTo ON cte.ToUserId=UserTo.UserId 
WHERE RowNum >= 1 
AND RowNum <= 11   
ORDER BY RowNum ASC

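Statistics for the query above (execution time in SSMS is about 2 seconds). That execution time is acceptable, but the statistics feel less than ideal, and even more so when looking at the actual execution plan.

[image: query statistics]

The execution plan is linked here: https://gist.github.com/timgabrhel/f8d919d5728e965623fbd953f7a219ef

The one big problem I've found is the 400k-row index scan on the MessageThreads table. This is presumably because the main SELECT X FROM MessageThreads query has no filter on it. When I apply a predicate to it (uncomment the WHERE in the query), the statistics improve dramatically (below), but the time in SSMS jumps from ~2 seconds to ~18 seconds.

[image: query statistics 2]

The problem area of the query is the MessageThreads predicate.

Execution plan: https://gist.github.com/timgabrhel/1383ff9362567fdf41ba011dead63ceb

Thanks in advance!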

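A few thoughts:

  1. Your WHERE clause needs supporting indexes

WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId really needs two indexes to be efficient: one on the ThreadSentTo column and one on the ThreadStartedBy column. Otherwise, the SQL engine will scan the whole table to retrieve the right threads.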
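A minimal sketch of what those two supporting indexes could look like, assuming the dbo.MessageThreads table from the question. The index names are made up for illustration, and the index definitions in the linked gist may already cover one or both columns:

```sql
-- Hypothetical index names; check the existing indexes first to avoid duplicates.
CREATE NONCLUSTERED INDEX IX_MessageThreads_ThreadSentTo
    ON dbo.MessageThreads (ThreadSentTo);

CREATE NONCLUSTERED INDEX IX_MessageThreads_ThreadStartedBy
    ON dbo.MessageThreads (ThreadStartedBy);
```

With both in place the optimizer can seek each index and combine the two result sets, although it may still choose a scan if it estimates that to be cheaper.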

  2. Use OFFSET ... FETCH NEXT N ROWS ONLY instead of ROW_NUMBER()

Starting with SQL Server 2012, a new construct was added to handle paging. It works like this:

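DECLARE @PageNumber int = 20
DECLARE @RowsPerPage int = 15

SELECT *
FROM MyTable T
INNER JOIN MyDetailTable D
   ON T.MyTableID = D.MyTableID
ORDER BY T.MyTableID -- OFFSET ... FETCH is only valid after an ORDER BY; sort column chosen for illustration
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY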

In this case, the query skips the first 285 ((20-1)*15) rows and retrieves the next 15. This is usually a faster way to page than the older ROW_NUMBER() filter.
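The question asks for 10 rows to display plus one extra row to tell whether a next page exists. A hedged sketch of how that could be expressed with the same construct, reusing the placeholder names MyTable and MyTableID from the example above (they are not real tables from the question):

```sql
DECLARE @PageNumber int = 1;
DECLARE @RowsPerPage int = 10;

SELECT T.MyTableID
FROM MyTable AS T
ORDER BY T.MyTableID                             -- OFFSET/FETCH requires an ORDER BY
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS     -- skip the earlier pages
FETCH NEXT (@RowsPerPage + 1) ROWS ONLY;         -- one extra row signals a next page
```

If the query returns @RowsPerPage + 1 rows, the caller shows the first @RowsPerPage and enables the "next page" link; the offset is still computed from the displayed page size, so pages do not overlap.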

Recreating the tables

CREATE TABLE dbo.Messages(MessageID BIGINT NOT NULL PRIMARY KEY,
MessageThreadID bigint not null,
Deleted bit null,
FromUserID bigint null,
ToUserId bigint null,
Message nvarchar(max) not null,
SendDate Datetime not null,
ReadDate datetime null);



CREATE TABLE dbo.MessageThreads (
MessageThreadID bigint not null PRIMARY KEY,
FromUserHasArchived bit not null,
ToUserHasArchived bit not null,
Created datetime not null,
ThreadStartedBy bigint null,
ThreadSentTo bigint null,
Subject varchar(50) not null,
CanReply bit not null,
FromUserDeleted bit not null,
ToUserDeleted bit not null);

Recreating the data

DECLARE @message nvarchar(max)
SET @message = REPLICATE(CAST(N'B' as nvarchar(max)),200)

INSERT INTO Dbo.Messages WITH(TABLOCK)
(MessageID,MessageThreadID,Deleted,FromUserID,ToUserId,Message,SendDate,ReadDate)
SELECT TOP(1000000)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
0,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) % 10000,
(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 1000) % 10000,
@message,
DATEADD(Second,- ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),getdate()),
DATEADD(Second,- ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),getdate())
FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;


INSERT INTO dbo.MessageThreads
SELECT TOP(400000)
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
0,
0,
DATEADD(Second,- ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),getdate()),
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
 'bla',
 0,
 0,
 0

FROM MASTER..spt_values spt1
CROSS APPLY MASTER..spt_values spt2;


UPDATE TOP(20000) Messages 
SET ToUserId= 9999

UPDATE TOP(20000) Messages 
SET FromUserID = 9999

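The query

Some parts of the resulting plan match your original query:

[image: execution plan]

Using the OFFSET approach still shows the hash match spill and other problems: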

SET STATISTICS IO ON; /* And turn on the Actual Execution Plan */

declare @UserId bigint
set @UserId = 9999
DECLARE @PageNumber int = 1
DECLARE @RowsPerPage int = 11



; WITH cte AS (
   SELECT

       MT.MessageThreadId, 
       MT.FromUserHasArchived, 
       MT.ToUserHasArchived, 
       MT.Created, 
       MT.ThreadStartedBy, 
       MT.ThreadSentTo, 
       MT.[Subject], 
       MT.CanReply, 
       MT.FromUserDeleted, 
       MT.ToUserDeleted,              
       LM.MessageId, 
       LM.Deleted, 
       LM.FromUserId, 
       LM.ToUserId, 
       LM.[Message], 
       LM.SendDate, 
       LM.ReadDate
   FROM MessageThreads MT 
   -- join the most recent non-deleted message where this user is the sender or receiver
   LEFT OUTER JOIN 
   (
       SELECT RANK() OVER (PARTITION BY MessageThreadId ORDER BY SendDate DESC) r, * 
       FROM [Messages] 
       WHERE (FromUserId=@UserId OR ToUserId=@UserId) 
       AND (Deleted=0)
   ) LM ON (LM.MessageThreadId = MT.MessageThreadId AND LM.r = 1) 
   --WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId   
)
SELECT
   cte.*
FROM cte

ORDER BY SendDate DESC  
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY;


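 SQL Server Execution Times:
   CPU time = 2170 ms,  elapsed time = 2402 ms.

As a side note, changing the LEFT OUTER JOIN to an INNER JOIN reduces both the CPU time and the elapsed time:

   CPU time = 609 ms,  elapsed time = 745 ms.

That change may not be possible for you, but it gives a first hint about what needs to be optimized.

As a next step, you could consider dropping RANK() and using MAX() with GROUP BY, so that the problematic part of the query works with fewer columns.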

SET STATISTICS IO,TIME ON; /* And turn on the Actual Execution Plan */

declare @UserId bigint
set @UserId = 9999
DECLARE @PageNumber int = 1
DECLARE @RowsPerPage int = 11



; WITH cte AS (
   SELECT

       MT.MessageThreadId, 
       MT.FromUserHasArchived, 
       MT.ToUserHasArchived, 
       MT.Created, 
       MT.ThreadStartedBy, 
       MT.ThreadSentTo, 
       MT.[Subject], 
       MT.CanReply, 
       MT.FromUserDeleted, 
       MT.ToUserDeleted,              
       LM.SendDate

   FROM MessageThreads MT  WITH(INDEX([IX_MessageThreadId_SendDate]))
   -- join the most recent non-deleted message where this user is the sender or receiver
   LEFT OUTER JOIN 
   (
       SELECT MAX(SendDate) as SendDate,MessageThreadId
       FROM [Messages] 
       WHERE (FromUserId=@UserId OR ToUserId=@UserId) 
       AND (Deleted=0)
       GROUP BY MessageThreadId
   ) LM ON (LM.MessageThreadId = MT.MessageThreadId) 
   --WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId   
)
SELECT
   cte.*,        
       LM.MessageId, 
       LM.Deleted, 
       LM.FromUserId, 
       LM.ToUserId, 
       LM.[Message]

FROM cte
LEFT JOIN [Messages] LM
ON cte.MessageThreadID = LM.MessageThreadId
AND cte.SendDate = LM.SendDate
ORDER BY SendDate DESC  
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY;

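This did eliminate the hash match spill for me, but the time is still high:

SQL Server Execution Times:
  CPU time = 1950 ms,  elapsed time = 1223 ms.

We can then remove one of the key lookups by writing the OR explicitly as two parts combined with UNION: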

SET STATISTICS IO,TIME ON; /* And turn on the Actual Execution Plan */

declare @UserId bigint
set @UserId = 9999
DECLARE @PageNumber int = 1
DECLARE @RowsPerPage int = 11



; WITH cte AS (
   SELECT

       MT.MessageThreadId, 
       MT.FromUserHasArchived, 
       MT.ToUserHasArchived, 
       MT.Created, 
       MT.ThreadStartedBy, 
       MT.ThreadSentTo, 
       MT.[Subject], 
       MT.CanReply, 
       MT.FromUserDeleted, 
       MT.ToUserDeleted,              
       LM.SendDate

   FROM MessageThreads MT  WITH(INDEX([IX_MessageThreadId_SendDate]))
   -- join the most recent non-deleted message where this user is the sender or receiver
   LEFT OUTER JOIN 
   (
       SELECT MAX(SendDate) as SendDate,MessageThreadId
       FROM  
       (SELECT SendDate,MessageThreadId
        FROM [Messages]     
        WHERE (FromUserId=@UserId ) 
        AND (Deleted=0) 
       UNION
       SELECT SendDate,MessageThreadId
       FROM [Messages]  
       WHERE  ToUserId=@UserId
       AND (Deleted=0)) AS A2
       GROUP BY MessageThreadId
   ) LM ON (LM.MessageThreadId = MT.MessageThreadId) 
   --WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId   
)
SELECT
   cte.*,        
       LM.MessageId, 
       LM.Deleted, 
       LM.FromUserId, 
       LM.ToUserId, 
       LM.[Message]

FROM cte
LEFT JOIN [Messages] LM
ON cte.MessageThreadID = LM.MessageThreadId
AND cte.SendDate = LM.SendDate
ORDER BY SendDate DESC  
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY;

And add these two indexes:

CREATE INDEX IX_Messages_FromUserId_MessageThreadId_SendDate
ON Dbo.Messages(FromUserId,MessageThreadId,SendDate)
INCLUDE(Deleted)
WHERE Deleted = 0;

CREATE INDEX IX_Messages_ToUserID_MessageThreadId_SendDate
ON Dbo.Messages(ToUserID,MessageThreadId,SendDate)
INCLUDE(Deleted)
WHERE Deleted = 0;

Execution times:

SQL Server Execution Times:
  CPU time = 1747 ms,  elapsed time = 1050 ms.

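This is still not an ideal end result, which is why the next part filters the MessageThreads table using the filter you specified in your question.


Filtering the MessageThreads table

The query built so far will now be used with the WHERE clause you specified: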

WHERE MT.ThreadSentTo=@UserId 
   OR MT.ThreadStartedBy=@UserId

An update to the test data so that it matches yours:

UPDATE  TOP (20000) MessageThreads
SET ThreadSentTo = 9999
FROM MessageThreads;
UPDATE  TOP (20000) MessageThreads
SET ThreadStartedBy = 9999
FROM MessageThreads;

The full query with the WHERE filter added:

SET STATISTICS IO,TIME ON; /* And turn on the Actual Execution Plan */

declare @UserId bigint
set @UserId = 9999
DECLARE @PageNumber int = 1
DECLARE @RowsPerPage int = 11
--WHERE MT.ThreadSentTo=@UserId OR MT.ThreadStartedBy=@UserId 



; WITH cte AS (
   SELECT

       MT.MessageThreadId, 
       MT.FromUserHasArchived, 
       MT.ToUserHasArchived, 
       MT.Created, 
       MT.ThreadStartedBy, 
       MT.ThreadSentTo, 
       MT.[Subject], 
       MT.CanReply, 
       MT.FromUserDeleted, 
       MT.ToUserDeleted,              
       LM.SendDate

   FROM MessageThreads MT  
   -- join the most recent non-deleted message where this user is the sender or receiver
   LEFT OUTER JOIN 
   (
       SELECT MAX(SendDate) as SendDate,MessageThreadId
       FROM  
       (SELECT SendDate,MessageThreadId
        FROM [Messages]     
        WHERE (FromUserId=@UserId ) 
        AND (Deleted=0) 
       UNION
       SELECT SendDate,MessageThreadId
       FROM [Messages]  
       WHERE  ToUserId=@UserId
       AND (Deleted=0)) AS A2
       GROUP BY MessageThreadId
   ) LM ON (LM.MessageThreadId = MT.MessageThreadId) 
WHERE MT.ThreadSentTo=@UserId 
OR MT.ThreadStartedBy=@UserId 
)
SELECT
   cte.*,        
       LM.MessageId, 
       LM.Deleted, 
       LM.FromUserId, 
       LM.ToUserId, 
       LM.[Message]

FROM cte
LEFT JOIN [Messages] LM
ON cte.MessageThreadID = LM.MessageThreadId
AND cte.SendDate = LM.SendDate
ORDER BY SendDate DESC  
OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY;

The execution plan looks much cleaner, even with the LEFT OUTER JOIN:

[image: execution plan]

Execution times:

SQL Server Execution Times:
  CPU time = 219 ms,  elapsed time = 221 ms.

We still have a residual predicate that could be removed with these two indexes:

CREATE INDEX IX_ThreadSentTo_MessageThreadId
ON MessageThreads(ThreadSentTo,MessageThreadId)
INCLUDE
(
FromUserHasArchived, 
ToUserHasArchived, 
Created, 
ThreadStartedBy, 
[Subject], 
CanReply, 
FromUserDeleted, 
ToUserDeleted);
CREATE INDEX IX_ThreadStartedBy_MessageThreadId
ON MessageThreads(ThreadStartedBy,MessageThreadId)
INCLUDE
(

       FromUserHasArchived, 
       ToUserHasArchived, 
       Created, 
       ThreadSentTo, 
       [Subject], 
       CanReply, 
       FromUserDeleted, 
       ToUserDeleted);

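However, once I added these indexes, performance dropped from ~200 ms of elapsed time to ~800 ms of elapsed time.

Execution plan without the added indexes on MessageThreads (about 200 ms elapsed time)

Execution plan with the added indexes on MessageThreads (about 800 ms elapsed time)

Source: https://dba.stackexchange.com/questions/254828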