Sql-Server

我怎樣才能使這個嵌套查詢更有效?

  • October 10, 2018

我有 3 個表:RoomConferenceParticipantRoom有很多Conferences,並且Conference有很多Participants。我需要我的查詢來顯示來自 的欄位Room,以及它具有的關聯數,以及每個關聯的 sConferences數的總和。這是我為獲取此資訊而編寫的查詢的精簡版;首先,我剛剛選擇了房間 ID:Participant``Conference``SELECT

SELECT TOP(1000)
 rm.[Id]
FROM
 [Room] rm
LEFT JOIN (
 SELECT
   conf.[Id] AS [ConferenceId],
   MIN(conf.[Name]) AS [ConferenceName],
   MIN(conf.[RoomId]) AS [RoomId],
   COUNT(part.[Id]) AS CalcConferenceParticipantCount
 FROM
   [Conference] conf
 LEFT JOIN
   [Participant] part on part.[ConferenceId] = conf.[Id]
 GROUP BY
   conf.[Id]
 ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
 rm.[Id]

這非常快,因為 SQL Server 能夠從子查詢中提取數據Room並幾乎忽略子查詢(請參見下圖中的試用 1 - 試用 4)。然後我在ConferenceName子查詢的欄位中添加了每個房間的會議數量:

SELECT TOP(1000)
 rm.[Id],
 COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
 MIN(confData.[ConferenceName])
FROM
 [Room] rm
LEFT JOIN (
 SELECT
   conf.[Id] AS [ConferenceId],
   MIN(conf.[Name]) AS [ConferenceName],
   MIN(conf.[RoomId]) AS [RoomId],
   COUNT(part.[Id]) AS CalcConferenceParticipantCount
 FROM
   [Conference] conf
 LEFT JOIN
   [Participant] part on part.[ConferenceId] = conf.[Id]
 GROUP BY
   conf.[Id]
 ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
 rm.[Id]

這使查詢速度大大降低了大約 100 倍(參見下圖中的試驗 5 - 試驗 7)。然後我從子查詢中添加了參與者計數,這意味著使用了 2 個級別的聚合函式:

SELECT TOP(1000)
 rm.[Id],
 COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
 MIN(confData.[ConferenceName]),
 SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM
 [Room] rm
LEFT JOIN (
 SELECT
   conf.[Id] AS [ConferenceId],
   MIN(conf.[Name]) AS [ConferenceName],
   MIN(conf.[RoomId]) AS [RoomId],
   COUNT(part.[Id]) AS CalcConferenceParticipantCount
 FROM
   [Conference] conf
 LEFT JOIN
   [Participant] part on part.[ConferenceId] = conf.[Id]
 GROUP BY
   conf.[Id]
 ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
 rm.[Id]

這進一步使查詢速度減慢了大約 4 倍(參見下圖中的試驗 8 - 試驗 10)。以下是包含 10 次試驗數據的客戶統計數據:

客戶統計

下面是慢查詢的查詢計劃:https ://www.brentozar.com/pastetheplan/?id=SJpyeec5Q

有沒有一種方法可以使這種查詢 - 我計運算元查詢聚合的聚合 - 更有效?

我通過查看表中的行數來模擬數據,給它們一個均勻的數據分佈,並對模式進行猜測:

DROP TABLE IF EXISTS [Room];

CREATE TABLE [Room] (
   [Id] BIGINT NOT NULL,
   FILLER VARCHAR(200) NOT NULL,
   PRIMARY KEY ([Id])
);

INSERT INTO [Room] WITH (TABLOCK)
SELECT TOP (3088) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), REPLICATE('Z', 200)
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS [Conference];

CREATE TABLE [Conference] (
   [Id] BIGINT NOT NULL,
   [Name] VARCHAR(30) NOT NULL,
   [RoomId] BIGINT NOT NULL,
   FILLER VARCHAR(200) NOT NULL,
   PRIMARY KEY ([Id])
);


INSERT INTO [Conference] WITH (TABLOCK)
SELECT RN
, 'MY FAVORITE MEETING ROOM'
, 1 + RN % 3088
, REPLICATE('Z', 200)
FROM
(
   SELECT TOP (97413) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
   FROM master..spt_values t1
   CROSS JOIN master..spt_values t2
) q
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS [Participant];

CREATE TABLE [Participant] (
   [Id] BIGINT NOT NULL,
   [ConferenceId] BIGINT NOT NULL,
   FILLER VARCHAR(200) NOT NULL,
   PRIMARY KEY ([Id])
);


INSERT INTO [Participant] WITH (TABLOCK)
SELECT RN
, 1 + RN % 97413
, REPLICATE('Z', 200)
FROM
(
   SELECT TOP (235323) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
   FROM master..spt_values t1
   CROSS JOIN master..spt_values t2
) q
OPTION (MAXDOP 1);


CREATE INDEX NCI_Part ON [Participant] ([ConferenceId]) INCLUDE (Id);

我對模式所做的最重要的假設是Id列是[Conference]表的主鍵。考慮到查詢計劃和所涉及的索引名稱,這似乎是合理的。

在我的機器上,我得到了和你一樣的查詢計劃,但是我的開始查詢只佔用了 163 毫秒的 CPU。我認為這些差異歸結為硬體、數據分佈的差異,以及我沒有將數據返回給客戶端這一事實。

我首先想到的是派生表中不必要GROUP BY的。是表的主鍵,因此您不需要所有聚合。使用正確的索引(對於這種特殊情況,您已經擁有),子查詢不一定是壞事。重寫你必須刪除的內容:confData``Id``GROUP BY

SELECT TOP(1000)
 rm.[Id],
 COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
 MIN(confData.[ConferenceName]),
 SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM
 [Room] rm
LEFT JOIN (
 SELECT
   conf.[Id] AS [ConferenceId],
   conf.[Name] AS [ConferenceName],
   conf.[RoomId] AS [RoomId],
   (
       SELECT COUNT(part.[Id])
       FROM [Participant] part
       WHERE part.[ConferenceId] = conf.[Id]
   ) AS CalcConferenceParticipantCount
 FROM
   [Conference] conf
 ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
 rm.[Id]
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

這導致流聚合被進一步下推到計劃中:

子查詢

上傳的計劃佔用 113 毫秒的 CPU。存在相同的運算符,但其中一些處理的行更少,從而節省了時間。您可以通過在[Conference]with上定義覆蓋索引Id作為索引鍵來提高此查詢的效率。這似乎是一件奇怪的事情,但您的聚集索引掃描佔用了總查詢時間的 10%,並且可能包括您不需要的列。

如果您想讓查詢更快,您還可以考慮使用索引視圖。當您可以定義一個簡單的索引視圖來為您執行聚合時,為什麼每次都執行聚合?

CREATE VIEW IndexedViewOnParticipant WITH SCHEMABINDING
AS
SELECT [ConferenceId], COUNT_BIG([Id]) CntId, COUNT_BIG(*) Cnt
FROM dbo.[Participant]
GROUP BY [ConferenceId];

GO

CREATE UNIQUE CLUSTERED INDEX CI ON IndexedViewOnParticipant ([ConferenceId]);

這將導致在表上進行 DML 時會產生更多的空間和一點點的成本。總的來說,我會說這是索引視圖的一個很好的案例。再次重寫查詢:

SELECT TOP(1000)
 rm.[Id],
 COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount,
 MIN(confData.[ConferenceName]),
 SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount
FROM 
 [Room] rm
LEFT JOIN (
 SELECT
   conf.[Id] AS [ConferenceId],
   conf.[Name] AS [ConferenceName],
   conf.[RoomId] AS [RoomId],
   (
       SELECT CntId
       FROM IndexedViewOnParticipant part WITH (NOEXPAND)
       WHERE part.[ConferenceId] = conf.[Id]
   ) AS CalcConferenceParticipantCount
 FROM
   [Conference] conf
 ) confData ON confData.[RoomId] = rm.[Id]
GROUP BY
 rm.[Id]
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

SQL Server 同意我的評估,即這是一個好主意,CPU 時間降至 78 ms

在我的機器上,我能夠使查詢更快,但這開始進入一些有風險的優化,因為它可能需要LOOP JOIN提示。當您的查詢或表中的數據發生更改時,該提示可能不是一個好主意。它也可能不適合您的硬體。這種方法背後的想法是創建一個合適的索引[Conference]並充分利用TOP僅執行嵌套循環的計劃。這是我添加的索引:

CREATE INDEX NCI_Conf ON [Conference] ([RoomId]) INCLUDE ([Name]);

使用提示執行與以前相同的查詢LOOP JOIN給了我以下計劃:

循環連接提示

該查詢只佔用了 58 毫秒的 CPU 時間。值得一提的是,我注意到在這個階段請求實際計劃會增加相當多的相對成本。我想到的所有其他可能的優化對於生產來說都是不安全的,所以我會在這裡停下來。

最後想一想,您真的要返回 1000 行任意行和最小會議名稱嗎?這些資訊對您的最終使用者有用嗎?

引用自:https://dba.stackexchange.com/questions/219575