我怎樣才能使這個嵌套查詢更有效?
我有 3 個表:
Room
、Conference
和Participant
。Room
有很多Conference
s,並且Conference
有很多Participant
s。我需要我的查詢來顯示來自 的欄位Room
,以及它具有的關聯數,以及每個關聯的 sConferences
數的總和。這是我為獲取此資訊而編寫的查詢的精簡版;首先,我剛剛選擇了房間 ID:Participant``Conference``SELECT
SELECT TOP(1000) rm.[Id] FROM [Room] rm LEFT JOIN ( SELECT conf.[Id] AS [ConferenceId], MIN(conf.[Name]) AS [ConferenceName], MIN(conf.[RoomId]) AS [RoomId], COUNT(part.[Id]) AS CalcConferenceParticipantCount FROM [Conference] conf LEFT JOIN [Participant] part on part.[ConferenceId] = conf.[Id] GROUP BY conf.[Id] ) confData ON confData.[RoomId] = rm.[Id] GROUP BY rm.[Id]
這非常快,因為 SQL Server 能夠從子查詢中提取數據
Room
並幾乎忽略子查詢(請參見下圖中的試用 1 - 試用 4)。然後我在ConferenceName
子查詢的欄位中添加了每個房間的會議數量:SELECT TOP(1000) rm.[Id], COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount, MIN(confData.[ConferenceName]) FROM [Room] rm LEFT JOIN ( SELECT conf.[Id] AS [ConferenceId], MIN(conf.[Name]) AS [ConferenceName], MIN(conf.[RoomId]) AS [RoomId], COUNT(part.[Id]) AS CalcConferenceParticipantCount FROM [Conference] conf LEFT JOIN [Participant] part on part.[ConferenceId] = conf.[Id] GROUP BY conf.[Id] ) confData ON confData.[RoomId] = rm.[Id] GROUP BY rm.[Id]
這使查詢速度大大降低了大約 100 倍(參見下圖中的試驗 5 - 試驗 7)。然後我從子查詢中添加了參與者計數,這意味著使用了 2 個級別的聚合函式:
SELECT TOP(1000) rm.[Id], COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount, MIN(confData.[ConferenceName]), SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount FROM [Room] rm LEFT JOIN ( SELECT conf.[Id] AS [ConferenceId], MIN(conf.[Name]) AS [ConferenceName], MIN(conf.[RoomId]) AS [RoomId], COUNT(part.[Id]) AS CalcConferenceParticipantCount FROM [Conference] conf LEFT JOIN [Participant] part on part.[ConferenceId] = conf.[Id] GROUP BY conf.[Id] ) confData ON confData.[RoomId] = rm.[Id] GROUP BY rm.[Id]
這進一步使查詢速度減慢了大約 4 倍(參見下圖中的試驗 8 - 試驗 10)。以下是包含 10 次試驗數據的客戶統計數據:
下面是慢查詢的查詢計劃:https ://www.brentozar.com/pastetheplan/?id=SJpyeec5Q
有沒有一種方法可以使這種查詢 - 我計運算元查詢聚合的聚合 - 更有效?
我通過查看表中的行數來模擬數據,給它們一個均勻的數據分佈,並對模式進行猜測:
DROP TABLE IF EXISTS [Room]; CREATE TABLE [Room] ( [Id] BIGINT NOT NULL, FILLER VARCHAR(200) NOT NULL, PRIMARY KEY ([Id]) ); INSERT INTO [Room] WITH (TABLOCK) SELECT TOP (3088) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), REPLICATE('Z', 200) FROM master..spt_values t1 CROSS JOIN master..spt_values t2 OPTION (MAXDOP 1); DROP TABLE IF EXISTS [Conference]; CREATE TABLE [Conference] ( [Id] BIGINT NOT NULL, [Name] VARCHAR(30) NOT NULL, [RoomId] BIGINT NOT NULL, FILLER VARCHAR(200) NOT NULL, PRIMARY KEY ([Id]) ); INSERT INTO [Conference] WITH (TABLOCK) SELECT RN , 'MY FAVORITE MEETING ROOM' , 1 + RN % 3088 , REPLICATE('Z', 200) FROM ( SELECT TOP (97413) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN FROM master..spt_values t1 CROSS JOIN master..spt_values t2 ) q OPTION (MAXDOP 1); DROP TABLE IF EXISTS [Participant]; CREATE TABLE [Participant] ( [Id] BIGINT NOT NULL, [ConferenceId] BIGINT NOT NULL, FILLER VARCHAR(200) NOT NULL, PRIMARY KEY ([Id]) ); INSERT INTO [Participant] WITH (TABLOCK) SELECT RN , 1 + RN % 97413 , REPLICATE('Z', 200) FROM ( SELECT TOP (235323) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN FROM master..spt_values t1 CROSS JOIN master..spt_values t2 ) q OPTION (MAXDOP 1); CREATE INDEX NCI_Part ON [Participant] ([ConferenceId]) INCLUDE (Id);
我對模式所做的最重要的假設是
Id
列是[Conference]
表的主鍵。考慮到查詢計劃和所涉及的索引名稱,這似乎是合理的。在我的機器上,我得到了和你一樣的查詢計劃,但是我的開始查詢只佔用了 163 毫秒的 CPU。我認為這些差異歸結為硬體、數據分佈的差異,以及我沒有將數據返回給客戶端這一事實。
我首先想到的是派生表中不必要
GROUP BY
的。是表的主鍵,因此您不需要所有聚合。使用正確的索引(對於這種特殊情況,您已經擁有),子查詢不一定是壞事。重寫你必須刪除的內容:confData``Id``GROUP BY
SELECT TOP(1000) rm.[Id], COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount, MIN(confData.[ConferenceName]), SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount FROM [Room] rm LEFT JOIN ( SELECT conf.[Id] AS [ConferenceId], conf.[Name] AS [ConferenceName], conf.[RoomId] AS [RoomId], ( SELECT COUNT(part.[Id]) FROM [Participant] part WHERE part.[ConferenceId] = conf.[Id] ) AS CalcConferenceParticipantCount FROM [Conference] conf ) confData ON confData.[RoomId] = rm.[Id] GROUP BY rm.[Id] OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
這導致流聚合被進一步下推到計劃中:
上傳的計劃佔用 113 毫秒的 CPU。存在相同的運算符,但其中一些處理的行更少,從而節省了時間。您可以通過在
[Conference]
with上定義覆蓋索引Id
作為索引鍵來提高此查詢的效率。這似乎是一件奇怪的事情,但您的聚集索引掃描佔用了總查詢時間的 10%,並且可能包括您不需要的列。如果您想讓查詢更快,您還可以考慮使用索引視圖。當您可以定義一個簡單的索引視圖來為您執行聚合時,為什麼每次都執行聚合?
CREATE VIEW IndexedViewOnParticipant WITH SCHEMABINDING AS SELECT [ConferenceId], COUNT_BIG([Id]) CntId, COUNT_BIG(*) Cnt FROM dbo.[Participant] GROUP BY [ConferenceId]; GO CREATE UNIQUE CLUSTERED INDEX CI ON IndexedViewOnParticipant ([ConferenceId]);
這將導致在表上進行 DML 時會產生更多的空間和一點點的成本。總的來說,我會說這是索引視圖的一個很好的案例。再次重寫查詢:
SELECT TOP(1000) rm.[Id], COUNT(confData.[ConferenceId]) AS CalcRoomConferenceCount, MIN(confData.[ConferenceName]), SUM(confData.[CalcConferenceParticipantCount]) AS CalcRoomParticipantCount FROM [Room] rm LEFT JOIN ( SELECT conf.[Id] AS [ConferenceId], conf.[Name] AS [ConferenceName], conf.[RoomId] AS [RoomId], ( SELECT CntId FROM IndexedViewOnParticipant part WITH (NOEXPAND) WHERE part.[ConferenceId] = conf.[Id] ) AS CalcConferenceParticipantCount FROM [Conference] conf ) confData ON confData.[RoomId] = rm.[Id] GROUP BY rm.[Id] OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
SQL Server 同意我的評估,即這是一個好主意,CPU 時間降至 78 ms。
在我的機器上,我能夠使查詢更快,但這開始進入一些有風險的優化,因為它可能需要
LOOP JOIN
提示。當您的查詢或表中的數據發生更改時,該提示可能不是一個好主意。它也可能不適合您的硬體。這種方法背後的想法是創建一個合適的索引[Conference]
並充分利用TOP
僅執行嵌套循環的計劃。這是我添加的索引:CREATE INDEX NCI_Conf ON [Conference] ([RoomId]) INCLUDE ([Name]);
使用提示執行與以前相同的查詢
LOOP JOIN
給了我以下計劃:該查詢只佔用了 58 毫秒的 CPU 時間。值得一提的是,我注意到在這個階段請求實際計劃會增加相當多的相對成本。我想到的所有其他可能的優化對於生產來說都是不安全的,所以我會在這裡停下來。
最後想一想,您真的要返回 1000 行任意行和最小會議名稱嗎?這些資訊對您的最終使用者有用嗎?