Sql-Server

可以做些什麼來進一步提高多連接和聚合查詢的性能?

  • November 6, 2013

我在這裡模擬了一個典型的星型模式,我提到了兩個查詢:第一個查詢只是將事實表與 2 個維度表和 1 個日曆表連接起來,第二個查詢連接和聚合。

我通過研究執行計劃和一些通過閱讀建議的索引進行了實驗並創建了索引,所有這些都在一定程度上提高了性能。

我的問題是在這種情況下可以進一步做些什麼,可以應用哪些索引或者如何修改查詢以獲得更好的性能並減少執行時間?

所以首先要創建和填充表並創建索引的查詢:

CREATE TABLE FactTable (id BIGINT IDENTITY PRIMARY KEY, FKDim1 BIGINT NOT NULL, FKDim2 BIGINT, DateRef DATETIME, Fact1 MONEY, Fact2 MONEY)
CREATE TABLE Dim1Table (id BIGINT IDENTITY PRIMARY KEY, Dim1Name NVARCHAR(20), Dim1Val1 MONEY, Dim1Val2 MONEY)
CREATE TABLE Dim2Table (id BIGINT IDENTITY PRIMARY KEY, Dim2Name NVARCHAR(20), Dim2Val1 MONEY, Dim2Val2 MONEY)
CREATE TABLE CalendarTable (id BIGINT IDENTITY PRIMARY KEY, [Date] DATETIME UNIQUE NONCLUSTERED, [Weekday] NVARCHAR(10), [Month] NVARCHAR(10))

ALTER TABLE FactTable ADD CONSTRAINT FK_Dim1 FOREIGN KEY (FKDim1 ) REFERENCES Dim1Table(ID);
ALTER TABLE FactTable ADD CONSTRAINT FK_Dim2 FOREIGN KEY (FKDim2 ) REFERENCES Dim1Table(ID);
ALTER TABLE FactTable ADD CONSTRAINT FK_Calendar FOREIGN KEY (DateRef) REFERENCES CalendarTable([Date]);

DECLARE @counter INT;
SET @counter = 1;

WHILE @counter < 10000
BEGIN
INSERT INTO Dim1Table(Dim1Name,Dim1Val1,Dim1Val2)VALUES('Dim1-'+CAST((@counter % 100) AS NVARCHAR),RAND() * 10000,RAND() * 20000);
INSERT INTO Dim2Table(Dim2Name,Dim2Val1,Dim2Val2)VALUES('Dim2-'+CAST(@counter AS NVARCHAR),RAND() * 10000,RAND() * 20000);
SET @counter = @counter + 1;
END

DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = CAST('1/1/1995' AS DATETIME)
SET @EndDate = DATEADD(d, 3650, @StartDate)

WHILE @StartDate <= @EndDate
BEGIN
INSERT INTO CalendarTable([Date],[Weekday],[Month])SELECT @StartDate, DATENAME(dw, @StartDate), DATENAME(MONTH, @StartDate)
SET @StartDate = DATEADD(dd, 1, @StartDate)
END

SET @counter = 1;
WHILE @counter < 500000
BEGIN
INSERT INTO FactTable
(FKDim1,FKDim2,DateRef,Fact1,Fact2)VALUES(@counter % 10000,@counter % 10000, DATEADD(dd, @counter % 3650, CAST('1/1/1995' AS DATETIME)), RAND() * 10000, RAND() * 20000)
SET @counter = @counter + 1
END

創建索引的程式碼:

CREATE NONCLUSTERED INDEX [Dim1TableIndex1] ON [dbo].[Dim1Table]([Dim1Name] ASC)INCLUDE([id], [Dim1Val1], [Dim1Val2]);
CREATE NONCLUSTERED INDEX [Dim1TableIndex2] ON [dbo].[Dim2Table]([Dim2Name] ASC)INCLUDE([id], [Dim2Val1], [Dim2Val2]);
CREATE NONCLUSTERED INDEX [FactTableIndex1] ON [dbo].FactTable(FKDim1 ASC)INCLUDE(FKDim2, DateRef, Fact1, Fact2);
CREATE NONCLUSTERED INDEX [FactTableIndex2] ON [dbo].FactTable(FKDim2 ASC)INCLUDE(FKDim1, DateRef, Fact1, Fact2);
CREATE UNIQUE NONCLUSTERED INDEX [CalnedarIndex1] ON [dbo].[CalendarTable]([Date] ASC)INCLUDE ([id],[Weekday],[Month]);

查詢 1:Fact 表與 Calendar 和 Dimension 表的簡單連接,以及 where 子句:

SELECT D1.Dim1Name,
      D2.Dim2Name,
      C.[Date],
      C.[Weekday],
      C.[Month],
      D1.Dim1Val1,
      D2.Dim2Val2,
      F.Fact1,
      F.Fact2
FROM   FactTable F
      JOIN Dim1Table D1
           ON  D1.id = F.FKDim1
      JOIN Dim2Table D2
           ON  D2.id = F.FKDim2
      JOIN CalendarTable C
           ON  F.DateRef = C.Date

關閉索引的執行細節(上面提到的所有 5 個)

   (15000 row(s) affected)
Table 'CalendarTable'. Scan count 9, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim1Table'. Scan count 9, logical reads 190, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3890, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
  CPU time = 159 ms,  elapsed time = 475 ms.

和執行計劃: 在此處輸入圖像描述

啟用索引:

(15000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 300, logical reads 1083, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim1Table'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 1, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 1, logical reads 67, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
  CPU time = 125 ms,  elapsed time = 389 ms.

及執行計劃: 在此處輸入圖像描述

第二個查詢,在加入後聚合:

SELECT D1.Dim1Name,
      C.[Month],
      Sum(D1.Dim1Val1) SumDim1Val1,
      Sum(D2.Dim2Val2) SumDim2Val2,
      Sum(F.Fact1) SumFact1,
      Avg(F.Fact2) Fact2Avg
FROM   FactTable F
      JOIN Dim1Table D1
           ON  D1.id = F.FKDim1
      JOIN Dim2Table D2
           ON  D2.id = F.FKDim2
      JOIN CalendarTable C
           ON  F.DateRef = C.Date
GROUP BY D1.Dim1Name, C.[MONTH]

關閉所有索引的性能:

(1200 row(s) affected)
Table 'Dim1Table'. Scan count 9, logical reads 190, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 9, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3890, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
  CPU time = 2436 ms,  elapsed time = 554 ms.

及執行計劃: 在此處輸入圖像描述

並啟用索引:

(1200 row(s) affected)
Table 'Dim1Table'. Scan count 9, logical reads 181, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 9, logical reads 76, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 196, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
  CPU time = 2060 ms,  elapsed time = 518 ms.

最後是執行計劃: 在此處輸入圖像描述

我得到的改進不是很顯著,但是當我考慮大量行時,例如從查詢 1 中刪除 where 子句,然后索引將執行時間從大約 9.5 秒減少到 8.3 秒。

我將在這裡重申我的問題:

  1. 如何重新設計索引或添加新索引以提高性能?
  2. 如何通過重新設計查詢來提高性能?
  3. 除了索引和重新設計查詢之外還能做什麼?

我已經提供了簡單的範例,但試圖涵蓋星型模式中的一些典型場景和查詢類型,這些特定問題的答案背後的概念也將普遍適用。並使用 SQL Server 2012。

嘗試使用包含列的非聚集索引對星型模式查詢進行微優化幾乎沒有任何需要、要點或好處。事實表是為掃描而建構的。

您在範例中創建的索引是父表的子集副本,正在被掃描(無搜尋)。較小的性能改進來自掃描的頁面比父表少得多。鑑於星型模式是為支持即席查詢模式而建構的,因此創建索引來支持所有可能的查詢是不可行的。

  • 在日期鍵上創建事實表聚集索引。大多數(典型)事實表查詢都包含時間元素,並且對日期鍵的分群可以對事實表行進行範圍掃描。
  • 在事實表的外鍵上添加非聚集索引,以幫助進行高度選擇性的查詢。可以使用 NOCHECK 創建維度表的外鍵,以防止對 ETL 產生任何影響。
  • 將維度表聚集在它們的代理鍵上。
  • 在每個維度表的自然鍵上創建一個非聚集索引。
  • 停止。

優化器檢測星型模式查詢模式並製定有效處理它們的策略,利用標準版中的掃描和散列連接或企業版中的點陣圖過濾。遵循上面概述的索引策略,讓優化器處理其餘的。

引用自:https://dba.stackexchange.com/questions/52845