Performance

聚集索引掃描對性能不利嗎?

  • September 19, 2019

我有一個複雜的查詢(見下文),它在大型數據庫上表現不佳。我現在通過刪除連接來分析查詢,查看實際計劃並逐步添加連接。我看到聚集索引掃描具有高 I/O 成本和 100% 的 CPU 成本進行排序。這個查詢可以更優化嗎?

WITH currentPeriod AS
   (SELECT ROW_NUMBER() OVER (PARTITION BY tp.[StartDate]
     ORDER BY m.[Created] DESC) AS rowIndex
           , m.[Created], m.[MessageId]
           , tp.[StartDate], tp.[EndDate], tp.[IsCorrection]
      FROM [TimePeriods] tp
      JOIN [Messages] m on m.Id = tp.[MessageId]
    )
   SELECT [Created], [StartDate], [EndDate], [IsCorrection]
     FROM CurrentPeriod cur
    WHERE rowIndex = 1
    OPTION (RECOMPILE)

計劃(使用 ApexSQL): 實際執行計劃

我已經將 Message 表上的 PK 索引從 Clustered 更改為 Non-clustered,並在 Created 上添加了一個 Clustered 索引。

這是創建時沒有聚集索引的“原始”計劃: 原計劃

編輯

這是上述查詢計劃的連結:https
://www.brentozar.com/pastetheplan/?id=Hkw-4TxPB 這是完整查詢的計劃:https ://www.brentozar.com/pastetheplan/ ?id=SyYrDplPH

背景(可以跳過)

我不是 DBA,我只是最了解數據庫的人,並不意味著我對數據庫了解很多。

使用自定義上傳應用程序填充表格。上傳的文件是 XML 文件。每個文件都是一個消息,一個消息可以有多個 TimePeriod。消息還可以更正早期消息的 TimePeriods。我只對上次更新的時期感興趣。這就是我使用 ROW_NUMBER OVER 和 PARTITION BY 的原因。查詢的結果數據是正確的。這已經得到驗證。

完整的查詢:

WITH currentPeriod AS
   (SELECT ROW_NUMBER() OVER (PARTITION BY ce.[ReferenceCode], tp.[StartDate], e.[Id], ep.[ContractNumber]
       ORDER BY m.[Created] DESC, ep.[ContractNumber] DESC, IIF(fp.[LbTab] = N'010', 1, 0), fp.[DatAanv] DESC) AS rowIndex
       , ce.[ReferenceCode], ce.[Name] as [EntityName]
       , e.[SocialSecurityNumber], e.[EmployeeNumber], e.[Initials], e.[Firstname], e.[Prefix], e.[Surname], e.[BirthDate]
       , CAST((DATEDIFF(DAY, e.[BirthDate], ep.[HireDate]) / 365.24) as FLOAT) [AgeHired], e.[Gender], e.[PhoneNumber], e.[PhoneNumber2], e.[Email]
       , TRIM(CONCAT_WS(' ', a.[Street], COALESCE(a.[Number], COALESCE(a.[NumberString], '')), COALESCE(a.[NumberExtension], ''))) [StreetLine]
       , a.[ZipCode], a.[City], a.[CountryCode]
       , ep.[HireDate], ep.[DepartureDate]
       , f.[AantVerlU], f.[LnSv], f.[AantSV]
       , fp.[IndAvrLkvOudrWn], fp.[IndAvrLkvAgWn], fp.[IndAvrLkvDgBafSb], fp.[IndAvrLkvHpAgWn], fp.[DatAanv]
       , fp.[LbTab], fp.[IndWAO], fp.[IndWW], fp.[IndZW]
       , tp.[StartDate], tp.[EndDate], tp.[IsCorrection]
       FROM [TimePeriods] tp
       JOIN [Messages] m on m.Id = tp.[MessageId]
       JOIN [CorporateEntities] ce on m.[CorporateEntityId] = ce.Id
       JOIN [Financials] f on f.[TimePeriodId] = tp.Id
       JOIN [FinancialPeriods] fp on fp.[FinancialId] = f.Id
       JOIN [EmployementPeriods] ep on ep.Id = f.[EmployementPeriodId]
       JOIN [Employees] e on e.Id = ep.[EmployeeId]  
       JOIN [Addresses] a on a.Id = e.Id
       WHERE NOT EXISTS (SELECT 1 
                           FROM [dbo].[Withdrawals] w 
                           JOIN [dbo].[TimePeriods] tp2 ON tp2.Id = w.[TimePeriodId] 
                           JOIN [dbo].[Messages] m2 ON m2.id = tp2.MessageId AND m2.[CorporateEntityId] = m.[CorporateEntityId]
                           WHERE w.SofiNr is not null AND w.SofiNr = e.[SocialSecurityNumber] AND tp2.StartDate = tp.[StartDate] 
                           AND w.[NumIv] = ep.[ContractNumber] AND m2.[Created] > m.[Created] AND CONVERT(date, m.[Created]) <= CONVERT(date,@messageDate)) 
       AND CONVERT(date, tp.StartDate) >= CONVERT(date, @date26Param) 
       AND CONVERT(date, tp.EndDate) <= CONVERT(date, @endDate) AND CONVERT(date, m.[Created]) <= CONVERT(date, @messageDate)) 

       SELECT *
       FROM CurrentPeriod cur
      WHERE rowIndex = 1 AND [AantVerlU] > 0 AND [LbTab] != N'010'
        AND (SELECT count(1) 
               FROM CurrentPeriod cur2 
               WHERE cur2.[StartDate] > DATEADD(week, -26, cur.[StartDate]) AND cur2.[StartDate] < cur.[StartDate]
                   AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)
                   AND cur2.[AantVerlU] > 0 AND cur2.[LbTab] != N'010') = 0
        AND StartDate >= @startDate 
   ORDER BY [SocialSecurityNumber], [StartDate];

優化器可能很難優化;將 CTE 部分假離線到 #Table 並查詢它。

問題:聚集索引掃描對性能不利?嗯,不一定。意味著您沒有查詢可以使用的更好(更具選擇性)的索引。對於較大的表也表現更差……但它可能更糟糕 - 堆(根本沒有數據順序)。

AND (SELECT 

   count(1) 
               FROM CurrentPeriod cur2 
               WHERE cur2.[StartDate] > DATEADD(week, -26, cur.[StartDate]) AND cur2.[StartDate] < cur.[StartDate]
                   AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)
                   AND cur2.[AantVerlU] > 0 AND cur2.[LbTab] != N'010') = 0

您在此處的邏輯規定了必須計算日期是否適用的有效性的每一行……您可能還希望將連接條件的結果儲存在 #table 中

           AND COALESCE(cur2.[SocialSecurityNumber],  LEFT(cur2.[ReferenceCode], 9) + '-' + cur2.[EmployeeNumber]) = COALESCE(cur.[SocialSecurityNumber],  LEFT(cur.[ReferenceCode], 9) + '-' + cur.Employeenumber)

SQL 是面向集合的,在大量資訊中非常高效;但是在大量表格中以行級別為基礎的詳細資訊…… RBAR 地獄的咒語 - 痛苦的行!

引用自:https://dba.stackexchange.com/questions/249140