Sql-Server

Columnstore:糟糕的執行計劃 - 過濾而不是搜尋

  • December 3, 2021

請參閱https://www.brentozar.com/pastetheplan/?id=SyLQIPDtF (SQL 2016 Enterprise)上的執行計劃

  • 我有一個數據倉庫表 peak_reporting_data 跟踪每天和每小時的活動,每月包含大約 40 億行,其中包含由 date_key 分區的聚集 columnstoreindex(每天一個分區)
  • 在 peak_reporting_monats_peaks 表中,我匯總了該表並按月峰值對其進行排序/排名。有 3 種類型的活動 (kpi_type),對於每種類型,我最多有 24 小時 * 31 天 = 每月 744 行$$ monats_peak $$排名從 1 到 744。它具有超過 month_key、kpi_type、monats_peak 的唯一索引。
  • 對於最活躍的時間(每個 kpi_type),我需要更多詳細資訊,因此我編寫了以下查詢/視圖:
   SELECT prmp.month_key
        , prd.*
     FROM mba.peak_reporting_monats_peaks        AS prmp
     LEFT LOOP JOIN (SELECT prd.date_key
                          , prd.hour
                          , prd.kpi_type
                          , prd.is_dr_brand
                          , prd.type_id_usage
                          , prd.product_identifier
                          , SUM(prd.kb) / 1024.0 / 1024.0 AS gb
                          , SUM(CAST(prd.sek AS BIGINT))  AS sek
                          , SUM(prd.anzahl)               AS anzahl
                          , SUM(prd.kb) / 439453125.0     AS gbits
                       FROM db1.mba.peak_reporting_data AS prd 
                      GROUP BY prd.date_key
                             , prd.kpi_type
                             , prd.is_dr_brand
                             , prd.hour
                             , prd.type_id_usage
                             , prd.product_identifier
                     ) AS prd
       ON prd.date_key  = prmp.date_key
      AND prd.hour      = prmp.hour
    WHERE prmp.monats_peak = 1
      AND prmp.month_key = 202107

由於在 peak_reporting_monats_peaks 中每月恰好有 3 行 monats_peak = 1,因此 SQL 伺服器執行 3 個嵌套循環和查詢/根據 date_key、hour 和 kpi_type 聚合大表是合乎邏輯的(這可能會在 2 秒內完成,正如我用游標測試的那樣)。

但遺憾的是,它總是讀取整個表(目前為 360 億行),而 ColumnstoreIndexScan 運算符中沒有任何謂詞/查找謂詞,無論我嘗試什麼。由於這個原因,查詢需要 2-3 分鐘才能完成,而不是 2 秒。

如果我使用 INNER LOOP JOIN 而不是 LEFT LOOP JOIN,它將刪除過濾器運算符,但在 JOIN 之前添加一個惰性表假離線,通常的 INNER JOIN(沒有 LOOP)將導致 HASH JOIN(再次在整個表上)。

任何想法,如何(除了帶有游標的過程或多行表值函式)我可以“強制”SQL 伺服器進行 3 個簡單的查找(+ 聚合)而不是讀取大量不相關的數據?

統計數據等是正確的,它確切地知道 peak_reporting_monats_peaks 中將有 3 行,我在大 peak_reporting_data 中添加了 date_key + hour 的顯式統計

DDL:

USE tempdb
GO
CREATE SCHEMA [mba] AUTHORIZATION dbo
GO
CREATE PARTITION FUNCTION pf_mba_cdr (INT)
   AS RANGE RIGHT FOR VALUES (20201101 , 20201102 , 20201103 , 20201104 , 20201105 , 20201106 , 20201107 , 20201108 , 20201109 , 20201110 , 20201111 , 20201112 , 20201113 , 20201114 , 20201115 , 20201116 , 20201117 , 20201118 , 20201119 , 20201120 , 20201121 , 20201122 , 20201123 , 20201124 , 20201125 , 20201126 , 20201127 , 20201128 , 20201129 , 20201130 , 20201201 , 20201202 , 20201203 , 20201204 , 20201205 , 20201206 , 20201207 , 20201208 , 20201209 , 20201210 , 20201211 , 20201212 , 20201213 , 20201214 , 20201215 , 20201216 , 20201217 , 20201218 , 20201219 , 20201220 , 20201221 , 20201222 , 20201223 , 20201224 , 20201225 , 20201226 , 20201227 , 20201228 , 20201229 , 20201230 , 20201231 , 20210101 , 20210102 , 20210103 , 20210104 , 20210105 , 20210106 , 20210107 , 20210108 , 20210109 , 20210110 , 20210111 , 20210112 , 20210113 , 20210114 , 20210115 , 20210116 , 20210117 , 20210118 , 20210119 , 20210120 , 20210121 , 20210122 , 20210123 , 20210124 , 20210125 , 20210126 , 20210127 , 20210128 , 20210129 , 20210130 , 20210131 , 20210201 , 20210202 , 20210203 , 20210204 , 20210205 , 20210206 , 20210207 , 20210208 , 20210209 , 20210210 , 20210211 , 20210212 , 20210213 , 20210214 , 20210215 , 20210216 , 20210217 , 20210218 , 20210219 , 20210220 , 20210221 , 20210222 , 20210223 , 20210224 , 20210225 , 20210226 , 20210227 , 20210228 , 20210301 , 20210302 , 20210303 , 20210304 , 20210305 , 20210306 , 20210307 , 20210308 , 20210309 , 20210310 , 20210311 , 20210312 , 20210313 , 20210314 , 20210315 , 20210316 , 20210317 , 20210318 , 20210319 , 20210320 , 20210321 , 20210322 , 20210323 , 20210324 , 20210325 , 20210326 , 20210327 , 20210328 , 20210329 , 20210330 , 20210331 , 20210401 , 20210402 , 20210403 , 20210404 , 20210405 , 20210406 , 20210407 , 20210408 , 20210409 , 20210410 , 20210411 , 20210412 , 20210413 , 20210414 , 20210415 , 20210416 , 20210417 , 20210418 , 20210419 , 20210420 , 20210421 , 20210422 , 20210423 , 20210424 , 20210425 , 20210426 , 20210427 , 20210428 , 20210429 , 20210430 , 20210501 , 20210502 , 20210503 , 20210504 , 20210505 , 20210506 , 20210507 , 20210508 , 20210509 , 20210510 , 20210511 , 20210512 , 20210513 , 20210514 , 20210515 , 20210516 , 20210517 , 20210518 , 20210519 , 20210520 , 20210521 , 20210522 , 20210523 , 20210524 , 20210525 , 20210526 , 20210527 , 20210528 , 20210529 , 20210530 , 20210531 , 20210601 , 20210602 , 20210603 , 20210604 , 20210605 , 20210606 , 20210607 , 20210608 , 20210609 , 20210610 , 20210611 , 20210612 , 20210613 , 20210614 , 20210615 , 20210616 , 20210617 , 20210618 , 20210619 , 20210620 , 20210621 , 20210622 , 20210623 , 20210624 , 20210625 , 20210626 , 20210627 , 20210628 , 20210629 , 20210630 , 20210701 , 20210702 , 20210703 , 20210704 , 20210705 , 20210706 , 20210707 , 20210708 , 20210709 , 20210710 , 20210711 , 20210712 , 20210713 , 20210714 , 20210715 , 20210716 , 20210717 , 20210718 , 20210719 , 20210720 , 20210721 , 20210722 , 20210723 , 20210724 , 20210725 , 20210726 , 20210727 , 20210728 , 20210729 , 20210730 , 20210731 , 20210801 , 20210802 , 20210803 , 20210804 , 20210805 , 20210806 , 20210807 , 20210808 , 20210809 , 20210810 , 20210811 , 20210812 , 20210813 , 20210814 , 20210815 , 20210816 , 20210817 , 20210818 , 20210819 , 20210820 , 20210821 , 20210822 , 20210823 , 20210824 , 20210825 , 20210826 , 20210827 , 20210828 , 20210829 , 20210830 , 20210831 , 20210901 , 20210902 , 20210903 , 20210904 , 20210905 , 20210906 , 20210907 , 20210908 , 20210909 , 20210910 , 20210911 , 20210912 , 20210913 , 20210914 , 20210915 , 20210916 , 20210917 , 20210918 , 20210919 , 20210920 , 20210921 , 20210922 , 20210923 , 20210924 , 20210925 , 20210926 , 20210927 , 20210928 , 20210929 , 20210930 , 20211001 , 20211002 , 20211003 , 20211004 , 20211005 , 20211006 , 20211007 , 20211008 , 20211009 , 20211010 , 20211011 , 20211012 , 20211013 , 20211014 , 20211015 , 20211016 , 20211017 , 20211018 , 20211019 , 20211020 , 20211021 , 20211022 , 20211023 , 20211024 , 20211025 , 20211026 , 20211027 , 20211028 , 20211029 , 20211030 , 20211031 , 20211101 , 20211102 , 20211103 , 20211104 , 20211105 , 20211106 , 20211107 , 20211108 , 20211109 , 20211110 , 20211111 , 20211112 , 20211113 , 20211114 , 20211115 , 20211116 , 20211117 , 20211118 , 20211119 , 20211120 , 20211121 , 20211122 , 20211123 , 20211124 , 20211125 , 20211126 , 20211127 , 20211128 , 20211129 , 20211130 , 20211201 , 20211202 , 20211203 , 20211204 , 20211205 , 20211206 , 20211207 , 20211208 , 20211209 , 20211210 , 20211211 , 20211212 , 20211213 , 20211214 , 20211215 , 20211216 , 20211217 , 20211218 , 20211219 , 20211220 , 20211221 , 20211222 , 20211223 , 20211224 , 20211225 , 20211226 , 20211227 , 20211228 , 20211229 , 20211230 , 20211231 , 20220101 , 20220102 , 20220103 , 20220104 , 20220105 , 20220106 , 20220107 , 20220108 , 20220109 , 20220110 , 20220111 , 20220112 , 20220113 , 20220114 , 20220115 , 20220116 , 20220117 , 20220118 , 20220119 , 20220120 , 20220121 , 20220122 , 20220123 , 20220124 , 20220125 , 20220126 , 20220127 , 20220128 , 20220129 , 20220130 , 20220131 , 20220201 , 20220202 , 20220203 , 20220204 , 20220205 , 20220206 , 20220207 , 20220208 , 20220209 , 20220210 , 20220211 , 20220212 , 20220213 , 20220214 , 20220215 , 20220216 , 20220217 , 20220218 , 20220219 , 20220220 , 20220221 , 20220222 , 20220223 , 20220224 , 20220225 , 20220226 , 20220227 , 20220228 , 20220301 , 20220302 , 20220303 , 20220304 , 20220305 , 20220306 , 20220307 , 20220308 , 20220309 , 20220310 , 20220311 , 20220312 , 20220313 , 20220314 , 20220315 , 20220316 , 20220317 , 20220318 , 20220319 , 20220320 , 20220321 , 20220322 , 20220323 , 20220324 , 20220325 , 20220326 , 20220327 , 20220328 , 20220329 , 20220330 , 20220331 , 20220401 , 20220402 , 20220403 , 20220404 , 20220405 , 20220406 , 20220407 , 20220408 , 20220409 , 20220410 , 20220411 , 20220412 , 20220413 , 20220414 , 20220415 , 20220416 , 20220417 , 20220418 , 20220419 , 20220420 , 20220421 , 20220422 , 20220423 , 20220424 , 20220425 , 20220426 , 20220427 , 20220428 , 20220429 , 20220430 , 20220501 , 20220502 , 20220503 , 20220504 , 20220505 , 20220506 , 20220507 , 20220508 , 20220509 , 20220510 , 20220511 , 20220512 , 20220513 , 20220514 , 20220515 , 20220516 , 20220517 , 20220518 , 20220519 , 20220520 , 20220521 , 20220522 , 20220523 , 20220524 , 20220525 , 20220526 , 20220527 , 20220528 , 20220529 , 20220530 , 20220531 , 20220601 , 20220602 , 20220603 , 20220604 , 20220605 , 20220606 , 20220607 , 20220608 , 20220609 , 20220610 , 20220611 , 20220612 , 20220613 , 20220614 , 20220615 , 20220616 , 20220617 , 20220618 , 20220619 , 20220620 , 20220621 , 20220622 , 20220623 , 20220624 , 20220625 , 20220626 , 20220627 , 20220628 , 20220629 , 20220630 , 20220701 , 20220702 , 20220703 , 20220704 , 20220705 , 20220706 , 20220707 , 20220708 , 20220709 , 20220710 , 20220711 , 20220712 , 20220713 , 20220714 , 20220715 , 20220716 , 20220717 , 20220718 , 20220719 , 20220720 , 20220721 , 20220722 , 20220723 , 20220724 , 20220725 , 20220726 , 20220727 , 20220728 , 20220729 , 20220730 , 20220731 , 20220801 , 20220802 , 20220803 , 20220804 , 20220805 , 20220806 , 20220807 , 20220808 , 20220809 , 20220810 , 20220811 , 20220812 , 20220813 , 20220814 , 20220815 , 20220816 , 20220817 , 20220818 , 20220819 , 20220820 , 20220821 , 20220822 , 20220823 , 20220824 , 20220825 , 20220826 , 20220827 , 20220828 , 20220829 , 20220830 , 20220831 , 20220901 , 20220902 , 20220903 , 20220904 , 20220905 , 20220906 , 20220907 , 20220908 , 20220909 , 20220910 , 20220911 , 20220912 , 20220913 , 20220914 , 20220915 , 20220916 , 20220917 , 20220918 , 20220919 , 20220920 , 20220921 , 20220922 , 20220923 , 20220924 , 20220925 , 20220926 , 20220927 , 20220928 , 20220929 , 20220930 , 20221001 , 20221002 , 20221003 , 20221004 , 20221005 , 20221006 , 20221007 , 20221008 , 20221009 , 20221010 , 20221011 , 20221012 , 20221013 , 20221014 , 20221015 , 20221016 , 20221017 , 20221018 , 20221019 , 20221020 , 20221021 , 20221022 , 20221023 , 20221024 , 20221025 , 20221026 , 20221027 , 20221028 , 20221029 , 20221030 , 20221031 , 20221101 , 20221102 , 20221103 , 20221104 , 20221105 , 20221106 , 20221107 , 20221108 , 20221109 , 20221110 , 20221111 , 20221112 , 20221113 , 20221114 , 20221115 , 20221116 , 20221117 , 20221118 , 20221119 , 20221120 , 20221121 , 20221122 , 20221123 , 20221124 , 20221125 , 20221126 , 20221127 , 20221128 , 20221129 , 20221130 , 20221201 , 20221202 , 20221203 , 20221204 , 20221205 , 20221206 , 20221207 , 20221208 , 20221209 , 20221210 , 20221211 , 20221212 , 20221213 , 20221214 , 20221215 , 20221216 , 20221217 , 20221218 , 20221219 , 20221220 , 20221221 , 20221222 , 20221223 , 20221224 , 20221225 , 20221226 , 20221227 , 20221228 , 20221229 , 20221230 , 20221231);
GO
CREATE PARTITION SCHEME ps_mba_cdr AS PARTITION pf_mba_cdr ALL TO ([PRIMARY]);
GO


CREATE TABLE mba.peak_reporting_data
    (date_key            INT            NOT NULL
   , hour                TINYINT        NOT NULL
   , kb                  DECIMAL(19, 6) NULL
   , msisdn_key          INT            NOT NULL
   , sp_account          SMALLINT       NOT NULL
   , is_dr_brand         BIT            NULL
   , type_id_usage       BIGINT         NOT NULL
   , product_identifier  BIGINT         NOT NULL
   , kpi_service         VARCHAR(15)    NULL
   , kpi_group           VARCHAR(15)    NULL
   , sek                 INT            NULL
   , anzahl              INT            NULL
   , kpi_type            CHAR(1)        NOT NULL
   , anzahl_begonnen     INT            NULL
   , anzahl_geendet      INT            NULL
   , anzahl_durchgaengig INT            NULL
   , anzahl_nur_in_hour  INT            NULL)
   ON ps_mba_cdr(date_key);
GO

ALTER TABLE mba.peak_reporting_data SET (LOCK_ESCALATION = AUTO);
GO
CREATE CLUSTERED COLUMNSTORE INDEX icc_peak_reporting_data ON mba.peak_reporting_data ON ps_mba_cdr(date_key);
GO


CREATE TABLE mba.peak_reporting_monats_peaks
    (month_key           INT            NOT NULL
   , date_key            INT            NOT NULL
   , week_day            VARCHAR(30)    NOT NULL
   , hour                TINYINT        NOT NULL
   , kpi_type            CHAR(1)        NOT NULL
   , gb                  DECIMAL(38, 6) NULL
   , sek                 BIGINT         NULL
   , anzahl              INT            NOT NULL
   , gigabit_pro_sekunde DECIMAL(38, 6) NULL
   , prozent_dr_brand    DECIMAL(9, 6)  NOT NULL
   , tages_peak          TINYINT        NOT NULL
   , monats_peak         SMALLINT       NOT NULL
   , refresh_date        DATETIME2(0)   NOT NULL) ON [PRIMARY];
GO

CREATE UNIQUE CLUSTERED INDEX iuc_peak_reporting_monats_peaks__month_key__kpi_type__monats_peak
   ON mba.peak_reporting_monats_peaks (month_key, kpi_type, monats_peak)
   WITH (DROP_EXISTING = OFF, FILLFACTOR = 98, DATA_COMPRESSION = ROW, SORT_IN_TEMPDB = ON
       , STATISTICS_INCREMENTAL = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
   ON [PRIMARY];
GO
CREATE NONCLUSTERED INDEX inc_peak_reporting_monats_peaks__date_key
   ON mba.peak_reporting_monats_peaks (date_key, kpi_type, tages_peak)
   WITH (DROP_EXISTING = OFF, FILLFACTOR = 99, DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON
       , STATISTICS_INCREMENTAL = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
   ON [PRIMARY];
GO

我添加了 DDL 並使用通常的 INNER JOIN 沒有任何提示,這需要 2 分鐘,因為它在掃描後添加了過濾器,但這次它做了一個嵌套循環。

如果我強制使用 HASH JOIN,我現在會得到一個新計劃: https ://www.brentozar.com/pastetheplan/?id=HyLDWiPFt該計劃不再有過濾器運算符,但只執行 11 秒,因為它會消除段(只有 44 43k 讀取)。它還說該表已分區,但它使用了 0 個分區(錯誤,否則我將沒有結果)。懸而未決的問題仍然是:為什麼將它用作嵌套循環的過濾器。

我知道您無法獲得 Columnstore Index Seek,但如果表是分區的,它至少可以將 SEEK Predicate(通常是分區)添加到 Columnstore Index Scan 運算符。

您幾乎永遠不需要在嵌套循環連接的內側進行列儲存掃描。

在這種情況下,引擎不支持批處理模式(批處理模式列儲存掃描無法倒帶)。請注意,您上傳的計劃顯示列儲存掃描以行模式執行

單獨的過濾器並不是特別有趣。不可分割的謂詞不能總是下推到子掃描或搜尋。在這種情況下,引擎不會將動態分區消除與剩余謂詞結合起來。這是一個低效率,但不是這裡的主要問題。

失去連接提示,讓優化器選擇它想要的計劃。您可能會得到類似於您上傳的雜湊提示計劃的內容,該計劃在 1600 毫秒內執行。是的,整個列儲存都被掃描了,但是在散列連接處創建的點陣圖非常有效——將 35B 行減少到 37M。整個過程在1.5s內完成,一點也不差。請注意,批處理模式點陣圖允許行組級消除(包括預讀)和其他技巧,因此您最終不會讀取 35B 行。

順便說一句,您原來的嵌套循環計劃確實包括分區消除:

掃描屬性


如果您真的想採用分區消除循環式策略 - 而且這樣做可能很值得 - 您需要做一些額外的工作才能在嵌套的內側獲得有效的批處理模式列儲存掃描循環加入。

正如我所說,這種安排不可能自然而然地得到。您需要在單獨的執行範圍中“隱藏”內部操作,以便為重複的列儲存掃描獲取(可能是並行的)批處理模式執行。

這可以通過以下方式實現:

  1. 將左連接重寫為應用
  2. 將應用端放在表值(不是內聯!)函式中

左連接可以很容易地轉換為應用。相關參數將是date_key[hour]。然後,您將使用APPLYpeak_reporting_monats_peaks呼叫每一行的函式。

如果操作正確,您將獲得分區消除、並行性和批處理模式列儲存掃描。


提供的程式碼中的快速範例:

國際貨幣基金組織:

CREATE FUNCTION dbo.F
(
   @date_key integer,
   @hour tinyint
)
RETURNS @T table
(
   kpi_type char(1) NOT NULL,
   is_dr_brand bit NULL,
   type_id_usage bigint NOT NULL,
   product_identifier bigint NOT NULL,
   gb decimal(19,6) NOT NULL,
   sek bigint NOT NULL,
   anzahl integer NOT NULL,
   gbits decimal (19,6) NOT NULL
)
WITH SCHEMABINDING
AS
BEGIN
   INSERT @T
   SELECT 
       prd.kpi_type,
       prd.is_dr_brand,
       prd.type_id_usage,
       prd.product_identifier,
       gb = SUM(prd.kb) / 1024.0 / 1024.0,
       sek = SUM(CAST(prd.sek AS BIGINT)),
       anzahl = SUM(prd.anzahl),
       gbits = SUM(prd.kb) / 439453125.0
   FROM mba.peak_reporting_data AS prd
   WHERE
       prd.date_key = @date_key
       AND prd.[hour] = @hour
   GROUP BY
       prd.kpi_type,
       prd.is_dr_brand,
       prd.type_id_usage,
       prd.product_identifier;

   RETURN;
END;

詢問:

SELECT
   PRMP.month_key,
   PRMP.date_key,
   [PRMP].[hour],
   F.kpi_type,
   F.is_dr_brand,
   F.type_id_usage,
   F.product_identifier,
   F.gb,
   F.sek,
   F.anzahl,
   F.gbits
FROM mba.peak_reporting_monats_peaks AS PRMP
OUTER APPLY dbo.F(PRMP.date_key, PRMP.[hour]) AS F
WHERE
   PRMP.monats_peak = 1
   AND PRMP.month_key = 202107;

計劃:

計劃

TVF 掃描屬性(批處理模式、分區消除)

特性

引用自:https://dba.stackexchange.com/questions/303459