內連接的基數估計問題

June 8, 2018

我很難理解為什麼行估計是非常錯誤的，這是我的情況：

簡單連接 - 使用 SQL Server 2016 sp2（sp1 上的相同問題），dbcompatiblity=130。

select Amount_TransactionCurrency_id, CurrencyShareds.id 
from CurrencyShareds 
   INNER JOIN annexes ON Amount_TransactionCurrency_id = CurrencyShareds.Id 
option (QUERYTRACEON 3604, QUERYTRACEON 2363);

SQL 估計 1 行，而它是 107131 並選擇執行嵌套循環（連結到計劃）。在 CurrencyShareds 上更新統計資訊後，估計就可以了，並且選擇了合併連接（連結到新計劃）。只要將一條記錄添加到 CurrencyShareds，統計資訊就會變得“陳舊”並且 sql 會返回錯誤的估計。

我不會太擔心這個簡單的查詢，但這只是更大查詢的一部分，這是多米諾骨牌的開始……

為什麼在 100 條記錄表中添加一行會導致這樣的損壞？在查看基數估計跟踪的輸出時，我看到了這個警告***WARNING: badly-formed histogram ***，但我找不到關於這個主題的更多資訊。

以下是基數估計的完整輸出：

Begin selectivity computation
Input tree:

LogOp_Join

CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

ScaOp_Comp x_cmpEq

ScaOp_Identifier QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id

ScaOp_Identifier QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id

Plan for computation:

CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7

Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1 *** WARNING: badly-formed histogram ***

Selectivity: 4.59503e-018

Stats collection generated:

CStCollJoin(ID=3, CARD=1 x_jtInner)

CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

End selectivity computation

Estimating distinct count in utility function

Input stats collection:

CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

Columns to distinct on:QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id

Plan for computation:

CDVCPlanLeaf

0 Multi-Column Stats, 1 Single-Column Stats, 0 Guesses

Covering multi-col stats id: 7

Using ambient cardinality 107131 to combine distinct counts:

5

Combined distinct count: 5

Result of computation: 5

Estimating distinct count in utility function

Input stats collection:

CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

Columns to distinct on:QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id

Plan for computation:

CDVCPlanUniqueKey

Result of computation: 100

當我更新 CurrencyShareds 的統計數據時，“直方圖格式錯誤”的部分發生了變化，並且基數計算正確

Plan for computation:

CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7

Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1

Selectivity: 0.01

Stats collection generated:

CStCollJoin(ID=3, CARD=107131 x_jtInner)

CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

End selectivity computation

並為此提供統計資訊“

$$ CurrencyShareds $$.Id from stats with id 1" 帶有關於直方圖的警告，這對我來說看起來很好……

Name                                                                                                                             Updated              Rows                 Rows Sampled         Steps  Density       Average key length String Index Filter Expression                                                                                                                                                                                                                                                Unfiltered Rows      Persisted Sample Percent
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------- ------------------------
PK_CurrencyShareds_Id                                                                                                            May 23 2018 10:43PM  98                   98                   75     1             8                  NO           NULL                                                                                                                                                                                                                                                             98                   0

(1 row affected)

All density   Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0,01020408    8              Id

(1 row affected)

RANGE_HI_KEY         RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
-------------------- ------------- ------------- -------------------- --------------
119762190797406464   0             1             0                    1
119762190797406466   1             1             1                    1
119762190797406468   1             1             1                    1
119762190797406470   1             1             1                    1
119762190797406472   1             1             1                    1
119762190797406474   1             1             1                    1
119762190797406476   1             1             1                    1
119762190797406478   1             1             1                    1
119762190797406480   1             1             1                    1
119762190797406482   1             1             1                    1
119762190797406484   1             1             1                    1
119762190797406486   1             1             1                    1
119762190797406488   1             1             1                    1
119762190797406490   1             1             1                    1
119762190797406492   1             1             1                    1
119762190797406494   1             1             1                    1
119762190797406496   1             1             1                    1
119762190797406498   1             1             1                    1
119762190797406500   1             1             1                    1
119762190797406502   1             1             1                    1
119762190797406504   1             1             1                    1
119762190797406506   1             1             1                    1
119762190797406507   0             1             0                    1
478531702587687680   0             1             0                    1
478531702591881728   0             1             0                    1
478531702591881729   0             1             0                    1
478531702591881984   0             1             0                    1
478531702591881985   0             1             0                    1
478531702596076032   0             1             0                    1
478531702596076033   0             1             0                    1
478531702596076288   0             1             0                    1
478531702600270336   0             1             0                    1
478531702600270592   0             1             0                    1
478532235583062528   0             1             0                    1
478532235583062784   0             1             0                    1
478532235587256832   0             1             0                    1
530792464911467264   0             1             0                    1
530792464924049920   0             1             0                    1
530792464924050176   0             1             0                    1
530792464928244224   0             1             0                    1
530792464928244480   0             1             0                    1
530792464932438528   0             1             0                    1
530792464932438784   0             1             0                    1
530792464936632832   0             1             0                    1
530792464936632833   0             1             0                    1
530792464936633088   0             1             0                    1
530792464940827136   0             1             0                    1
530792464940827392   0             1             0                    1
530792464949216000   2             1             2                    1
530792464953410048   0             1             0                    1
530792464953410304   0             1             0                    1
530792464957604352   0             1             0                    1
530792464957604353   0             1             0                    1
530792464957604608   0             1             0                    1
530792464961798656   0             1             0                    1
530792464961798912   0             1             0                    1
530792464965992960   0             1             0                    1
530792464965993216   0             1             0                    1
530792464965993217   0             1             0                    1
530792464970187264   0             1             0                    1
530792464970187265   0             1             0                    1
530792464970187520   0             1             0                    1
530792464974381568   0             1             0                    1
530792464974381824   0             1             0                    1
530792464974381825   0             1             0                    1
530792464978575872   0             1             0                    1
530792464978575873   0             1             0                    1
530792464978576128   0             1             0                    1
867420708903354880   0             1             0                    1
867420708903355136   0             1             0                    1
867420708903355137   0             1             0                    1
960876568220042240   0             1             0                    1
976385263448130048   0             1             0                    1
977302121709864192   0             1             0                    1
977955748426318592   0             1             0                    1

和第二個索引的資訊：

Name                                                                                                                             Updated              Rows                 Rows Sampled         Steps  Density       Average key length String Index Filter Expression                                                                                                                                                                                                                                                Unfiltered Rows      Persisted Sample Percent
-------------------------------------------------------------------------------------------------------------------------------- -------------------- -------------------- -------------------- ------ ------------- ------------------ ------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------- ------------------------
IX_FK_Amount_TransactionCurrency                                                                                                 May 21 2018  3:29PM  107204               107204               5      0             16                 NO           NULL                                                                                                                                                                                                                                                             107204               0

(1 row affected)

All density   Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0,2           8              Amount_TransactionCurrency_id
9,32801E-06   16             Amount_TransactionCurrency_id, Id

(2 rows affected)

RANGE_HI_KEY         RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
-------------------- ------------- ------------- -------------------- --------------
119762190797406475   0             160           0                    1
119762190797406478   0             867           0                    1
119762190797406481   0             106           0                    1
119762190797406494   0             105742        0                    1
119762190797406496   0             329           0                    1

好的，我希望我現在明白了 - 所以這是我們的案例
給定
具有約 100 行的參考表 (CurrencyShareds)，但 id 很大，並且最小值、最大值差異很大 - 最小值：119,762,190,797,406,464 與最大值：977,955,748,426,318,592
一個表（附件）對 CurrencyShared 具有簡單的 FK，但只使用了很少的貨幣 - 您可以看到 IX_FK_Amount_TransactionCurrency 的直方圖列出了 5 個 id - 重要的是只有那些“低” id，因為其他沒有使用。
當所有統計數據都是最新的
CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7

Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1

Selectivity: 0.01
然後為連接計算的選擇性很好，為 100 * 107,131 * 0.01 = 107,131
如果貨幣共享的統計數據不是最新的，那麼
CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7

Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1 *** WARNING: badly-formed histogram ***

Selectivity: 4.59503e-018
選擇性急劇下降，因此連接的估計行數為 1。
當直方圖發生變化時
在我將一行添加到引用具有高 id 的 CurrencyShared 的附件後，結果 IX_FK_Amount_TransactionCurrency 的直方圖更改為
RANGE_HI_KEY         RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
-------------------- ------------- ------------- -------------------- --------------
119762190797406475   0             173           0                    1
119762190797406478   0             868           0                    1
119762190797406481   0             107           0                    1
119762190797406494   0             105745        0                    1
119762190797406496   0             330           0                    1
119762190797406618   0             1             0                    1
119762190797406628   0             1             0                    1
977955748426318623   0             1             0                    1
有了這個直方圖，問題就消失了，現在向 currencyshareds 添加新行不會導致基數估計急劇下降。
這是為什麼？
我懷疑這就是粗略直方圖估計算法在 sql2014+ 中的工作方式，我的猜測基於這篇很棒的文章https://www.sqlshack.com/join-estimation-internals/
粗略直方圖估計是一種新算法，文件較少，即使在一般概念方面也是如此。眾所周知，它不是逐步對齊直方圖，而是僅將它們與最小和最大直方圖邊界對齊。這種方法可能會引入較少的 CE 錯誤（但並非總是如此，因為我們記得這只是一個模型）。
只是為了讓一切清楚 - 為什麼我們在貨幣共享中有如此奇怪的 ID？
這很簡單——我們的 id 是全球唯一的，並且部分基於時間戳（基於雪花的實現）。最常見的貨幣是幾年前在應用程序開始時添加的，只有少數真正用於生產，這就是為什麼在直方圖中只有那些具有“低”id 的貨幣。
問題出現在我們的測試環境中，一些自動化測試開始添加測試貨幣，導致一些查詢執行時間更長或超時……
如何解決問題？
我們將更頻繁地更新這些參考表的統計資訊（我們可能對其他類似的參考數據表也有類似的問題）——這些表很小，所以更新統計資訊不是問題
得到教訓
最新的統計數據很重要！！！
普通的舊標識列不會導致這些問題:)

根據您的直方圖，我能夠在 2017 CU6中重現該問題。我不會說你做錯了什麼。相反，基數估計出了點問題。這是我在插入一行之前得到的：
插入一行後，最終的基數估計會下降很多：
您在這裡有一個非常簡單的重現，因此我的建議是送出產品回饋或向 Microsoft 開具支持票。我能夠找到一些適用於您的範例數據的解決方法，其中一種可能對您來說是可以接受的。
將唯一索引放在CurrencyShareds.Id. 如果沒有唯一索引，我無法讓重現工作。桌子很小，所以也許你可以不用索引。當然，您可能有很好的理由保留它。
將連接的結果具體化到臨時表中。根據您的問題，重要的是在此步驟中獲得合理的估計，以便較大的查詢執行良好。臨時表是實現這一目標的一種方法。
使用舊版 CE。我無法用它重現問題。當然，這可能會對您的其餘查詢產生負面影響。
用愚蠢的程式碼欺騙查詢優化器。例如，在我的測試中，以下重寫效果很好：
.
select Amount_TransactionCurrency_id, CurrencyShareds.id
from CurrencyShareds 
INNER JOIN annexes
ON Amount_TransactionCurrency_id % 9223372036854775809 = CurrencyShareds.Id % 9223372036854775809
我懷疑這是可行的，因為 CE 似乎使用密度而不是直方圖。其他類似的重寫可能具有相同的效果。不能保證這種查詢類型在將來會繼續正常工作。這就是為什麼您應該聯繫 Microsoft 以提高某一天對您的問題的修復將其納入已發布產品的可能性。

引用自：https://dba.stackexchange.com/questions/207671

內連接的基數估計問題

給定

當所有統計數據都是最新的

如果貨幣共享的統計數據不是最新的，那麼

當直方圖發生變化時

這是為什麼？

只是為了讓一切清楚 - 為什麼我們在貨幣共享中有如此奇怪的 ID？

如何解決問題？

得到教訓

相關問答

由於數據偏斜，嵌套循環的估計值非常低

插入性能不佳的表 Exec SP

IN 子句中的冗餘條目

SQL Server 查詢與連結伺服器執行沒有並行

常量變數為空時執行計劃不同

SQL Server–If 儲存過程中的邏輯和計劃記憶體