Postgresql

Postgres對可空索引的慢查詢

  • January 10, 2022

Postgres 版本:12.7

表定義:

column_name,    data_type,                      is_nullable
"id",           "integer",                      "NO"
"data",         "jsonb",                        "NO"
"refreshed_utc","timestamp without time zone",  "YES"
"purged_utc",   "timestamp without time zone",  "YES"

索引定義:

"customer"  "index_purged_utc"      "CREATE INDEX index_purged_utc ON public.customer USING btree (purged_utc)"
"customer"  "index_refreshed_utc"   "CREATE INDEX index_refreshed_utc ON public.customer USING btree (refreshed_utc)"
"customer"  "pkey_id"               "CREATE UNIQUE INDEX pkey_id ON public.customer USING btree (id)"

基數:

Total table rows: 930615
Table rows where purged_utc is null: 773033
Table rows where purged_utc is null & refreshed_utc < '2021-07-11 00:00:00': 554

要分析的查詢:

在 refreshed_utc < ‘2021-07-11 00:00:00’ 條件下,排序和限制

EXPLAIN ANALYZE 
SELECT *
FROM public.customer
WHERE refreshed_utc &lt; '2021-07-11 00:00:00'
ORDER BY refreshed_utc LIMIT 10

Result:
"Limit  (cost=0.42..26.83 rows=10 width=1225) (actual time=1.567..3.174 rows=10 loops=1)"
"  -&gt;  Index Scan using index_refreshed_utc on customer  (cost=0.42..412819.81 rows=156368 width=1225) (actual time=1.566..3.171 rows=10 loops=1)"
"        Index Cond: (refreshed_utc &lt; '2021-07-11 14:12:01.073442'::timestamp without time zone)"
"Planning Time: 0.113 ms"
"Execution Time: 3.235 ms"

有兩個條件(沒有排序和限制)

EXPLAIN ANALYZE 
SELECT *
FROM public.customer
WHERE purged_utc is NULL AND refreshed_utc &lt; '2021-07-11 00:00:00'

Result:
"Seq Scan on customer  (cost=0.00..175715.76 rows=130062 width=1225) (actual time=1587.866..4089.368 rows=554 loops=1)"
"  Filter: ((purged_utc IS NULL) AND (refreshed_utc &lt; '2021-07-11 00:00:00'::timestamp without time zone))"
"  Rows Removed by Filter: 930100"
"Planning Time: 0.079 ms"
"Execution Time: 4089.539 ms"

在這兩個條件下,排序和限制:

EXPLAIN ANALYZE 
SELECT *
FROM public.customer
WHERE purged_utc is NULL AND refreshed_utc &lt; '2021-07-11 00:00:00'
ORDER BY refreshed_utc LIMIT 10

Result:
"Limit  (cost=0.42..32.16 rows=10 width=1225) (actual time=47783.313..47786.324 rows=10 loops=1)"
"  -&gt;  Index Scan using index_refreshed_utc on customer  (cost=0.42..412815.81 rows=130062 width=1225) (actual time=47783.312..47786.319 rows=10 loops=1)"
"        Index Cond: (refreshed_utc &lt; '2021-07-11 00:00:00'::timestamp without time zone)"
"        Filter: (purged_utc IS NULL)"
"        Rows Removed by Filter: 157611"
"Planning Time: 0.160 ms"
"Execution Time: 47787.474 ms"

我真的很難理解為什麼包括條件以及順序和限制的查詢如此緩慢。我可以看到它使用了一個索引,然後使用了一個過濾器(purged_utc IS NULL),我認為它是索引掃描的初始結果。

我希望有人可以闡明以下幾點:

  • 什麼可能導致這種糟糕的查詢性能?

    • 自建立以來,這張桌子上沒有執行過任何自動吸塵器。這會導致這麼長的查詢時間嗎?
    • 在這種情況下,seq 掃描的速度要快 10 倍以上——這是否表明需要清理/重新索引?
  • 為什麼 EXPLAIN ANALYZE 在實際成本中表明索引掃描直到:47783.313 才開始?

  • 我考慮在 purged_utc 和 refreshed_utc 上創建一個複合索引,但在這種情況下我發現真正奇怪的是,我的數據庫中具有非常相似數據的非常相似的表即使使用相同的查詢也不會遭受如此糟糕的性能。

更新: 解釋範例(分析,緩衝區)

"Limit  (cost=0.42..31.20 rows=10 width=12) (actual time=43902.933..43904.053 rows=10 loops=1)"
"  Buffers: shared hit=10270 read=140259 dirtied=39"
"  I/O Timings: read=43189.074"
"  -&gt;  Index Scan using index_refreshed_utc on customer (cost=0.42..424916.75 rows=138059 width=12) (actual time=43902.931..43904.047 rows=10 loops=1)"
"        Index Cond: (refreshed_utc &lt; '2021-07-14 14:12:01.073442'::timestamp without time zone)"
"        Filter: (purged_utc IS NULL)"
"        Rows Removed by Filter: 156538"
"        Buffers: shared hit=10270 read=140259 dirtied=39"
"        I/O Timings: read=43189.074"
"Planning Time: 0.103 ms"
"Execution Time: 43904.125 ms"

正如建議的那樣,我還將提供一個不同的數據庫表的結果,除了命名和 jsonb 列的內容之外,該表在模式方面是相同的:

"Limit  (cost=0.43..30.47 rows=10 width=12) (actual time=737.129..737.158 rows=10 loops=1)"
"  Buffers: shared hit=158584 read=32362 written=4"
"  I/O Timings: read=141.067 write=0.136"
"  -&gt;  Index Scan using index_refreshed_utc on account (cost=0.43..526833.80 rows=175353 width=12) (actual time=737.127..737.155 rows=10 loops=1)"
"        Index Cond: (refreshed_utc &lt; '2021-07-14 14:12:01.073442'::timestamp without time zone)"
"        Filter: (purged_utc IS NULL)"
"        Rows Removed by Filter: 214764"
"        Buffers: shared hit=158584 read=32362 written=4"
"        I/O Timings: read=141.067 write=0.136"
"Planning Time: 0.096 ms"
"Execution Time: 737.217 ms"

數據庫選擇使用索引掃描以快速找到正確排序順序的行,但它必須掃描並丟棄比它認為的更多的行。

如果簡單ANALYZE的表格不能解決問題,您有兩種選擇:

  • 創建完美索引:
CREATE INDEX ON customer (refreshed_utc) WHERE purged_utc IS NULL;
  • 通過使用強制 PostgreSQL 避免該索引掃描
ORDER BY refreshed_utc + INTERVAL '00:00:00'

為什麼EXPLAIN ANALYZE在實際成本中表明索引掃描直到:47783.313 才開始?

47783.313 ms 不是“索引掃描開始”的時間。這是測量的“啟動成本”,即返回第一行之前的時間。檢查堆元組 ( ) 後丟棄 157611 行Rows Removed by Filter: 157611手冊:

估計的啟動成本。這是在輸出階段可以開始之前花費的時間,例如,在排序節點中進行排序的時間。

無論哪種方式都很昂貴,但是如果該jsonb列很大(並且不足以大到主要是“TOAST”ed),Postgres 必須徒勞地讀取許多堆頁面。

要麼,和/或您的設置有其他問題(除了次優索引)來解釋令人驚訝的糟糕性能。硬體、伺服器配置、資源不足、表和索引過度膨脹?

EXPLAIN (ANALYZE, BUFFERS)而不僅僅是EXPLAIN ANALYZE在那裡更有洞察力。

數據分佈對您的查詢非常不利,因為只有極少數行purged_utc is null早於 2021-07-11(773033 中有 554 行)。由於總體上還有更多,並且由於那個小LIMIT 10,Postgres 錯誤地期望更快地找到足夠的符合條件的行,遍歷索引。

Laurenz 建議的部分索引將節省大部分時間浪費在手頭的查詢上。

更新:(帶前導)

上更大但更通用的索引也應該工作,但不會給我們更快的索引掃描。看評論。(purged_utc, refreshed_utc)``purged_utc

引用自:https://dba.stackexchange.com/questions/305783