Postgres 優化 IN 查詢

November 15, 2021

我正在嘗試優化 Postgres 11 中的“IN”查詢。

我測試了 3 種方法：

情況1：

DELETE FROM mytable WHERE 
id1='fffe9411-3b9d-40dc-9cc6-14407785be8b' 
and 
id2 IN ('00000140-1ae9-41f7-9614-453c063cee52'...'0000693d-0570-41e2-81e9-288261b3b2e5');

案例二：

DELETE FROM mytable 
USING  
unnest ('{00000140-1ae9-41f7-9614-453c063cee52..0000693d-0570-41e2-81e9-288261b3b2e5}'::text[]) unnestid 
WHERE 
id2 = unnestid 
and 
mytable.id1='fffe9411-3b9d-40dc-9cc6-14407785be8b';

案例3：

DELETE FROM mytable WHERE 
id1='fffe9411-3b9d-40dc-9cc6-14407785be8b' 
and id2 
IN (VALUES ('00000140-1ae9-41f7-9614-453c063cee52')..('0000693d-0570-41e2-81e9-288261b3b2e5'));

案例一的查詢計劃：

Delete on mytable  (cost=0.56..7.18 rows=1 width=6) (actual time=0.601..0.602 rows=0 loops=1)
  -&gt;  Index Scan using mytable_pkey on mytable  (cost=0.56..7.18 rows=1 width=6) (actual time=0.073..0.453 rows=6 loops=1)
        Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)
        Filter: (id2 = ANY ('{00000140-1ae9-41f7-9614-453c063cee52,00005327-2400-40cc-bd22-39cc7fc6744e,00005a71-3cad-4253-9afe-dc3fe5609dc1,000062be-95ae-4485-800d-d969a232ebf1,0000663e-7675-4b93-86a5-742de1bab70d,0000693d-0570-41e2-81e9-288261b3b2e5}'::text[]))
        Rows Removed by Filter: 95

案例2的查詢計劃：

Delete on mytable  (cost=7.21..8.47 rows=1 width=62) (actual time=0.691..0.693 rows=0 loops=1)
  -&gt;  Hash Join  (cost=7.21..8.47 rows=1 width=62) (actual time=0.650..0.656 rows=6 loops=1)
        Hash Cond: (unnestid.unnestid = mytable.id2)
        -&gt;  Function Scan on unnest unnestid  (cost=0.00..1.00 rows=100 width=88) (actual time=0.019..0.022 rows=6 loops=1)
        -&gt;  Hash  (cost=7.15..7.15 rows=5 width=43) (actual time=0.594..0.595 rows=101 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 16kB
              -&gt;  Index Scan using mytable_pkey on mytable  (cost=0.56..7.15 rows=5 width=43) (actual time=0.087..0.554 rows=101 loops=1)
                    Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)

案例3的查詢計劃：

Delete on mytable  (cost=0.71..7.31 rows=1 width=62) (actual time=0.649..0.651 rows=0 loops=1)
  -&gt;  Hash Semi Join  (cost=0.71..7.31 rows=1 width=62) (actual time=0.140..0.598 rows=6 loops=1)
        Hash Cond: (followers.accountid = "*VALUES*".column1)
        -&gt;  Index Scan using mytable_pkey on mytable  (cost=0.56..7.15 rows=5 width=43) (actual time=0.094..0.529 rows=101 loops=1)
              Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)
        -&gt;  Hash  (cost=0.08..0.08 rows=6 width=88) (actual time=0.019..0.020 rows=6 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 9kB
              -&gt;  Values Scan on "*VALUES*"  (cost=0.00..0.08 rows=6 width=88) (actual time=0.010..0.013 rows=6 loops=1)

我不確定哪種方法更有效？

約束：

1> 我的 IN LIST 最多可以有 100 個元素。

2> mytable 在 id1 和 id2 上有索引

我傾向於接近 1 和 3

$$ In case 3: we are making hash against IN List which is Max 100 $$ 參考：使用大 IN 優化 Postgres 查詢

根據上面的連結案例 2 和案例 3 更好，但讓我感到困惑的是 FILTER 並不比進行雜湊連接便宜？

Hash Join，將在內部和外部循環上進行順序掃描..

但是選項 1 僅對 in 子句的 mytable + filter 進行 seq 掃描。

第一個查詢計劃中的兩個揭示位：
Index Scan using mytable_pkey on mytable  (cost=0.56..7.18 rows=1 width=6)
                                          (actual time=0.073..0.453 rows=6 loops=1)
rows=6. 因此，如何應用第二個過濾器幾乎無關緊要。第一個過濾器已經完成了幾乎所有的工作！你在這裡完成了。
如果第一個 UUID 上的過濾器不是那麼有選擇性，則多列索引(id1, id2)可能會很有用。
Index Cond: (id1 = 'fffe9411-3b9d-40dc-9cc6-14407785be8b'::text)
您將 UUID 數字儲存為**text**，這是一個很大的浪費。使用正確的類型uuid可以使儲存空間更小，速度更快。（並強制執行有效的 UUID。）請參閱：
MD5 欄位的最佳數據類型是什麼？
UUID 或 BIGSERIAL 外鍵 Postgres
您提到的我的舊答案仍然沒有錯。但它是在 2015 年為 Postgres 9.4 編寫的。我們現在有了 Postgres 14，它變得更加智能。

引用自：https://dba.stackexchange.com/questions/302610

Postgres 優化 IN 查詢

相關問答

PostgreSQL（大數據）：按標籤搜尋並按時間戳排序

使用帶有 pg_restore 的並行恢復沒有加速

為什麼我在 Postgres 13 中的空間查詢比在 Postgres 11 中慢？

Postgres 11：升級後查詢計劃使用 seq 掃描

Postgres 查詢不使用索引，但使用索引更快

為什麼我的查詢處於空閒狀態？