Postgresql
Sequential scan with a large subquery filter "never" finishes
I'm using PostgreSQL 9.2.4 and running the following query:
explain analyze select * from bubu where id not in (select bubu_id from kuku limit 33554431);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on bubu (cost=913175.15..1041265.77 rows=1761465 width=160) (actual time=37565.575..42569.266 rows=596 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 3511745
  SubPlan 1
    -> Limit (cost=0.00..829289.07 rows=33554431 width=8) (actual time=20.528..22363.943 rows=33554431 loops=1)
          -> Seq Scan on kuku (cost=0.00..830246.84 rows=33593184 width=8) (actual time=20.528..18741.263 rows=33554431 loops=1)
Total runtime: 42579.485 ms
(7 rows)
Then there is a second query, which never finishes:
explain analyze select * from bubu where id not in (select bubu_id from kuku limit 33554433)
The query without a limit also hangs:
explain analyze select * from bubu where id not in (select bubu_id from kuku)
EXPLAIN output for the stuck query:
explain select * from bubu where id not in (select bubu_id from kuku)
QUERY PLAN
------------------------------------------------------------------------------
Seq Scan on bubu (cost=0.00..2137396495180.43 rows=1761465 width=160)
  Filter: (NOT (SubPlan 1))
  SubPlan 1
    -> Materialize (cost=0.00..1129436.76 rows=33593184 width=8)
          -> Seq Scan on kuku (cost=0.00..830246.84 rows=33593184 width=8)
(5 rows)
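The two plans differ in how the subplan is run. With LIMIT 33554431 the filter is "NOT (hashed SubPlan 1)": kuku is read once into an in-memory hash table and each bubu row costs a single probe. In the stuck plan it is "NOT (SubPlan 1)" over Materialize, so for each of the roughly 3.5 million bubu rows the materialized kuku rows are scanned again until a match is found, which is what the 2.1 trillion cost estimate reflects. As far as I can tell from the 9.2 planner source (subplan_is_hashable in subselect.c), a subplan is only hashed when its estimated size, rows × (MAXALIGN(width) + MAXALIGN(sizeof(HeapTupleHeaderData))), fits in work_mem. Assuming a 64-bit build (8 + 24 = 32 bytes per bigint row) and work_mem = 1GB, neither of which is stated in the question, the cutoff falls at exactly 2^25 = 33,554,432 rows, which matches the observed boundary between LIMIT 33554431 and LIMIT 33554433:

```sql
-- Worked check of the (assumed) hashability rule:
--   estimated size = plan rows * (MAXALIGN(width) + MAXALIGN(tuple header))
--                  = rows * (8 + 24) bytes, assuming a 64-bit build
-- compared against work_mem, assumed here to be 1GB = 1048576 kB (check with SHOW work_mem;)
SELECT 33554431 * 32 <= 1048576 * 1024 AS limit_33554431_hashed,  -- t
       33554433 * 32 <= 1048576 * 1024 AS limit_33554433_hashed,  -- f
       33593184 * 32 <= 1048576 * 1024 AS no_limit_hashed;        -- f (full kuku estimate)
```

If that reading is right, work_mem is the parameter that decides between the two plans, and SHOW work_mem reports its current value.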
The query I actually need is:
delete from bubu where id not in (select bubu_id from kuku)
Is there a PostgreSQL parameter I can tune to avoid this problem?
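Your queries are slow because they genuinely do a lot of work. In your
explain analyze select * from bubu where id not in (select bubu_id from kuku limit 33554431);
query, for example, the database fetches up to 33,554,431 rows from kuku in no particular order, then scans bubu to pick out the rows whose id is not among those 33M+ values, and finally returns the rows that meet that condition. If you want to run
delete from bubu where id not in (select bubu_id from kuku)
then your options are: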
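Just run it and wait.
Do it in chunks:
Copy the IDs of all bubu rows to be checked into a separate table:
CREATE TABLE temp_bubu_ids AS SELECT id FROM bubu;
Take 1000 or so IDs from that table, remove them from it, and delete the bubu rows among them that match your criteria:
WITH deleted_ids AS (
    DELETE FROM temp_bubu_ids
    -- DELETE doesn't support LIMIT, so pick the batch by ctid
    WHERE ctid IN (
        SELECT ctid
        FROM temp_bubu_ids
        LIMIT 1000
    )
    RETURNING id
)
DELETE FROM bubu
WHERE id IN (SELECT id FROM deleted_ids)
    -- "no referencing row in kuku": fast if kuku (bubu_id) is indexed (assumed);
    -- unlike NOT IN, NOT EXISTS is not defeated by NULL bubu_id values
    AND NOT EXISTS (SELECT 1 FROM kuku WHERE kuku.bubu_id = bubu.id);
Rinse and repeat until temp_bubu_ids is empty, i.e. until all the appropriate rows have been deleted.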
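Another route, not taken from the answers here and offered only as a sketch: express the condition with NOT EXISTS. PostgreSQL can plan NOT EXISTS as an anti join (for example a Hash Anti Join), and a hash join can split its work into on-disk batches when it does not fit in work_mem, whereas NOT IN is kept as a SubPlan filter like the ones in the question's plans. The two forms are only equivalent when kuku.bubu_id contains no NULLs: a single NULL makes id NOT IN (...) evaluate to NULL for every unmatched id, so the original DELETE would remove nothing. The toy tables bubu_demo and kuku_demo below are hypothetical stand-ins for bubu and kuku:

```sql
-- Hypothetical toy stand-ins for bubu and kuku
CREATE TEMP TABLE bubu_demo (id bigint PRIMARY KEY);
CREATE TEMP TABLE kuku_demo (bubu_id bigint);
INSERT INTO bubu_demo VALUES (1), (2), (3);
INSERT INTO kuku_demo VALUES (1), (1), (NULL);

-- NOT IN: the NULL in kuku_demo makes the condition NULL (not true)
-- for ids 2 and 3, so no row qualifies
SELECT count(*) AS not_in_count
FROM bubu_demo
WHERE id NOT IN (SELECT bubu_id FROM kuku_demo);            -- 0

-- NOT EXISTS: ids 2 and 3 have no matching kuku_demo row, so they are deleted
DELETE FROM bubu_demo
WHERE NOT EXISTS (SELECT 1 FROM kuku_demo
                  WHERE kuku_demo.bubu_id = bubu_demo.id);  -- DELETE 2

SELECT id FROM bubu_demo;                                    -- only 1 remains
```

On the real tables the equivalent statement would be DELETE FROM bubu WHERE NOT EXISTS (SELECT 1 FROM kuku WHERE kuku.bubu_id = bubu.id); running plain EXPLAIN on it first shows whether the planner chose an anti join.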
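Not sure whether this qualifies as an answer…
My question had already been asked here (I failed to come up with the "Materialized" keyword) and got the answer I was looking for: https://stackoverflow.com/questions/26477353/postgres-materialize-causes-poor-performance-in-delete-query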
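A hedged sketch of the tuning route, under the assumption (mine, not stated in the answers here) that the planner drops the hashed subplan because the estimated hash table no longer fits in work_mem: raise work_mem for a single transaction so the DELETE is planned with a hashed subplan again. The 1200MB value is an assumption sized from the plan in the question (33,593,184 rows × 32 bytes ≈ 1.0 GiB on a 64-bit build); a hashed subplan is built entirely in memory and can use more than the estimate, so the server needs that much RAM to spare.

```sql
BEGIN;
-- Assumption: this backend can afford ~1.2 GB of RAM; SET LOCAL reverts at COMMIT/ROLLBACK
SET LOCAL work_mem = '1200MB';
-- If the estimate now fits, the Filter line should read "(NOT (hashed SubPlan 1))"
-- instead of a SubPlan over Materialize
EXPLAIN DELETE FROM bubu WHERE id NOT IN (SELECT bubu_id FROM kuku);
DELETE FROM bubu WHERE id NOT IN (SELECT bubu_id FROM kuku);
COMMIT;
```

SET LOCAL only lasts until the end of the transaction, so other sessions and later statements keep the server-wide work_mem.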