PostgreSQL

Sequential scan with a large subquery filter "never" finishes

  • August 8, 2018

I'm using Postgres 9.2.4 and running the following query:

explain analyze select * from bubu where id not in
    (select bubu_id from kuku limit 33554431);
                                                            QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on bubu  (cost=913175.15..1041265.77 rows=1761465 width=160) (actual time=37565.575..42569.266 rows=596 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 3511745
  SubPlan 1
    ->  Limit  (cost=0.00..829289.07 rows=33554431 width=8) (actual time=20.528..22363.943 rows=33554431 loops=1)
          ->  Seq Scan on kuku  (cost=0.00..830246.84 rows=33593184 width=8) (actual time=20.528..18741.263 rows=33554431 loops=1)
Total runtime: 42579.485 ms
(7 rows)

Then there is a second query, which never finishes:

explain analyze select * from bubu where id not in
  (select bubu_id from kuku limit 33554433);

The query without a LIMIT hangs as well:

explain analyze select * from bubu where id not in
  (select bubu_id from kuku);

EXPLAIN output for the hanging query:

explain select * from bubu where id not in
  (select bubu_id from kuku);

                                 QUERY PLAN                                  
------------------------------------------------------------------------------
Seq Scan on bubu  (cost=0.00..2137396495180.43 rows=1761465 width=160)
  Filter: (NOT (SubPlan 1))
  SubPlan 1
    ->  Materialize  (cost=0.00..1129436.76 rows=33593184 width=8)
          ->  Seq Scan on kuku  (cost=0.00..830246.84 rows=33593184 width=8)
  (5 rows)

The query I actually need is:

delete from bubu where id not in
    (select bubu_id from kuku);

Is there a PostgreSQL parameter I can tune to avoid this problem?

Your queries are slow because they really do a lot of work. In your

explain analyze
   select * from bubu where id not in
   (select bubu_id from kuku limit 33554431);

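for example, the database fetches up to 33,554,431 rows from `kuku` in no particular order, then does another scan of `bubu` to select the rows whose `id` is not among those 33M+ fetched values, and finally fetches and returns the rows that meet that condition.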
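The jump from 33,554,431 to 33,554,433 rows lines up with the planner's check for whether a subquery result may be hashed. In the 9.x planner sources (`subplan_is_hashable` in `src/backend/optimizer/plan/subselect.c`), the estimated size is the row estimate times the MAXALIGNed plan width plus a heap-tuple-header allowance, and hashing is allowed only if that size does not exceed `work_mem`. When it is not allowed, the plan falls back to a plain `SubPlan` over a `Materialize` node, which rescans the list of up to ~33.6 million ids for each of the ~3.5 million `bubu` rows. The sketch below re-creates that arithmetic under stated assumptions: a 64-bit build (8-byte alignment, 24-byte header allowance), the `width=8` from the EXPLAIN output, and `work_mem = 1GB`, which is inferred from where the cutoff falls rather than stated in the question.

```sql
BEGIN;

-- Assumption: the asker's work_mem was 1GB (inferred, not stated).
SET LOCAL work_mem = '1GB';

-- Per-row estimate, assuming a 64-bit build:
--   MAXALIGN(plan_width = 8) + MAXALIGN(heap tuple header) = 8 + 24 bytes.
-- pg_settings reports work_mem in kB, hence the * 1024.
SELECT v.plan_rows,
       v.plan_rows * (8 + 24)                              AS est_bytes,
       s.setting::bigint * 1024                            AS work_mem_bytes,
       v.plan_rows * (8 + 24) <= s.setting::bigint * 1024  AS hashable
FROM (VALUES (33554431::bigint),   -- LIMIT in the fast query
             (33554433::bigint),   -- LIMIT in the hanging query
             (33593184::bigint))   -- row estimate for kuku without LIMIT
       AS v(plan_rows)
CROSS JOIN (SELECT setting FROM pg_settings WHERE name = 'work_mem') AS s;

ROLLBACK;
```

Under these assumptions, 33,554,431 rows come to 1,073,741,792 bytes, just under the 1,073,741,824-byte limit, while 33,554,433 rows and the unlimited estimate of 33,593,184 rows exceed it. Raising `work_mem` for the session above the ~1.07 GB estimate (for example `SET work_mem = '1500MB'`) should let the planner choose the hashed subplan again, provided the server really has that much memory to spare.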

If you want to run

delete from bubu where id not in
    (select bubu_id from kuku);

then your options are:

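  1. Just run it and wait.

  2. Do it in chunks:

     1. Copy the IDs to be checked into a separate table:

        CREATE TABLE temp_bubu_ids AS SELECT id FROM bubu;

     2. Take 1000 or so IDs from the IDs table, delete them from it, and delete the rows in `bubu` among them that match your criteria:

        WITH batch AS (
            DELETE FROM temp_bubu_ids
            -- DELETE doesn't support LIMIT, so pick the batch by ctid
            WHERE ctid IN (
                SELECT ctid
                FROM temp_bubu_ids
                LIMIT 1000
            )
            RETURNING id
        )
        DELETE FROM bubu
        WHERE id IN (SELECT id FROM batch)
          -- check each batch id against all of kuku, not against one slice of it;
          -- same result as NOT IN as long as kuku.bubu_id contains no NULLs
          AND NOT EXISTS (
              SELECT 1
              FROM kuku
              WHERE kuku.bubu_id = bubu.id
          );

     3. Rinse and repeat until all the appropriate rows are deleted.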
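A commonly used alternative to `NOT IN (subquery)` in PostgreSQL is `NOT EXISTS`, which the planner can turn into an anti-join (for example a `Hash Anti Join` node). Unlike the hashed subplan, a hash join can split its work into batches on disk when its hash table exceeds `work_mem`, so it does not depend on everything fitting in memory. The two forms are only equivalent when `kuku.bubu_id` contains no NULLs: a single NULL makes `id NOT IN (...)` never true, so the original `DELETE` would remove nothing, while `NOT EXISTS` simply ignores the NULL. The sketch below uses small hypothetical temp tables, `bubu_demo` and `kuku_demo`, with the same column names as the question, so it can run on any database.

```sql
-- Hypothetical stand-ins for bubu and kuku (same column names, tiny data).
CREATE TEMP TABLE bubu_demo (id bigint PRIMARY KEY, payload text);
CREATE TEMP TABLE kuku_demo (bubu_id bigint);

INSERT INTO bubu_demo
SELECT g, 'row ' || g FROM generate_series(1, 10) AS g;

-- Reference every id except 3, 6 and 9.
INSERT INTO kuku_demo
SELECT g FROM generate_series(1, 10) AS g WHERE g % 3 <> 0;

-- Anti-join form: delete bubu_demo rows that no kuku_demo row references.
DELETE FROM bubu_demo b
WHERE NOT EXISTS (
    SELECT 1
    FROM kuku_demo k
    WHERE k.bubu_id = b.id
);
```

Against the real tables the statement is the same with `bubu` and `kuku` substituted. Running plain `EXPLAIN` on it first is a cheap way to confirm that the planner picked an anti-join before the delete touches millions of rows.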

Not sure whether this qualifies as an answer…

My question had already been asked here (I failed to come up with the "Materialized" keyword) and got the answer I was looking for: https://stackoverflow.com/questions/26477353/postgres-materialize-causes-poor-performance-in-delete-query

Source: https://dba.stackexchange.com/questions/214092