Sequential scan with a large subquery filter "never" finishes

August 8, 2018

I am using Postgres 9.2.4 and running the following query:

explain analyze select * from bubu where id not in
    (select bubu_id from kuku limit 33554431);
                                                            QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on bubu  (cost=913175.15..1041265.77 rows=1761465 width=160) (actual time=37565.575..42569.266 rows=596 loops=1)
  Filter: (NOT (hashed SubPlan 1))
  Rows Removed by Filter: 3511745
  SubPlan 1
    ->  Limit  (cost=0.00..829289.07 rows=33554431 width=8) (actual time=20.528..22363.943 rows=33554431 loops=1)
          ->  Seq Scan on kuku  (cost=0.00..830246.84 rows=33593184 width=8) (actual time=20.528..18741.263 rows=33554431 loops=1)
Total runtime: 42579.485 ms
(7 rows)

Then there is the second query, which never finishes:

explain analyze select * from bubu where id not in
  (select bubu_id from kuku limit 33554433)

The query without a limit gets stuck as well:

explain analyze select * from bubu where id not in
  (select bubu_id from kuku)

EXPLAIN output for the stuck query:

explain select * from bubu where id not in
  (select bubu_id from kuku) 

                                 QUERY PLAN                                  
------------------------------------------------------------------------------
Seq Scan on bubu  (cost=0.00..2137396495180.43 rows=1761465 width=160)
  Filter: (NOT (SubPlan 1))
  SubPlan 1
    ->  Materialize  (cost=0.00..1129436.76 rows=33593184 width=8)
          ->  Seq Scan on kuku  (cost=0.00..830246.84 rows=33593184 width=8)
  (5 rows)

The query I actually need is:

delete from bubu where id not in
    (select bubu_id from kuku)

Is there a PostgreSQL parameter I can tune to avoid this problem?

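Your queries are slow because they actually do a lot of work. In your
explain analyze
   select * from bubu where id not in
   (select bubu_id from kuku limit 33554431);
example, the database fetches up to 33,554,431 rows from kuku in undefined order, then scans bubu to pick out the rows that are not among those 33M+ fetched rows, and only then returns the rows that meet that condition.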
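For context, the jump from a query that finishes to one that gets stuck right around `LIMIT 33554432` lines up with how the planner appears to decide whether a `NOT IN` subquery can be hashed: it uses a `hashed SubPlan` only when the estimated subquery result fits in `work_mem`, and otherwise falls back to a plain `SubPlan` that is rescanned for every outer row. The sketch below reproduces that estimate by hand. The 32 bytes per row (the 8-byte `bubu_id` plus a 24-byte aligned tuple header) is an assumption about the planner's arithmetic, not something stated in the question, and the asker's actual `work_mem` setting is unknown.

```sql
-- Current per-operation memory budget that the planner works with.
SHOW work_mem;

-- Hand estimate of the hashed subquery size, assuming ~32 bytes per row
-- (8-byte value + 24-byte aligned tuple header; this figure is an assumption).
SELECT 33554431::bigint * 32      AS bytes_with_limit,   -- the query that finished
       33593184::bigint * 32      AS bytes_whole_table,  -- planner's row estimate for kuku
       1024::bigint * 1024 * 1024 AS one_gb_in_bytes;
```

Under these assumptions the limited subquery comes out just under 1 GB (1,073,741,792 bytes) and the whole table just over it (1,074,981,888 bytes), which would be consistent with a `work_mem` of 1GB on the asker's server.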
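If you want to run
delete from bubu where id not in
(select bubu_id from kuku)
then your options are:
Just execute it and wait.
Or do it in chunks:
First, extract all the IDs to compare against into a separate table:
CREATE TABLE temp_bubu_ids AS SELECT bubu_id FROM kuku;
Then take 1000 or so IDs from that table, delete them from it, and delete from `bubu` the rows that match your criteria: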
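WITH deleted_ids AS (
    DELETE FROM temp_bubu_ids
    -- DELETE doesn't support LIMIT, so pick the batch by ctid
    WHERE ctid IN (
        SELECT ctid
        FROM temp_bubu_ids
        LIMIT 1000
    )
    -- temp_bubu_ids has a single column, bubu_id
    RETURNING bubu_id AS id
)
-- Note: as written, this deletes every bubu row whose id is not
-- in the current batch of 1000.
DELETE FROM bubu
WHERE id NOT IN (
    SELECT id
    FROM deleted_ids
);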
Rinse and repeat until all the appropriate rows have been deleted.
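
Another way to sketch the chunked approach is to batch over `bubu` itself and use `NOT EXISTS` in place of `NOT IN`. This is a different technique from the temp-table version above. It assumes `bubu.id` is unique and that there is an index on `kuku(bubu_id)` (hypothetical; the question does not say whether one exists) so that each probe is cheap. Note that `NOT EXISTS` ignores NULLs in `kuku.bubu_id`, whereas `NOT IN` matches nothing once the subquery returns a NULL, so the two forms only agree when that column has no NULLs.

```sql
-- Delete up to 1000 bubu rows that have no matching kuku row.
DELETE FROM bubu
WHERE id IN (
    SELECT b.id
    FROM bubu b
    WHERE NOT EXISTS (
        SELECT 1 FROM kuku k WHERE k.bubu_id = b.id
    )
    LIMIT 1000  -- batch size; 1000 follows the answer, tune as needed
);
-- Re-run the statement until the command tag reports "DELETE 0".
```

Each run commits a small batch on its own, so locks are held briefly and an interrupted run loses at most one batch of work.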

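Not sure whether this qualifies as an answer…
My question had already been asked here (I failed to come up with the "Materialize" keyword when searching) and got the answer I was looking for: https://stackoverflow.com/questions/26477353/postgres-materialize-causes-poor-performance-in-delete-query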
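
A minimal sketch of the parameter-tuning route the question asks about, assuming the plain `SubPlan` over `Materialize` shows up because the subquery result is estimated not to fit in `work_mem`. The `1100MB` value is a guess sized for this particular table, not a recommendation, and the server needs that much memory to spare.

```sql
BEGIN;

-- Raise the budget for this transaction only; it reverts at COMMIT or ROLLBACK.
-- '1100MB' is an assumed value, chosen to exceed the estimated subquery size.
SET LOCAL work_mem = '1100MB';

-- Check the plan first: the filter should read "NOT (hashed SubPlan 1)"
-- rather than "NOT (SubPlan 1)" over a Materialize node.
EXPLAIN
DELETE FROM bubu
WHERE id NOT IN (SELECT bubu_id FROM kuku);

-- If the plan looks right, run the delete itself.
DELETE FROM bubu
WHERE id NOT IN (SELECT bubu_id FROM kuku);

COMMIT;
```

Using `SET LOCAL` inside a transaction keeps the larger setting from leaking into other queries on the same connection.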

Source: https://dba.stackexchange.com/questions/214092
