添加子查詢時PostgreSQL查詢非常慢

December 5, 2017

我對一個有 150 萬行的表有一個相對簡單的查詢：
SELECT mtid FROM publication
WHERE mtid IN (9762715) OR last_modifier=21321
LIMIT 5000;
EXPLAIN ANALYZE輸出：
Limit  (cost=8.84..12.86 rows=1 width=8) (actual time=0.985..0.986 rows=1 loops=1)
  -&gt;  Bitmap Heap Scan on publication  (cost=8.84..12.86 rows=1 width=8) (actual time=0.984..0.985 rows=1 loops=1)
        Recheck Cond: ((mtid = 9762715) OR (last_modifier = 21321))
        -&gt;  BitmapOr  (cost=8.84..8.84 rows=1 width=0) (actual time=0.971..0.971 rows=0 loops=1)
              -&gt;  Bitmap Index Scan on publication_pkey  (cost=0.00..4.42 rows=1 width=0) (actual time=0.295..0.295 rows=1 loops=1)
                    Index Cond: (mtid = 9762715)
              -&gt;  Bitmap Index Scan on publication_last_modifier_btree  (cost=0.00..4.42 rows=1 width=0) (actual time=0.674..0.674 rows=0 loops=1)
                    Index Cond: (last_modifier = 21321)
Total runtime: 1.027 ms
到目前為止，一切都很好，速度很快，並且使用了可用的索引。
現在，如果我稍微修改一下查詢，結果將是：
SELECT mtid FROM publication
WHERE mtid IN (SELECT 9762715) OR last_modifier=21321
LIMIT 5000;
EXPLAIN ANALYZE輸出是：
Limit  (cost=0.01..2347.74 rows=5000 width=8) (actual time=2735.891..2841.398 rows=1 loops=1)
  -&gt;  Seq Scan on publication  (cost=0.01..349652.84 rows=744661 width=8) (actual time=2735.888..2841.393 rows=1 loops=1)
        Filter: ((hashed SubPlan 1) OR (last_modifier = 21321))
        SubPlan 1
          -&gt;  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Total runtime: 2841.442 ms
沒那麼快，並且使用 seq 掃描…
當然，應用程序執行的原始查詢要復雜一些，甚至更慢，當然，hibernate 生成的原始查詢不是(SELECT 9762715)，但即使是這樣，速度也很慢(SELECT 9762715)！查詢是由hibernate生成的，因此更改它們是一個相當大的挑戰，並且某些功能不可用（例如UNION不可用，這將很快）。
問題
為什麼不能在第二種情況下使用索引？如何使用它們？
我可以通過其他方式提高查詢性能嗎？
額外的想法
似乎我們可以通過手動執行 SELECT 來使用第一種情況，然後將結果列表放入查詢中。即使 IN() 列表中有 5000 個數字，它也比第二種解決方案快四倍。然而，這似乎是錯誤的（而且，它可能快 100 倍:)）。完全無法理解為什麼查詢計劃器對這兩個查詢使用完全不同的方法，所以我想找到一個更好的解決這個問題的方法。

我的同事找到了一種更改查詢的方法，以便它需要簡單的重寫並執行它需要做的事情，即一步執行子選擇，然後對結果進行進一步的操作：

SELECT mtid FROM publication 
WHERE 
 mtid = ANY( (SELECT ARRAY(SELECT 9762715))::bigint[] )
 OR last_modifier=21321
LIMIT 5000;

現在的解釋分析是：

Limit  (cost=92.58..9442.38 rows=2478 width=8) (actual time=0.071..0.074 rows=1 loops=1)
  InitPlan 2 (returns $1)
    -&gt;  Result  (cost=0.01..0.02 rows=1 width=0) (actual time=0.010..0.011 rows=1 loops=1)
          InitPlan 1 (returns $0)
            -&gt;  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
  -&gt;  Bitmap Heap Scan on publication  (cost=92.56..9442.36 rows=2478 width=8) (actual time=0.069..0.070 rows=1 loops=1)
        Recheck Cond: ((mtid = ANY (($1)::bigint[])) OR (last_modifier = 21321))
        Heap Blocks: exact=1
        -&gt;  BitmapOr  (cost=92.56..92.56 rows=2478 width=0) (actual time=0.060..0.060 rows=0 loops=1)
              -&gt;  Bitmap Index Scan on publication_pkey  (cost=0.00..44.38 rows=10 width=0) (actual time=0.046..0.046 rows=1 loops=1)
                    Index Cond: (mtid = ANY (($1)::bigint[]))
              -&gt;  Bitmap Index Scan on publication_last_modifier_btree  (cost=0.00..46.94 rows=2468 width=0) (actual time=0.011..0.011 rows=0 loops=1)
                    Index Cond: (last_modifier = 21321)
Planning time: 0.704 ms
Execution time: 0.153 ms

似乎我們可以創建一個簡單的解析器，以這種方式查找並重寫所有子選擇，並將其添加到休眠掛鉤以操作本機查詢。

問題的核心在這裡變得顯而易見：
發佈時的 Seq Scan（成本=0.01..349652.84行=744661寬度=8）（實際時間=2735.888..2841.393行=1循環=1）
Postgres估計返回 744661 行，而事實上，結果是單行。如果 Postgres 不知道從查詢中可以期待什麼，它就無法更好地計劃。我們需要查看隱藏在背後的實際查詢(SELECT 9762715)——並且可能還需要知道表定義、約束、基數和數據分佈。顯然，Postgres 無法預測它將返回多少行*。可能有一些方法可以重寫查詢，具體取決於它是*什麼。
如果您知道子查詢永遠不會返回多於***n***行，您可以使用以下命令告訴 Postgres：
SELECT mtid
FROM   publication
WHERE  mtid IN **(SELECT ... LIMIT *n*)** --  OR last_modifier=21321
LIMIT  5000;
如果n足夠小，Postgres 將切換到（點陣圖）索引掃描。但是，這只適用於簡單的情況。添加OR條件時停止工作：查詢計劃程序目前無法處理。
我很少用IN (SELECT ...)開始。通常有更好的方法來實現相同的，通常使用EXISTS半連接。有時帶有 ( LEFT) JOIN( LATERAL) …
明顯的解決方法是使用**UNION**，但您排除了這一點。如果不知道實際的子查詢和其他相關細節，我不能說更多。

引用自：https://dba.stackexchange.com/questions/114337

添加子查詢時PostgreSQL查詢非常慢

問題

額外的想法

相關問答

如果我將它分成 3 個較小的查詢，為什麼我的大 Postgres 查詢會更快？

基本子查詢評估

Postgresql 多次擴展視圖而不是執行一次

如何在不重複子查詢的情況下進行減法？

如何在不失去屬性的情況下有效地使用 GROUP BY？

count(*) 在子查詢中的表現