PostgreSQL

Does Postgres optimize this JOIN with a subquery?

  • July 18, 2020

In Postgres 12, I have a table purchase_orders and a table items. I'm running a query that returns the POs for a given shop, along with the total quantity of items ordered on each PO:

SELECT po.id,
       SUM(grouped_items.total_quantity) AS total_quantity
FROM purchase_orders po
LEFT JOIN (
    SELECT purchase_order_id,
           SUM(quantity) AS total_quantity
    FROM items
    GROUP BY purchase_order_id
) grouped_items ON po.id = grouped_items.purchase_order_id
WHERE po.shop_id = 195
GROUP BY po.id

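This query returns the desired result. The JOIN is against a subquery because there will be further JOINs to other tables, so the subquery produces an already-grouped table to join to.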

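I wrote another query that uses a correlated SELECT subquery instead of a JOIN. **Running both approaches takes almost the same time,** so it is hard to tell which one is faster. I ran EXPLAIN ANALYZE, but I can't interpret the output very well.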

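Question: in the example above, does Postgres process the entire items table in the subquery and only afterwards join it to purchase_orders? Or is it smart enough to filter the set of items down first?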

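The EXPLAIN report mentions “Seq Scan on items…”, which appears to include all rows in items, with the row count then shrinking further up the tree. But I'm not sure whether that means it actually SUMs the entire table in memory.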

EXPLAIN:

GroupAggregate  (cost=6948.16..6973.00 rows=1242 width=40) (actual time=165.099..166.321 rows=1242 loops=1)
 Group Key: po.id
 Buffers: shared hit=4148
 ->  Sort  (cost=6948.16..6951.27 rows=1242 width=16) (actual time=165.090..165.406 rows=1242 loops=1)
       Sort Key: po.id
       Sort Method: quicksort  Memory: 107kB
       Buffers: shared hit=4148
       ->  Hash Right Join  (cost=6668.31..6884.34 rows=1242 width=16) (actual time=99.951..120.627 rows=1242 loops=1)
             Hash Cond: (items.purchase_order_id = po.id)
             Buffers: shared hit=4148
             ->  HashAggregate  (cost=5906.04..5993.80 rows=8776 width=16) (actual time=98.328..104.320 rows=14331 loops=1)
                   Group Key: items.purchase_order_id
                   Buffers: shared hit=3749
                   ->  Seq Scan on items  (cost=0.00..5187.03 rows=143803 width=12) (actual time=0.005..38.307 rows=143821 loops=1)
                         Buffers: shared hit=3749
             ->  Hash  (cost=746.74..746.74 rows=1242 width=8) (actual time=1.588..1.588 rows=1242 loops=1)
                   Buckets: 2048  Batches: 1  Memory Usage: 65kB
                   Buffers: shared hit=399
                   ->  Bitmap Heap Scan on purchase_orders po  (cost=33.91..746.74 rows=1242 width=8) (actual time=0.200..1.169 rows=1242 loops=1)
                         Recheck Cond: (shop_id = 195)
                         Heap Blocks: exact=392
                         Buffers: shared hit=399
                         ->  Bitmap Index Scan on index_purchase_orders_on_shop_id  (cost=0.00..33.60 rows=1242 width=0) (actual time=0.153..0.153 rows=1258 loops=1)
                               Index Cond: (shop_id = 195)
                               Buffers: shared hit=7
Planning time: 0.200 ms
Execution time: 166.665 ms
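
For reference, a plan with "actual time" and "Buffers" lines like the one above is what EXPLAIN prints when both ANALYZE and BUFFERS are enabled. The following is a minimal sketch of the invocation, assuming those were the options used (the question itself only mentions EXPLAIN ANALYZE). Note that ANALYZE really executes the statement.

```sql
-- Sketch: assumes the ANALYZE and BUFFERS options; the question only names EXPLAIN ANALYZE.
-- ANALYZE runs the query for real and reports actual row counts and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT po.id,
       SUM(grouped_items.total_quantity) AS total_quantity
FROM purchase_orders po
LEFT JOIN (
    SELECT purchase_order_id,
           SUM(quantity) AS total_quantity
    FROM items
    GROUP BY purchase_order_id
) grouped_items ON po.id = grouped_items.purchase_order_id
WHERE po.shop_id = 195
GROUP BY po.id;

-- Reading the plan above from the innermost node outward:
--   Seq Scan on items  ... rows=143821  -> every row of items is read
--   HashAggregate      ... rows=14331   -> one group per purchase_order_id in the whole table
--   Hash Right Join    ... rows=1242    -> only here are the groups narrowed to shop 195
```

The "actual ... rows=" figure on each node is the number of rows that node emitted, so comparing it between a node and its parent shows where the filtering takes place.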

The second approach, using a correlated subquery:

SELECT po.id,
      (
          SELECT SUM(quantity)
          FROM items
          WHERE purchase_order_id = po.id
          GROUP BY purchase_order_id
      ) AS total_quantity
FROM purchase_orders po
WHERE shop_id = 195
GROUP BY po.id

EXPLAIN:

HashAggregate  (cost=749.84..25716.43 rows=1242 width=16) (actual time=1.667..9.488 rows=1243 loops=1)
 Group Key: po.id
 Buffers: shared hit=5603
 ->  Bitmap Heap Scan on purchase_orders po  (cost=33.91..746.74 rows=1242 width=8) (actual time=0.175..1.072 rows=1243 loops=1)
       Recheck Cond: (shop_id = 195)
       Heap Blocks: exact=390
       Buffers: shared hit=397
       ->  Bitmap Index Scan on index_purchase_orders_on_shop_id  (cost=0.00..33.60 rows=1242 width=0) (actual time=0.130..0.130 rows=1244 loops=1)
             Index Cond: (shop_id = 195)
             Buffers: shared hit=7
 SubPlan 1
   ->  GroupAggregate  (cost=0.42..20.09 rows=16 width=16) (actual time=0.005..0.005 rows=1 loops=1243)
         Group Key: items.purchase_order_id
         Buffers: shared hit=5206
         ->  Index Scan using index_items_on_purchase_order_id on items  (cost=0.42..19.85 rows=16 width=12) (actual time=0.003..0.004 rows=3 loops=1243)
               Index Cond: (purchase_order_id = po.id)
               Buffers: shared hit=5206
Planning time: 0.183 ms
Execution time: 9.831 ms

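I've been investigating this very thing lately, and my conclusion is that the planner is not smart enough to optimize this particular case. The correlated subselect is executed once for each outer row, even when that is a huge number of rows, while the uncorrelated subselect is executed to completion, even when only a few of its rows are needed.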

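The planner does know that one will be faster than the other (assuming the estimated row counts are reasonably accurate), but it lacks the ability to recognize that the two formulations are identical, so it cannot choose between the two execution plans based on estimated cost.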
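
Since the planner will not switch between the two forms on its own, the choice falls to whoever writes the query. One option that keeps the join shape the question asks for (it plans further JOINs) while evaluating the aggregate once per purchase order is a LATERAL join. This is only a sketch using the table and column names from the question, not something the question or this answer tested; whether it produces a plan like the correlated-subquery one should be checked with EXPLAIN ANALYZE.

```sql
-- Sketch only: same tables and filter as the question, rewritten with LEFT JOIN LATERAL.
SELECT po.id,
       gi.total_quantity
FROM purchase_orders po
LEFT JOIN LATERAL (
    -- Evaluated once per outer row, like the correlated subquery.
    -- An aggregate without GROUP BY always returns exactly one row,
    -- so a PO with no items gets a NULL total instead of disappearing.
    SELECT SUM(i.quantity) AS total_quantity
    FROM items i
    WHERE i.purchase_order_id = po.id
) gi ON true
WHERE po.shop_id = 195;
```

Because the lateral subquery returns exactly one row per purchase order, no outer GROUP BY is needed in this form.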

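Note, though, that in your case the queries would not be identical, because they handle missing rows in "items" differently. The correlated subselect is equivalent to the left join, not to the inner join.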
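
The difference in how missing rows are handled can be seen on a tiny data set. The sketch below uses hypothetical temporary tables (demo_purchase_orders and demo_items are made-up names, not the question's tables) in which purchase order 2 has no items.

```sql
-- Hypothetical demo tables; the names and data are illustrative only.
CREATE TEMP TABLE demo_purchase_orders (id int PRIMARY KEY, shop_id int);
CREATE TEMP TABLE demo_items (purchase_order_id int, quantity int);

INSERT INTO demo_purchase_orders VALUES (1, 195), (2, 195);  -- PO 2 has no items
INSERT INTO demo_items VALUES (1, 3), (1, 4);

-- Inner join against the grouped subquery: PO 2 is dropped.
SELECT po.id, gi.total_quantity
FROM demo_purchase_orders po
JOIN (
    SELECT purchase_order_id, SUM(quantity) AS total_quantity
    FROM demo_items
    GROUP BY purchase_order_id
) gi ON gi.purchase_order_id = po.id
WHERE po.shop_id = 195
ORDER BY po.id;
-- returns: (1, 7)

-- Left join against the grouped subquery: PO 2 is kept with a NULL total.
SELECT po.id, gi.total_quantity
FROM demo_purchase_orders po
LEFT JOIN (
    SELECT purchase_order_id, SUM(quantity) AS total_quantity
    FROM demo_items
    GROUP BY purchase_order_id
) gi ON gi.purchase_order_id = po.id
WHERE po.shop_id = 195
ORDER BY po.id;
-- returns: (1, 7), (2, NULL)

-- Correlated subselect: same rows as the left join.
SELECT po.id,
       (SELECT SUM(i.quantity)
        FROM demo_items i
        WHERE i.purchase_order_id = po.id) AS total_quantity
FROM demo_purchase_orders po
WHERE po.shop_id = 195
ORDER BY po.id;
-- returns: (1, 7), (2, NULL)
```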

Source: https://dba.stackexchange.com/questions/270009