PostgreSQL
Does Postgres optimize this JOIN with a subquery?
In Postgres 12, I have a purchase_orders table and an items table. I'm running a query that returns the POs of a given shop, along with the total quantity of items ordered on each PO:

```sql
SELECT po.id, SUM(grouped_items.total_quantity) AS total_quantity
FROM purchase_orders po
LEFT JOIN (
    SELECT purchase_order_id, SUM(quantity) AS total_quantity
    FROM items
    GROUP BY purchase_order_id
) grouped_items ON po.id = grouped_items.purchase_order_id
WHERE po.shop_id = 195
GROUP BY po.id
```
This query returns the desired results. The JOIN is done in a subquery because there will be further JOINs to other tables, so this produces an already-grouped table to join against, as sketched below.
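A hedged sketch of that rationale (the shipments table and its columns below are hypothetical, used purely for illustration): if each child table is pre-aggregated to one row per PO before joining, an additional JOIN cannot fan out the rows and inflate the sums.

```sql
-- Hypothetical illustration only: "shipments" is an assumed second child table.
-- Each derived table is already one row per purchase order, so joining both
-- of them cannot multiply rows or double-count quantities.
SELECT
    po.id,
    grouped_items.total_quantity,
    grouped_shipments.shipment_count
FROM purchase_orders po
LEFT JOIN (
    SELECT purchase_order_id, SUM(quantity) AS total_quantity
    FROM items
    GROUP BY purchase_order_id
) grouped_items ON po.id = grouped_items.purchase_order_id
LEFT JOIN (
    SELECT purchase_order_id, COUNT(*) AS shipment_count
    FROM shipments
    GROUP BY purchase_order_id
) grouped_shipments ON po.id = grouped_shipments.purchase_order_id
WHERE po.shop_id = 195;
```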
I wrote another query using a correlated SELECT subquery instead of the JOIN. **The execution times of the two approaches are nearly identical,** so it's hard to tell which one is faster. I ran EXPLAIN ANALYZE but can't interpret it very well.

Question: in the example above, will Postgres process the entire items table in the subquery, and only afterwards join it with purchase_orders? Or is it smart enough to filter the items set down first?

The EXPLAIN output mentions "Seq Scan on items…", which appears to cover all rows of items, with the row count then shrinking as you move up the tree. But I'm not sure whether this means it actually SUMs the entire table in memory.

The EXPLAIN ANALYZE output:
```
GroupAggregate  (cost=6948.16..6973.00 rows=1242 width=40) (actual time=165.099..166.321 rows=1242 loops=1)
  Group Key: po.id
  Buffers: shared hit=4148
  ->  Sort  (cost=6948.16..6951.27 rows=1242 width=16) (actual time=165.090..165.406 rows=1242 loops=1)
        Sort Key: po.id
        Sort Method: quicksort  Memory: 107kB
        Buffers: shared hit=4148
        ->  Hash Right Join  (cost=6668.31..6884.34 rows=1242 width=16) (actual time=99.951..120.627 rows=1242 loops=1)
              Hash Cond: (items.purchase_order_id = po.id)
              Buffers: shared hit=4148
              ->  HashAggregate  (cost=5906.04..5993.80 rows=8776 width=16) (actual time=98.328..104.320 rows=14331 loops=1)
                    Group Key: items.purchase_order_id
                    Buffers: shared hit=3749
                    ->  Seq Scan on items  (cost=0.00..5187.03 rows=143803 width=12) (actual time=0.005..38.307 rows=143821 loops=1)
                          Buffers: shared hit=3749
              ->  Hash  (cost=746.74..746.74 rows=1242 width=8) (actual time=1.588..1.588 rows=1242 loops=1)
                    Buckets: 2048  Batches: 1  Memory Usage: 65kB
                    Buffers: shared hit=399
                    ->  Bitmap Heap Scan on purchase_orders po  (cost=33.91..746.74 rows=1242 width=8) (actual time=0.200..1.169 rows=1242 loops=1)
                          Recheck Cond: (shop_id = 195)
                          Heap Blocks: exact=392
                          Buffers: shared hit=399
                          ->  Bitmap Index Scan on index_purchase_orders_on_shop_id  (cost=0.00..33.60 rows=1242 width=0) (actual time=0.153..0.153 rows=1258 loops=1)
                                Index Cond: (shop_id = 195)
                                Buffers: shared hit=7
Planning time: 0.200 ms
Execution time: 166.665 ms
```
The second approach, using a correlated subquery:
```sql
SELECT po.id,
    (
        SELECT SUM(quantity)
        FROM items
        WHERE purchase_order_id = po.id
        GROUP BY purchase_order_id
    ) AS total_quantity
FROM purchase_orders po
WHERE shop_id = 195
GROUP BY po.id
```
The EXPLAIN ANALYZE output:
```
HashAggregate  (cost=749.84..25716.43 rows=1242 width=16) (actual time=1.667..9.488 rows=1243 loops=1)
  Group Key: po.id
  Buffers: shared hit=5603
  ->  Bitmap Heap Scan on purchase_orders po  (cost=33.91..746.74 rows=1242 width=8) (actual time=0.175..1.072 rows=1243 loops=1)
        Recheck Cond: (shop_id = 195)
        Heap Blocks: exact=390
        Buffers: shared hit=397
        ->  Bitmap Index Scan on index_purchase_orders_on_shop_id  (cost=0.00..33.60 rows=1242 width=0) (actual time=0.130..0.130 rows=1244 loops=1)
              Index Cond: (shop_id = 195)
              Buffers: shared hit=7
  SubPlan 1
    ->  GroupAggregate  (cost=0.42..20.09 rows=16 width=16) (actual time=0.005..0.005 rows=1 loops=1243)
          Group Key: items.purchase_order_id
          Buffers: shared hit=5206
          ->  Index Scan using index_items_on_purchase_order_id on items  (cost=0.42..19.85 rows=16 width=12) (actual time=0.003..0.004 rows=3 loops=1243)
                Index Cond: (purchase_order_id = po.id)
                Buffers: shared hit=5206
Planning time: 0.183 ms
Execution time: 9.831 ms
```
I've been digging into this question recently, and my conclusion is that the planner is simply not smart enough to optimize this particular case. A correlated sub-select is executed once per row, even when there are a huge number of rows, while an uncorrelated sub-select is executed to completion even if only a few of its rows are needed.

The planner does know that one will be faster than the other (assuming the row-count estimates are reasonably accurate), but it lacks the ability to recognize that the two formulations are equivalent, and therefore to choose between the execution plans based on their estimated performance.
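As a hedged illustration of working around this by hand (using only the tables from the question), the shop restriction can be repeated inside the derived table, so the uncorrelated aggregate only touches the items rows belonging to that shop's POs instead of aggregating the whole table:

```sql
-- Sketch: duplicate the shop filter inside the derived table so items can be
-- restricted via the purchase_orders join before aggregating, avoiding the
-- Seq Scan + HashAggregate over all of items.
SELECT po.id, grouped_items.total_quantity
FROM purchase_orders po
LEFT JOIN (
    SELECT i.purchase_order_id, SUM(i.quantity) AS total_quantity
    FROM items i
    JOIN purchase_orders po_inner ON po_inner.id = i.purchase_order_id
    WHERE po_inner.shop_id = 195
    GROUP BY i.purchase_order_id
) grouped_items ON po.id = grouped_items.purchase_order_id
WHERE po.shop_id = 195;
```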
Although in your case the queries would not be exactly the same, because they handle missing rows in items differently: the correlated sub-select is the equivalent of a left join, not an inner join.
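To make that equivalence concrete, here is a minimal sketch (same tables as in the question) of the correlated sub-select written as an explicit LEFT JOIN LATERAL: POs with no rows in items still appear, with a NULL total, which is the left-join behaviour described above.

```sql
-- Sketch: the correlated sub-select expressed as a LEFT JOIN LATERAL.
-- A PO with no matching items rows is kept and gets total_quantity = NULL.
SELECT po.id, grouped_items.total_quantity
FROM purchase_orders po
LEFT JOIN LATERAL (
    SELECT SUM(quantity) AS total_quantity
    FROM items
    WHERE purchase_order_id = po.id
) grouped_items ON true
WHERE po.shop_id = 195;
```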