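PostgreSQL
How can I improve the performance of a recursive PostgreSQL CTE?
I built a tree structure in a single table using id and parent_id. To query it I'm using a recursive CTE, which PostgreSQL supports, but joining against the recursive result takes a long time. For example, with just 100 records in the sadt_lot table, this query takes 8 seconds to return. Does anyone have a better idea?
Example: list all sadt records grouped by their root sadt_lot: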
EXPLAIN ANALYZE
WITH RECURSIVE downlots AS (
    SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id AS root_id
    FROM sadt_lot sl1
    WHERE sl1.parent_lot_id IS NULL
  UNION
    SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id AS root_id
    FROM sadt_lot sl2
    INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN sadt s ON s.sadt_lot_id = ANY (SELECT sadt_lot_id FROM downlots WHERE root_id = sl.sadt_lot_id)
WHERE sl.parent_lot_id IS NULL
GROUP BY sl.sadt_lot_id
ORDER BY sl.sadt_lot_id
Query plan
GroupAggregate  (cost=42.53..15077.74 rows=1 width=36) (actual time=104.090..8436.505 rows=90 loops=1)
  Group Key: sl.sadt_lot_id
  CTE downlots
    ->  Recursive Union  (cost=0.00..42.39 rows=101 width=12) (actual time=0.006..0.104 rows=95 loops=1)
          ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.94 rows=1 width=12) (actual time=0.005..0.019 rows=90 loops=1)
                Filter: (parent_lot_id IS NULL)
                Rows Removed by Filter: 5
          ->  Hash Join  (cost=0.33..3.74 rows=10 width=12) (actual time=0.027..0.028 rows=2 loops=2)
                Hash Cond: (sl2.parent_lot_id = d.sadt_lot_id)
                ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.94 rows=94 width=8) (actual time=0.002..0.008 rows=95 loops=2)
                ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.010..0.010 rows=48 loops=2)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  WorkTable Scan on downlots d  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.004 rows=48 loops=2)
  ->  Nested Loop Left Join  (cost=0.14..15004.14 rows=6242 width=8) (actual time=8.234..8434.229 rows=11345 loops=1)
        Join Filter: (SubPlan 2)
        Rows Removed by Join Filter: 1112125
        ->  Index Only Scan using sadt_lot_sadt_lot_id_parent_lot_id_idx on sadt_lot sl  (cost=0.14..12.86 rows=1 width=4) (actual time=0.011..0.252 rows=90 loops=1)
              Index Cond: (parent_lot_id IS NULL)
              Heap Fetches: 90
        ->  Seq Scan on sadt s  (cost=0.00..635.83 rows=12483 width=8) (actual time=0.002..1.785 rows=12483 loops=90)
        SubPlan 2
          ->  CTE Scan on downlots  (cost=0.00..2.27 rows=1 width=4) (actual time=0.003..0.007 rows=1 loops=1123470)
                Filter: (root_id = sl.sadt_lot_id)
                Rows Removed by Filter: 94
Planning time: 0.203 ms
Execution time: 8436.598 ms
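Separately from the speed problem, the recursive member of the CTE in this query sets root_id to d.sadt_lot_id, which is the parent's id, so any lot two or more levels below a root gets its parent rather than its root as root_id. A minimal sketch that carries the root down with d.root_id instead. It assumes hypothetical inline sample data (a CTE named sadt_lot that shadows the real table only inside this statement), a hypothetical temporary view name, lot_roots, and a tree with no cycles, which is why it uses UNION ALL.

```sql
-- Hypothetical sample tree: 1 -> 2 -> 3 is a three-level chain,
-- 4 is a second root with no children.
CREATE TEMP VIEW lot_roots AS
WITH RECURSIVE sadt_lot (sadt_lot_id, parent_lot_id) AS (
    VALUES (1, NULL::int), (2, 1), (3, 2), (4, NULL::int)
), downlots AS (
    -- Anchor: every root lot is its own root.
    SELECT sadt_lot_id, 0 AS level, sadt_lot_id AS root_id
    FROM sadt_lot
    WHERE parent_lot_id IS NULL
  UNION ALL
    -- Recursive step: a child inherits its parent's root_id
    -- (the question's query puts d.sadt_lot_id here, i.e. the parent's own id).
    SELECT sl2.sadt_lot_id, d.level + 1, d.root_id
    FROM sadt_lot sl2
    JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sadt_lot_id, level, root_id FROM downlots;

SELECT * FROM lot_roots ORDER BY sadt_lot_id;
```

On this data the final SELECT returns the rows (sadt_lot_id, level, root_id) = (1, 0, 1), (2, 1, 1), (3, 2, 1), (4, 0, 4). With d.sadt_lot_id AS root_id, lot 3 would get root_id 2 instead.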
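I found a solution. I was using the recursive CTE inside the join condition, so PostgreSQL evaluated a subquery against it for every candidate row pair of the join: in the plan above, the CTE Scan under SubPlan 2 ran 1,123,470 times (90 root lots × 12,483 sadt rows). A better approach is to join the root lots to the recursive CTE (the downlots "table") first, and only then join that result to sadt with a plain equality condition, which the planner can execute as a hash join. The query went from about 8 seconds to about 8 milliseconds (8436.598 ms to 8.300 ms). Here is the solution: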
EXPLAIN ANALYZE
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN (
    WITH RECURSIVE downlots AS (
        SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id AS root_id
        FROM sadt_lot sl1
        WHERE sl1.parent_lot_id IS NULL
      UNION
        SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id AS root_id
        FROM sadt_lot sl2
        INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
    )
    SELECT * FROM downlots
) d ON d.sadt_lot_id = sl.sadt_lot_id
LEFT JOIN sadt s ON s.sadt_lot_id = d.root_id
WHERE sl.parent_lot_id IS NULL
GROUP BY sl.sadt_lot_id
ORDER BY sl.sadt_lot_id
Query plan
Sort  (cost=1935.35..1935.56 rows=82 width=36) (actual time=8.230..8.234 rows=82 loops=1)
  Sort Key: sl.sadt_lot_id
  Sort Method: quicksort  Memory: 75kB
  ->  HashAggregate  (cost=1931.72..1932.74 rows=82 width=36) (actual time=8.085..8.197 rows=82 loops=1)
        Group Key: sl.sadt_lot_id
        ->  Hash Right Join  (cost=469.73..1839.25 rows=18493 width=8) (actual time=0.328..6.273 rows=10742 loops=1)
              Hash Cond: (s.sadt_lot_id = downlots.root_id)
              ->  Seq Scan on sadt s  (cost=0.00..645.78 rows=12678 width=8) (actual time=0.007..1.406 rows=12493 loops=1)
              ->  Hash  (cost=465.72..465.72 rows=321 width=8) (actual time=0.242..0.242 rows=82 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 12kB
                    ->  Hash Right Join  (cost=432.42..465.72 rows=321 width=8) (actual time=0.049..0.232 rows=82 loops=1)
                          Hash Cond: (downlots.sadt_lot_id = sl.sadt_lot_id)
                          ->  CTE Scan on downlots  (cost=428.41..444.05 rows=782 width=12) (actual time=0.007..0.167 rows=96 loops=1)
                                CTE downlots
                                  ->  Recursive Union  (cost=0.00..428.41 rows=782 width=12) (actual time=0.006..0.143 rows=96 loops=1)
                                        ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.99 rows=82 width=12) (actual time=0.004..0.018 rows=82 loops=1)
                                              Filter: (parent_lot_id IS NULL)
                                              Rows Removed by Filter: 14
                                        ->  Hash Join  (cost=4.23..40.98 rows=70 width=12) (actual time=0.030..0.031 rows=5 loops=3)
                                              Hash Cond: (d.sadt_lot_id = sl2.parent_lot_id)
                                              ->  WorkTable Scan on downlots d  (cost=0.00..16.40 rows=820 width=8) (actual time=0.000..0.002 rows=32 loops=3)
                                              ->  Hash  (cost=2.99..2.99 rows=99 width=8) (actual time=0.069..0.069 rows=14 loops=1)
                                                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                    ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.99 rows=99 width=8) (actual time=0.004..0.061 rows=96 loops=1)
                          ->  Hash  (cost=2.99..2.99 rows=82 width=4) (actual time=0.039..0.039 rows=82 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 11kB
                                ->  Seq Scan on sadt_lot sl  (cost=0.00..2.99 rows=82 width=4) (actual time=0.014..0.028 rows=82 loops=1)
                                      Filter: (parent_lot_id IS NULL)
                                      Rows Removed by Filter: 14
Planning time: 0.225 ms
Execution time: 8.300 ms
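As written, the solution joins sadt on d.root_id of the root row itself, so each group contains only the sadts attached directly to the root lot. The sadts in descendant lots, which the original query tried to collect, are dropped, and the recursive member still assigns the parent's id rather than the root's as root_id. A sketch that keeps the same idea (evaluate downlots once, then use only equality joins) but groups every descendant lot under its root. Assumptions: hypothetical sample data in TEMP tables that shadow the real sadt_lot and sadt for the current session, a hypothetical temporary view name sadts_by_root, a tree without cycles (hence UNION ALL), and PostgreSQL 9.4 or later for the aggregate FILTER clause.

```sql
-- Hypothetical sample data; TEMP tables shadow real tables of the same
-- name for the rest of this session only.
CREATE TEMP TABLE sadt_lot (sadt_lot_id int PRIMARY KEY, parent_lot_id int);
CREATE TEMP TABLE sadt (sadt_id int PRIMARY KEY, sadt_lot_id int);
INSERT INTO sadt_lot VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
INSERT INTO sadt VALUES (10, 1), (11, 2), (12, 3);

CREATE TEMP VIEW sadts_by_root AS
WITH RECURSIVE downlots AS (
    -- Anchor: every root lot is its own root.
    SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id AS root_id
    FROM sadt_lot sl1
    WHERE sl1.parent_lot_id IS NULL
  UNION ALL
    -- Recursive step: propagate the root, not the parent.
    SELECT sl2.sadt_lot_id, d.level + 1, d.root_id
    FROM sadt_lot sl2
    JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
-- One plain equality join: every lot in a root's subtree contributes its sadts.
SELECT d.root_id AS sadt_lot_id,
       array_agg(s.sadt_id ORDER BY s.sadt_id)
           FILTER (WHERE s.sadt_id IS NOT NULL) AS sadt_ids
FROM downlots d
LEFT JOIN sadt s ON s.sadt_lot_id = d.sadt_lot_id
GROUP BY d.root_id;

SELECT * FROM sadts_by_root ORDER BY sadt_lot_id;
```

On this data the final SELECT returns {10,11,12} for root 1 and NULL for root 4 (no sadts anywhere in its subtree), whereas the solution query above would return {10} for root 1. Against the real tables, skip the sample-data statements and run the WITH RECURSIVE query on its own.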