Postgresql

如何使用 PostgreSQL CTE 遞歸獲得性能?

  • March 23, 2018

我在同一個表中使用 id 和 parent_id 做了一個樹結構。對於查詢,我使用的是 PostgreSQL 提供的 CTE,但是要花費大量時間來執行遞歸結果的連接。例如,當我在 sadt_lot 表上有 100 條記錄時,這個查詢需要 8 秒才能返回結果。有人有更好的主意嗎?

範例:列出按 root sadt_lot’s 分組的所有 sadt’s

EXPLAIN ANALYZE WITH RECURSIVE downlots as (
       SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
       FROM sadt_lot sl1
       WHERE sl1.parent_lot_id IS NULL
   UNION
       SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
       FROM sadt_lot sl2
       INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN sadt s ON s.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = sl.sadt_lot_id)
WHERE sl.parent_lot_id IS NULL 
group by sl.sadt_lot_id
ORDEr By sl.sadt_lot_id

查詢計劃

GroupAggregate  (cost=42.53..15077.74 rows=1 width=36) (actual time=104.090..8436.505 rows=90 loops=1)
 Group Key: sl.sadt_lot_id
 CTE downlots
   ->  Recursive Union  (cost=0.00..42.39 rows=101 width=12) (actual time=0.006..0.104 rows=95 loops=1)
         ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.94 rows=1 width=12) (actual time=0.005..0.019 rows=90 loops=1)
               Filter: (parent_lot_id IS NULL)
               Rows Removed by Filter: 5
         ->  Hash Join  (cost=0.33..3.74 rows=10 width=12) (actual time=0.027..0.028 rows=2 loops=2)
               Hash Cond: (sl2.parent_lot_id = d.sadt_lot_id)
               ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.94 rows=94 width=8) (actual time=0.002..0.008 rows=95 loops=2)
               ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.010..0.010 rows=48 loops=2)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  WorkTable Scan on downlots d  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.004 rows=48 loops=2)
 ->  Nested Loop Left Join  (cost=0.14..15004.14 rows=6242 width=8) (actual time=8.234..8434.229 rows=11345 loops=1)
       Join Filter: (SubPlan 2)
       Rows Removed by Join Filter: 1112125
       ->  Index Only Scan using sadt_lot_sadt_lot_id_parent_lot_id_idx on sadt_lot sl  (cost=0.14..12.86 rows=1 width=4) (actual time=0.011..0.252 rows=90 loops=1)
             Index Cond: (parent_lot_id IS NULL)
             Heap Fetches: 90
       ->  Seq Scan on sadt s  (cost=0.00..635.83 rows=12483 width=8) (actual time=0.002..1.785 rows=12483 loops=90)
       SubPlan 2
         ->  CTE Scan on downlots  (cost=0.00..2.27 rows=1 width=4) (actual time=0.003..0.007 rows=1 loops=1123470)
               Filter: (root_id = sl.sadt_lot_id)
               Rows Removed by Filter: 94
Planning time: 0.203 ms
Execution time: 8436.598 ms

我找到了解決方案。我正在使用遞歸表達式如何加入參數,它在加入時使用的表上做了幾個循環,更好的方法是在加入這個表(sadt)之前,用遞歸表達式(downlots“table”)和之後進行加入,使用結果,與sadt連接,查詢從8秒跳到8毫秒。遵循解決方案:

EXPLAIN ANALYZE SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN (WITH RECURSIVE downlots as (
       SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
       FROM sadt_lot sl1
       WHERE sl1.parent_lot_id IS NULL
   UNION
       SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
       FROM sadt_lot sl2
       INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)SELECT * FROM downlots) d ON d.sadt_lot_id = sl.sadt_lot_id
LEFT JOIN sadt s ON s.sadt_lot_id = d.root_id
WHERE sl.parent_lot_id IS NULL 
group by sl.sadt_lot_id
ORDEr By sl.sadt_lot_id

查詢計劃

Sort  (cost=1935.35..1935.56 rows=82 width=36) (actual time=8.230..8.234 rows=82 loops=1)
Sort Key: sl.sadt_lot_id
Sort Method: quicksort  Memory: 75kB
->  HashAggregate  (cost=1931.72..1932.74 rows=82 width=36) (actual time=8.085..8.197 rows=82 loops=1)
   Group Key: sl.sadt_lot_id
   ->  Hash Right Join  (cost=469.73..1839.25 rows=18493 width=8) (actual time=0.328..6.273 rows=10742 loops=1)
         Hash Cond: (s.sadt_lot_id = downlots.root_id)
         ->  Seq Scan on sadt s  (cost=0.00..645.78 rows=12678 width=8) (actual time=0.007..1.406 rows=12493 loops=1)
         ->  Hash  (cost=465.72..465.72 rows=321 width=8) (actual time=0.242..0.242 rows=82 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Hash Right Join  (cost=432.42..465.72 rows=321 width=8) (actual time=0.049..0.232 rows=82 loops=1)
                     Hash Cond: (downlots.sadt_lot_id = sl.sadt_lot_id)
                     ->  CTE Scan on downlots  (cost=428.41..444.05 rows=782 width=12) (actual time=0.007..0.167 rows=96 loops=1)
                           CTE downlots
                             ->  Recursive Union  (cost=0.00..428.41 rows=782 width=12) (actual time=0.006..0.143 rows=96 loops=1)
                                   ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.99 rows=82 width=12) (actual time=0.004..0.018 rows=82 loops=1)
                                         Filter: (parent_lot_id IS NULL)
                                         Rows Removed by Filter: 14
                                   ->  Hash Join  (cost=4.23..40.98 rows=70 width=12) (actual time=0.030..0.031 rows=5 loops=3)
                                         Hash Cond: (d.sadt_lot_id = sl2.parent_lot_id)
                                         ->  WorkTable Scan on downlots d  (cost=0.00..16.40 rows=820 width=8) (actual time=0.000..0.002 rows=32 loops=3)
                                         ->  Hash  (cost=2.99..2.99 rows=99 width=8) (actual time=0.069..0.069 rows=14 loops=1)
                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                               ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.99 rows=99 width=8) (actual time=0.004..0.061 rows=96 loops=1)
                     ->  Hash  (cost=2.99..2.99 rows=82 width=4) (actual time=0.039..0.039 rows=82 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 11kB
                           ->  Seq Scan on sadt_lot sl  (cost=0.00..2.99 rows=82 width=4) (actual time=0.014..0.028 rows=82 loops=1)
                                 Filter: (parent_lot_id IS NULL)
                                 Rows Removed by Filter: 14
Planning time: 0.225 ms
Execution time: 8.300 ms

引用自:https://dba.stackexchange.com/questions/201788