尋找遞歸查詢的更簡單替代方案

April 20, 2016

實際查詢涉及更多，但我面臨的問題可以提煉為：
過濾單調遞增整數的行集的查詢，以便 -在最終結果集中， row(n+1).value >= row(n).value + 5。
對於我需要解決的實際問題，行集計數在 1000 秒以內。
舉幾個例子來說明：
如果行是：1,2,3,4,5：那麼查詢應該返回：1
如果行是：1,5,7,10,11,12,13：那麼查詢應該返回：1,7,12
如果行是：6,8,11,16,20,23：那麼查詢應該返回：6,11,16,23
如果行是：6,8,12,16,20,23：那麼查詢應該返回：6,12,20
我已經設法通過以下查詢獲得所需的結果，但它似乎過於復雜。取消註釋不同的“..with t(k)..”以試用它們。
我正在尋找任何簡化或替代方法來獲得相同的結果。
with recursive r(n, pri) as (
   with t(k) as (values (1),(2),(3),(4),(5))   -- the data we want to filter
   -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13))
   -- with t(k) as (values (6),(8),(11),(16),(20),(23))
   -- with t(k) as (values (6),(8),(12),(16),(20),(23))
   select min(k), 1::bigint from t             -- bootstrap for recursive processing. 1 here represents rank().
   UNION
   select k, (rank() over(order by k)) rr      -- rank() is required just to filter out the rows we dont want from the final result set, and no other reason
   from r, t 
   where t.k &gt;= r.n+5 and r.pri = 1            -- capture the rows we want, AND unfortunately a bunch of rows we dont want 
)
select n from r where pri = 1; 

這很難！我不知道這是否更簡單，但至少它不使用視窗函式也不產生需要被過濾掉的行。

with recursive r(k, n) as (
   with t(k) as (values (1),(2),(3),(4),(5))   -- the data we want to filter
   -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13))
   -- with t(k) as (values (6),(8),(11),(16),(20),(23))
   -- with t(k) as (values (6),(8),(12),(16),(20),(23))
        ,t2(k,n) AS (select k, (select min(k) from t tt where k &gt;= t.k+5) from t) -- precalculate what's next
   select * from (select * from t2 limit 1) x   -- limit 1 directly fails in a union!
   UNION ALL
   select t2.* from r, t2 where t2.k = r.n      -- on each iteration, keep only the value that matches the previous precalculated next one
)
select k from r

測試

對於非常小的集合，這種替代方法似乎效率較低，但在性能上或多或少是線性的，而原始方法似乎成倍地緩慢。

drop table if exists t;
create temp table t(k) AS
with recursive r(n) as (
 select floor(random()*10)::int + 1
 UNION ALL
 select n + floor(random()*10)::int + 1
 from r
 where n &lt; 100000)        -- change to increase or reduce set
select * from r;           -- surprisingly fast! Go PG!
create index on t(k);

with recursive r(n, pri) as (
   select min(k), 1::bigint from t
   UNION
   select k, (rank() over(order by k)) rr
   from r, t 
   where t.k &gt;= r.n+5 and r.pri = 1
)
select count(*) from r where pri = 1; -- I aborted it after waiting for a minute

with recursive r(k, n) as (
   with t2(k,n) AS (select k, (select min(k) from t tt where k &gt;= t.k+5) from t)
   select * from (select * from t2 limit 1) x
   UNION ALL
   select t2.* from r, t2 where t2.k = r.n
)
select count(*) from r -- 26" in my server

通常，當您需要訪問上一行的數據時，您可以使用視窗聚合函式，但這里目前行的計算是基於上一個結果，而不是行。對於這種問題，我從未找到具有可接受性能的基於集合的解決方案。
但是使用逐行邏輯很容易編寫。我更喜歡ROW_NUMBER在臨時表中實現 a，然後找到下一行很簡單n+1：
CREATE TABLE temp AS
( SELECT k, 
     ROW_NUMBER() OVER (ORDER BY k) AS rn 
  FROM t
)
;
-- I don't know enough about PostgreSQL, but this is probably needed for performance
create unique index on temp(rn)
;

WITH RECURSIVE cte(k,rn, k1) AS 
(
  SELECT k, rn, k AS k1
  FROM tmp 
  WHERE rn = 1  -- start with the minimum value
  UNION ALL
  SELECT tmp.k, tmp.rn,      
     CASE 
       WHEN tmp.k &gt;= cte.k1 + 5 
       THEN tmp.k   -- use the new value
       ELSE cte.k1  -- keep the old value
     END 
  FROM tmp JOIN cte
    ON tmp.rn = cte.rn+1 -- next row
)
SELECT count(k) 
FROM cte
WHERE k = k1
這執行得非常快，我劫持了@ZiggyCrueltyfreeZeitgeister 的小提琴。
事實上，這是老式的游標邏輯處理，游標實際上可能更快（您不需要臨時表加ROW_NUMBER和遞歸）。我的 PostgreSQL 知識非常有限，我不知道在 PG-land 中首選什麼 :)

引用自：https://dba.stackexchange.com/questions/134723

尋找遞歸查詢的更簡單替代方案

相關問答

在某個時間點查找 n 個加油站的目前價格

PostgreSQL 遞歸公用表表達式的 (Big O) 計算複雜度是多少？

基於函式值的遞歸 CTE 在 Postgres 12 上明顯比在 11 上慢

可以使用第一個 CTE 作為第二個 CTE 的過濾器嗎？

使用來自許多表的列提高 order by 的性能

PostgreSQL 規劃器對分佈不均勻的列的錯誤決策