Postgresql
尋找遞歸查詢的更簡單替代方案
實際查詢涉及更多,但我面臨的問題可以提煉為:
過濾單調遞增整數的行集的查詢,以便 -在最終結果集中, row(n+1).value >= row(n).value + 5。
對於我需要解決的實際問題,行集計數在 1000 秒以內。
舉幾個例子來說明:
- 如果行是:1,2,3,4,5:那麼查詢應該返回:1
- 如果行是:1,5,7,10,11,12,13:那麼查詢應該返回:1,7,12
- 如果行是:6,8,11,16,20,23:那麼查詢應該返回:6,11,16,23
- 如果行是:6,8,12,16,20,23:那麼查詢應該返回:6,12,20
我已經設法通過以下查詢獲得所需的結果,但它似乎過於復雜。取消註釋不同的“..with t(k)..”以試用它們。
我正在尋找任何簡化或替代方法來獲得相同的結果。
with recursive r(n, pri) as ( with t(k) as (values (1),(2),(3),(4),(5)) -- the data we want to filter -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13)) -- with t(k) as (values (6),(8),(11),(16),(20),(23)) -- with t(k) as (values (6),(8),(12),(16),(20),(23)) select min(k), 1::bigint from t -- bootstrap for recursive processing. 1 here represents rank(). UNION select k, (rank() over(order by k)) rr -- rank() is required just to filter out the rows we dont want from the final result set, and no other reason from r, t where t.k >= r.n+5 and r.pri = 1 -- capture the rows we want, AND unfortunately a bunch of rows we dont want ) select n from r where pri = 1;
這很難!我不知道這是否更簡單,但至少它不使用視窗函式也不產生需要被過濾掉的行。
with recursive r(k, n) as ( with t(k) as (values (1),(2),(3),(4),(5)) -- the data we want to filter -- with t(k) as (values (1),(5),(7),(10),(11),(12),(13)) -- with t(k) as (values (6),(8),(11),(16),(20),(23)) -- with t(k) as (values (6),(8),(12),(16),(20),(23)) ,t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t) -- precalculate what's next select * from (select * from t2 limit 1) x -- limit 1 directly fails in a union! UNION ALL select t2.* from r, t2 where t2.k = r.n -- on each iteration, keep only the value that matches the previous precalculated next one ) select k from r
測試
對於非常小的集合,這種替代方法似乎效率較低,但在性能上或多或少是線性的,而原始方法似乎成倍地緩慢。
drop table if exists t; create temp table t(k) AS with recursive r(n) as ( select floor(random()*10)::int + 1 UNION ALL select n + floor(random()*10)::int + 1 from r where n < 100000) -- change to increase or reduce set select * from r; -- surprisingly fast! Go PG! create index on t(k); with recursive r(n, pri) as ( select min(k), 1::bigint from t UNION select k, (rank() over(order by k)) rr from r, t where t.k >= r.n+5 and r.pri = 1 ) select count(*) from r where pri = 1; -- I aborted it after waiting for a minute with recursive r(k, n) as ( with t2(k,n) AS (select k, (select min(k) from t tt where k >= t.k+5) from t) select * from (select * from t2 limit 1) x UNION ALL select t2.* from r, t2 where t2.k = r.n ) select count(*) from r -- 26" in my server
通常,當您需要訪問上一行的數據時,您可以使用視窗聚合函式,但這里目前行的計算是基於上一個結果,而不是行。對於這種問題,我從未找到具有可接受性能的基於集合的解決方案。
但是使用逐行邏輯很容易編寫。我更喜歡
ROW_NUMBER
在臨時表中實現 a,然後找到下一行很簡單n+1
:CREATE TABLE temp AS ( SELECT k, ROW_NUMBER() OVER (ORDER BY k) AS rn FROM t ) ; -- I don't know enough about PostgreSQL, but this is probably needed for performance create unique index on temp(rn) ; WITH RECURSIVE cte(k,rn, k1) AS ( SELECT k, rn, k AS k1 FROM tmp WHERE rn = 1 -- start with the minimum value UNION ALL SELECT tmp.k, tmp.rn, CASE WHEN tmp.k >= cte.k1 + 5 THEN tmp.k -- use the new value ELSE cte.k1 -- keep the old value END FROM tmp JOIN cte ON tmp.rn = cte.rn+1 -- next row ) SELECT count(k) FROM cte WHERE k = k1
這執行得非常快,我劫持了@ZiggyCrueltyfreeZeitgeister 的小提琴。
事實上,這是老式的游標邏輯處理,游標實際上可能更快(您不需要臨時表加
ROW_NUMBER
和遞歸)。我的 PostgreSQL 知識非常有限,我不知道在 PG-land 中首選什麼 :)