Postgresql

對於每個類別,使用 PostgreSQL 遞歸 CTE 查找所有子類別中外鍵項的計數

  • June 11, 2021

我在 PostgreSQL 9.4 中有一個典型的樹結構儲存為鄰接列表:

gear_category (
  id INTEGER PRIMARY KEY,
  name TEXT,
  parent_id INTEGER   
);

以及附加到類別的項目列表:

gear_item (
  id INTEGER PRIMARY KEY,
  name TEXT,
  category_id INTEGER REFERENCES gear_category
);

任何類別都可以附加裝備物品,而不僅僅是樹葉。

出於速度原因,我想預先計算每個類別的一些數據,我將使用這些數據來生成物化視圖。

期望的輸出:

speedy_materialized_view (
  gear_category_id INTEGER,
  count_direct_child_items INTEGER,
  count_recursive_child_items INTEGER
);

count_recursive_child_items是附加到目前類別或任何子類別的 GearItem 的累積數量。每個類別應該有一行,任何為 0 的計數都應為 0。

為了計算這個,我們需要使用遞歸 CTE 來遍歷樹:

WITH RECURSIVE children(id, parent_id) AS (
   --base case
   SELECT gear_category.id AS id, gear_category.parent_id AS parent_id
   FROM gear_category
   WHERE gear_category.id = 37  -- setting this to id includes current object
                            -- setting to parent_id includes only children
   --combine with recursive part
   UNION ALL 
   SELECT gear_category.id AS gear_category_id
        , gear_category.parent_id AS gear_category_parent_id
   FROM gear_category, children
   WHERE children.id = gear_category.parent_id
)
TABLE children;

計算附加到此子類別列表的子齒輪項目很簡單:

--Subselect variant
SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
FROM gear_item 
WHERE gear_item.category_id IN (
SELECT children.id AS children_id
FROM children);


-- JOIN variant
SELECT count(gear_item.id) AS count_recursive_child_items_for_single_cat
FROM gear_item, children
WHERE gear_item.category_id = children.id;

但是,如果您查看 CTE,我已將起始類別 ID 硬編碼為“37”。我不知道如何組合這些查詢來為所有類別生成 count_recursive_child_items,而不僅僅是一個類別。

我如何結合這些?

此外,目前對於每個類別,我都會計算所有子類別,這會產生很多重複的工作,我不知道如何刪除它。例如,假設我有祖父母 > 父母 > 葉。目前我分別計算祖父母和父母的子類別,這意味著我兩次計算父>葉關係。

而且由於我已經count_direct_child_items為每個類別返回,因此在計算時使用它們可能會更快,count_recursive_child_items而不是像我現在那樣從頭開始計算。

另外,這些概念中的每一個對我來說都是有意義的。我只是不知道如何將它們組合成一個優雅/優化的查詢。

這可以完成工作:

CREATE MATERIALIZED VIEW speedy_materialized_view AS
WITH RECURSIVE tree AS (
  SELECT id, parent_id, ARRAY[id] AS path
  FROM   gear_category
  WHERE  parent_id IS NULL

  UNION ALL 
  SELECT c.id, c.parent_id, path || c.id
  FROM   tree t
  JOIN   gear_category c ON c.parent_id = t.id
  )
, tree_ct AS (
  SELECT t.id, t.path, COALESCE(i.item_ct, 0) AS item_ct
  FROM   tree t
  LEFT   JOIN  (
     SELECT category_id AS id, count(*) AS item_ct
     FROM   gear_item
     GROUP  BY 1
     ) i USING (id)
  )
SELECT t.id
    , t.item_ct       AS count_direct_child_items
    , sum(t1.item_ct) AS count_recursive_child_items 
FROM   tree_ct t
LEFT   JOIN tree_ct t1 ON t1.path[1:array_upper(t.path, 1)] = t.path
GROUP  BY t.id, t.item_ct;

count_recursive_child_items每個類別都是單獨計算的,所以我不相信這是深度樹最快的方法。

但是,遞歸 CTE 中不允許使用聚合函式。

遞歸函式

嚴格來說,它確實是迭代的——但“遞歸”CTE 也是如此。

您可以建構一個使用臨時表的函式。您需要了解 plpgsql 的使用方式,否則需要解釋的內容太多。

CREATE OR REPLACE FUNCTION f_tree_ct()
 RETURNS TABLE (id int, count_direct_child_items int, count_recursive_child_items int)
 LANGUAGE plpgsql AS
$func$
DECLARE
  _lvl int;
BEGIN
  -- basic table with added path and count
  CREATE TEMP TABLE t1 AS
  WITH RECURSIVE tree AS (
     SELECT c.id, c.parent_id, '{}'::int[] AS path, 0 AS lvl
     FROM   gear_category c
     WHERE  c.parent_id IS NULL

     UNION ALL 
     SELECT c.id, c.parent_id, path || c.parent_id, lvl + 1
     FROM   tree t
     JOIN   gear_category c ON c.parent_id = t.id
     )
  , tree_ct AS (
     SELECT t.id, t.parent_id, t.path, t.lvl, COALESCE(i.item_ct, 0) AS item_ct
     FROM   tree t
     LEFT   JOIN  (
        SELECT i.category_id AS id, count(*)::int AS item_ct
        FROM   gear_item i
        GROUP  BY 1
        ) i USING (id)
     )
  TABLE tree_ct;

  -- CREATE INDEX ON t1 (lvl); -- only for very deep trees

  SELECT INTO _lvl max(lvl) FROM t1;  -- identify max lvl to start bottom up

  -- recursively aggregate each level in 2nd temp table
  CREATE TEMP TABLE t2 AS
  SELECT t1.id, t1.parent_id, t1.lvl
       , t1.item_ct
       , t1.item_ct AS sum_ct 
  FROM   t1
  WHERE  t1.lvl = _lvl;

  IF _lvl > 0 THEN
     FOR i IN REVERSE _lvl .. 1 LOOP
        INSERT INTO t2
        SELECT t1.id, t1.parent_id, t1.lvl, t1.item_ct
             , CASE WHEN t2.sum_ct IS NULL THEN t1.item_ct ELSE t1.item_ct + t2.sum_ct END
        FROM   t1
        LEFT   JOIN (
           SELECT t2.parent_id AS id, sum(t2.sum_ct) AS sum_ct
           FROM   t2
           WHERE  t2.lvl = i
           GROUP  BY 1
           ) t2 USING (id)
        WHERE  t1.lvl = i - 1;
     END LOOP;
  END IF;

  RETURN QUERY  -- only requested columns, unsorted
  SELECT t2.id, t2.item_ct, t2.sum_ct FROM t2;

  DROP TABLE t1, t2; -- to allow repeated execution in one transaction
  RETURN;
END
$func$;

CREATE MATERIALIZED VIEW由於使用了臨時表,這不能包含在語句中。您可以使用它創建另一個(臨時)表,充當手動維護的“物化視圖”:

CREATE TABLE speedy_materialized_view AS
SELECT * FROM f_tree_ct();

或者,您可以TRUNCATE speedy_materialized_view在函式中直接寫入。該函式將RETURNS void改為,或者您可以返回一些元資訊,如行數……

db<>fiddle here

sqlfiddle

另外:

CTE 的遞歸項中的列別名僅用於文件,因為輸出列名稱僅由非遞歸項確定。

引用自:https://dba.stackexchange.com/questions/121765