Postgresql

通過基於傳入的數組進行過濾,允許來自 plpgsql 函式的動態結果集?

  • January 9, 2020

我想我在這裡避免了 XY 問題,因為我正在為真正的潛在問題(以動態方式匯總多個表)提出我的解決方案,並且我只詢問我被卡住的最後一個部分。因此,首先有一點背景。我提供了一個最小的範例數據集,以及用於以我描述的方式總結數據的工作程式碼。


考慮如下設置:

create temp table tbl1 (id int primary key, category text, passing boolean);

insert into tbl1 values
(1, 'A', 't'),
(2, 'A', 't'),
(3, 'A', 't'),
(4, 'A', 'f'),
(5, 'B', 't'),
(6, 'B', 'f'),
(7, 'C', 't'),
(8, 'C', 't'),
(9, 'C', 'f'),
(10, 'C', 'f'),
(11, 'C', 'f'),
(12, 'C', 'f'),
(13, 'B', 't'),
(14, 'B', 'f'),
(15, 'B', 't'),
(16, 'B', 'f'),
(17, 'B', 't'),
(18, 'B', 'f'),
(19, 'B', 't'),
(20, 'B', 'f');

然後我可以生成以下摘要:

postgres=> select category, passing, count(*) from tbl1 group by category, passing order by category, passing;
category | passing | count
----------+---------+-------
A        | f       |     1
A        | t       |     3
B        | f       |     5
B        | t       |     5
C        | f       |     4
C        | t       |     2
(6 rows)

但是,我有多個要匯總的此類表(均使用相同的類別 A、B、C),因此我要顯示的最終結果只需一行即可匯總一個表,如下所示:

Table Name | Overall passing rate | A passing rate | B passing rate | C passing rate
------------+----------------------+----------------+----------------+----------------
tbl1       | 50% (10/20)          | 75% (3/4)      | 50% (5/10)     | 33% (2/6)

有時我還需要能夠過濾,例如只返回關於類別 A 和 B 的資訊並忽略 C,如下所示:

Table Name | Overall passing rate | A passing rate | B passing rate
------------+----------------------+----------------+----------------
tbl1       | 57% (8/14)           | 75% (3/4)      | 50% (5/10)

count(*) filter (where...)我可以使用有點笨拙的 CTE 中的語法通過查詢生成上面顯示的第一個輸出,如下所示:

with tallies as (
select
count(*) filter (where category in ('A', 'B', 'C') and passing) as abc_pass,
count(*) filter (where category in ('A', 'B', 'C')) as abc_all,
count(*) filter (where category = 'A' and passing) as a_pass,
count(*) filter (where category = 'A') as a_all,
count(*) filter (where category = 'B' and passing) as b_pass,
count(*) filter (where category = 'B') as b_all,
count(*) filter (where category = 'C' and passing) as c_pass,
count(*) filter (where category = 'C') as c_all
from tbl1
)
select 'tbl1' as "Table Name",
format('%s%% (%s/%s)', 100*abc_pass/abc_all, abc_pass, abc_all) as "Overall passing rate",
format('%s%% (%s/%s)', 100*a_pass/a_all, a_pass, a_all) as "A passing rate",
format('%s%% (%s/%s)', 100*b_pass/b_all, b_pass, b_all) as "B passing rate",
format('%s%% (%s/%s)', 100*c_pass/c_all, c_pass, c_all) as "C passing rate"
from tallies;

我可以毫不費力地修改它以省略類別 C,以生成上面的第二個範例輸出。(這裡沒有顯示,因為它主要是重複的。)

問題是,有這麼多表格要總結(實際上是視圖,不是表格,但這沒關係),並且要求我能夠輕鬆地總結任何一組表格,並隨意包含或省略類別(例如“總結 tbl1、tbl2 和 tbl3,但只總結 B 類和 C 類”,或“只總結所有表的 B 類”)上面的 SQL 不夠靈活。

我可以使用接受任意數量的 type 參數的 plpgsql 函式來完成“臨時匯總任何一組表”的要求name,並將我想要匯總的所有表的名稱提供給它,如下所示:

create function summarize_tables(variadic tbls name[])
returns table ("Table Name" text, "Overall pass rate" text, "A passing rate" text, "B passing rate" text, "C passing rate" text)
language plpgsql
as $funcdef$
declare
 tbl name;
begin
 foreach tbl in array tbls
 loop
   return query execute
     format(
       $query$
         with tallies as (
           select
             count(*) filter (where category in ('A', 'B', 'C') and passing) as abc_pass,
             count(*) filter (where category in ('A', 'B', 'C')) as abc_all,
             count(*) filter (where category = 'A' and passing) as a_pass,
             count(*) filter (where category = 'A') as a_all,
             count(*) filter (where category = 'B' and passing) as b_pass,
             count(*) filter (where category = 'B') as b_all,
             count(*) filter (where category = 'C' and passing) as c_pass,
             count(*) filter (where category = 'C') as c_all
           from %I
         )
         select
           %L as "Table Name",
           format('%%s%%%% (%%s/%%s)', 100*abc_pass/abc_all, abc_pass, abc_all) as "Overall passing rate",
           format('%%s%%%% (%%s/%%s)', 100*a_pass/a_all, a_pass, a_all) as "A passing rate",
           format('%%s%%%% (%%s/%%s)', 100*b_pass/b_all, b_pass, b_all) as "B passing rate",
           format('%%s%%%% (%%s/%%s)', 100*c_pass/c_all, c_pass, c_all) as "C passing rate"
         from tallies;
       $query$,
       tbl,
       tbl
     );
 end loop;
 return;
end
$funcdef$
;

可以呼叫它select * from summarize_tables('tbl1');來總結上面的範例數據集,或者select * from summarize_tables('tbl1', 'tbl2');總結其他表格。

但是,這根本不能滿足第二個要求——我能夠計算不同的結果列以任意包含或排除 A、B 或 C。

我想也許有一種方法可以使用如下所示的函式簽名來做到這一點:

create function summarize_tables(categories text[], variadic tbls name[])

然後這樣稱呼它:

select * from summarize_tables('{A,B}', 'tbl1', 'tbl2');

但我不知道如何在我的 SQL 中使用“類別”數組。這甚至有可能根據傳入的類別以這樣的過濾方式總結結果嗎?


在相關說明中,我找到了https://stackoverflow.com/a/11751557/5419599所以我知道如果我想要返回真正的動態列,我將不得不使用returns setof record 並且我必須指定每次呼叫函式時要返回的列的全名和類型。如果有的話,我會對解決方法感興趣。

可能這兩個因素的組合意味著我應該接受我必須為我想要總結的類別 A、B 和 C 的每個組合有一個單獨的函式 - 總共七個函式。

但在這種情況下,如果以後再添加 D 類和 E 類,那我可悲了!

這種組合的可能性讓我認為每次呼叫函式時都必須指定返回列名和類型是值得的,因為只需要一個函式就需要付出代價。換句話說,returns table (...)將函式定義中的 更改為returns setof record,然後將呼叫更改select * from summarize_tables(...);為:

select * from summarize_tables('{A,C,D}', ...)
as x ("Table Name" text, "Overall pass rate" text, "A passing rate" text, "C passing rate" text, "D passing rate" text)
;

但是,除非有一種方法可以使過濾比目前 CTE 中的過濾更具動態性,否則這種折衷甚至是不可能的——即一種利用categories text[]傳入參數的方法。 就是我的問題。

(不過,也歡迎任何有關上述設計的指針。)

出於這個問題的目的,我省略了對空“傳遞”值的處理,這將通過將“傳遞”更改為“傳遞為真”來處理 - 並且省略了大小寫切換以避免在某些特定情況下除以零錯誤表不包含特定類別。

我想出瞭如何使用unnest(...) with ordinality並返回一個數組作為結果的一部分。

這是函式定義:

create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall text, by_category text[])
language plpgsql
as $funcdef$
 begin
   foreach tbl in array tbls
   loop
     return query execute format(
       $query$
         with tallies as (
           select
             category,
             count(*) filter (where passing) as passcount,
             count(*) as allcount from %I group by category
         ),
         categories_passed as (
           select * from unnest(%L::text[]) with ordinality as x(category, rn)
         )
         select
           %1$L::name as tbl,
           format('%%s%%%% (%%s/%%s)', (sum(passcount)*100/sum(allcount))::int, sum(passcount), sum(allcount)) as overall,
           array_agg(format('%%s%%%% (%%s/%%s)', passcount*100/allcount, passcount, allcount) order by rn) as by_category
         from categories_passed natural left join tallies;
       $query$,
       tbl,
       categories
     );
   end loop;
   return;
 end
$funcdef$
;

原始結果(帶有select *)如下所示:

postgres=> select * from summarize_tables('{A,B,C}', 'tbl1');
tbl  |   overall   |              by_category
------+-------------+----------------------------------------
tbl1 | 50% (10/20) | {"75% (3/4)","50% (5/10)","33% (2/6)"}
(1 row)

postgres=> select * from summarize_tables('{A,B}', 'tbl1');
tbl  |  overall   |        by_category
------+------------+----------------------------
tbl1 | 57% (8/14) | {"75% (3/4)","50% (5/10)"}
(1 row)

請注意,按類別細分的結果不是按字母順序排序的,而是按傳入的相同順序保持:

postgres=> select * from summarize_tables('{B,A}', 'tbl1');
tbl  |  overall   |        by_category
------+------------+----------------------------
tbl1 | 57% (8/14) | {"50% (5/10)","75% (3/4)"}
(1 row)

要獲得問題中顯示的確切結果,需要給出列名並將結果從數組中提取出來:

select
 tbl as "Table Name",
 overall as "Overall passing rate",
 by_category[1] as "A passing rate",
 by_category[2] as "B passing rate",
 by_category[3] as "C passing rate"
from summarize_tables('{A,B,C}', 'tbl1');

Table Name | Overall passing rate | A passing rate | B passing rate | C passing rate
------------+----------------------+----------------+----------------+----------------
tbl1       | 50% (10/20)          | 75% (3/4)      | 50% (5/10)     | 33% (2/6)
(1 row)

select
 tbl as "Table Name",
 overall as "Overall passing rate",
 by_category[1] as "A passing rate",
 by_category[2] as "B passing rate"
from summarize_tables('{A,B}', 'tbl1');

Table Name | Overall passing rate | A passing rate | B passing rate
------------+----------------------+----------------+----------------
tbl1       | 57% (8/14)           | 75% (3/4)      | 50% (5/10)
(1 row)

問題中沒有要求,但如果希望函式的呼叫者能夠訪問原始數字(例如,用於排序或數字條件),則可以返回嵌套數組,並且可以由呼叫者完成格式化:

create function summarize_tables(categories text[], variadic tbls name[])
returns table (tbl name, overall numeric[], by_category numeric[][])
language plpgsql
as $funcdef$
 begin
   foreach tbl in array tbls
   loop
     return query execute format(
       $query$
         with tallies as (
           select
             category,
             count(*) filter (where passing)::numeric as passcount,
             count(*)::numeric as allcount from %I group by category
         ),
         categories_passed as (
           select * from unnest(%L::text[]) with ordinality as x(category, rn)
         )
         select
           %1$L::name as tbl,
           array[sum(passcount), sum(allcount)] as overall,
           array_agg(array[passcount, allcount] order by rn) as by_category
         from categories_passed natural left join tallies;
       $query$,
       tbl,
       categories
     );
   end loop;
   return;
 end
$funcdef$
;

從嵌套數組中提取值的 SQL 並不漂亮,但它可以工作:

with x as (
 select
   tbl,
   overall as o,
   by_category[1:1] as a,
   by_category[2:2] as b,
   by_category[3:3] as c
 from summarize_tables('{A,B,C}', 'tbl1')
)
select
 tbl,
 format('%s%% (%s/%s)', (100*x.o[1]/x.o[2])::int, x.o[1], x.o[2]) as "Overall passing rate",
 format('%s%% (%s/%s)', (100*x.a[1][1]/x.a[1][2])::int, x.a[1][1], x.a[1][2]) as "A passing rate",
 format('%s%% (%s/%s)', (100*x.b[1][1]/x.b[1][2])::int, x.b[1][1], x.b[1][2]) as "B passing rate",
 format('%s%% (%s/%s)', (100*x.c[1][1]/x.c[1][2])::int, x.c[1][1], x.c[1][2]) as "C passing rate"
from x
;

有關 Postgres 數組如何工作的相關閱讀:https ://stackoverflow.com/a/34408253/5419599

你很好地描述了這個問題。我不能給出一個完整的解決方案,但我會給你一些想法。

正如您所發現的,結果列數可變的函式無法按照您想要的方式完成;如果將其定義為RETURNS SETOF record,則必須在對函式的每個查詢中指定實際結果列。這是因為在查詢解析時必須知道這些列。

您將必須編寫一個基於categoriesandtbls參數組成查詢字元串的函式;這稱為動態 SQL。無論您是在客戶端用 PL/pgSQL 還是其他語言編寫該函式都無關緊要 - 使用最適合您任務的語言。

然後,在第二步中,針對數據庫執行生成的查詢。

引用自:https://dba.stackexchange.com/questions/256978