Window-Functions
計算給定時間的累積金額
我想找出在 P 小時內花費超過 D 美元的客戶。假設 D=10 美元,P=48。然後在下面的交易數據中:
CUSTOMER STAMP AMOUNT -------- ----- ------ A 2021-06-02 00:22:53 1.44 A 2021-06-02 06:24:17 1.51 A 2021-06-03 07:09:45 2.73 A 2021-06-03 15:57:30 3.92 A 2021-06-04 06:41:21 1.83 B 2021-06-01 02:50:22 2.65 B 2021-06-03 07:01:36 4.05 B 2021-06-04 05:20:10 3.30 B 2021-06-04 09:53:53 2.64 B 2021-06-04 14:54:00 2.26 C 2021-06-01 16:01:38 2.61 C 2021-06-01 23:38:25 1.16 C 2021-06-02 14:41:02 2.82 C 2021-06-03 00:28:37 1.54 C 2021-06-03 02:06:46 1.19 C 2021-06-04 17:16:29 2.05
客戶 B 符合標準,客戶 A 和 C 不符合(即使他們的總支出 > 10 美元)。
我知道如何使用視窗函式,但在這種情況下,我事先不知道視窗應該有多大。
我認為答案是這樣的:
select customer , stamp , amount , amount > coalesce(sum(amount) over (order by extract('epoch_second', stamp) range between 2*24*60*60 preceding and current row), 0) as is_alert from t where amount is not null order by stamp
但我的數據庫是雪花,它似乎不支持這種語法。該頁面說:
對於累積窗框: … RANGE 類似於 ROWS,除了它只計算與目前行具有相同值的行的結果(根據指定的 ORDER BY 子句)。
Snowflake 確實支持執行您的要求,但語法不同:
這是腳本:
CREATE SCHEMA IF NOT EXISTS demo; create or replace table demo.customer_transactions( transaction_id integer identity ,customer varchar ,txn_date timestamp_ltz ,amount float ); // //CUSTOMER STAMP AMOUNT //-------- ----- ------ insert into demo.customer_transactions (customer, txn_date, amount) values ('A', '2021-06-02 00:22:53', 1.44) ,('A', '2021-06-02 06:24:17', 1.51) ,('A', '2021-06-03 07:09:45', 2.73) ,('A', '2021-06-03 15:57:30', 3.92) ,('A', '2021-06-04 06:41:21', 1.83) ,('B', '2021-06-01 02:50:22', 2.65) ,('B', '2021-06-03 07:01:36', 4.05) ,('B', '2021-06-04 05:20:10', 3.30) ,('B', '2021-06-04 09:53:53', 2.64) ,('B', '2021-06-04 14:54:00', 2.26) ,('C', '2021-06-01 16:01:38', 2.61) ,('C', '2021-06-01 23:38:25', 1.16) ,('C', '2021-06-02 14:41:02', 2.82) ,('C', '2021-06-03 00:28:37', 1.54) ,('C', '2021-06-03 02:06:46', 1.19) ,('C', '2021-06-04 17:16:29', 2.05) ; select * ,case when window_transaction_amount > 10.00 then 'warn' else 'ok' end as is_alert from ( select a.transaction_id ,a.customer ,a.txn_date ,a.txn_date as window_start_date ,dateadd(hour,48, a.txn_date) as window_end_date ,a.amount //,array_construct(b.transaction_id) as b_transactions ,listagg(distinct b.transaction_id,',') within group (order by b.transaction_id) as window_transaction_ids ,sum(b.amount) as window_transaction_amount from demo.customer_transactions a left outer join demo.customer_transactions b on a.customer = b.customer and b.txn_date between a.txn_date and dateadd(hour,48, a.txn_date) group by a.transaction_id ,a.customer ,a.txn_date ,dateadd(hour,48, a.txn_date) ,a.amount ) q order by 1,2,3;