Generate multiple running totals with GROUP BY day

April 28, 2020

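I have a set of transactions of users purchasing stocks, and I want to track the running balance of each stock over time. I am using a window function to track the running balance, but for some reason I cannot get the GROUP BY part of the query to work.
Even when I try to group by day (created_at), the result set still contains duplicate days. Example below: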
select
 t.customer_id,
 t.created_at::date,
 sum(case when t.stock_ticker = 'tsla' then t.amount end) over (order by t.created_at::date rows unbounded preceding) as tsla_running_amount,
 sum(case when t.stock_ticker = 'goog' then t.amount end) over (order by t.created_at::date rows unbounded preceding) as goog_running_amount
from transactions t
group by t.created_at, t.customer_id, t.stock_ticker, t.amount
order by t.created_at desc;
Test setup:
CREATE TABLE transactions (
  transaction_id varchar(255) NOT NULL,
  amount float8 NOT NULL,
  stock_ticker varchar(255) NOT NULL,
  transaction_type varchar(255) NOT NULL,
  customer_id varchar NOT NULL,
  inserted_at timestamp NOT NULL,
  created_at timestamp NOT NULL,
  CONSTRAINT transactions_pkey PRIMARY KEY (transaction_id)
);

INSERT INTO transactions(transaction_id, amount, stock_ticker, transaction_type, customer_id, inserted_at, created_at)
VALUES
 ('123123abmk12', 10, 'tsla', 'purchase', 'a1b2c3', '2020-04-01 01:00:00', '2020-04-01 01:00:00')
, ('123123abmk13', 20, 'tsla', 'purchase', 'a1b2c3', '2020-04-03 01:00:00', '2020-04-03 01:00:00')
, ('123123abmk14',  5, 'goog', 'purchase', 'a1b2c3', '2020-04-01 01:00:00', '2020-04-01 01:00:00')
, ('123123abmk15',  8, 'goog', 'purchase', 'a1b2c3', '2020-04-03 01:00:00', '2020-04-03 01:00:00');

CREATE INDEX ix_transactions_customer_id ON transactions USING btree (customer_id);
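The result always returns multiple rows per day, when I want them collapsed into a single row per day.
After doing some research I also tried casting created_at to date in the GROUP BY clause, but I got this error:
Column t.created_at must appear in the GROUP BY clause or be used in an aggregate function
In addition, the result only shows days on which the user made a transaction. I need a row for every day in the time series (1 year), even if the user made no transaction that day. (Use the most recent running balance on that row instead.)
I think generate_series() is the way to go, but I cannot work out how to fit it in.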

I see several issues. This should do it:
SELECT * -- ⑥
FROM   (  -- ①
  SELECT the_day::date
  FROM   generate_series(timestamp '2020-01-01', date_trunc('day', localtimestamp), interval '1 day') the_day
  ) d 
LEFT   JOIN ( -- ②
  SELECT customer_id
       , created_at::date AS the_day -- ⑥
       , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'tsla')) OVER w AS tsla_running_amount -- ③
       , sum(sum(t.amount) FILTER (WHERE stock_ticker = 'goog')) OVER w AS goog_running_amount
  FROM   transactions t
  WHERE  created_at >= timestamp '2020-01-01'  -- ④
  GROUP  BY customer_id, created_at::date  -- ⑤
  WINDOW w AS (PARTITION BY customer_id ORDER BY created_at::date) -- ③
  ) t USING (the_day) -- ⑥
ORDER  BY customer_id, the_day; -- ⑦
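db<>fiddle here
① Generate all days from the start of the year through today in an optimized fashion. See:
Generating time series between two dates in PostgreSQL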
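As a minimal sketch of this step on its own: generate_series() with timestamp arguments returns one timestamp per step, and the cast to date trims each value to a calendar day. The fixed bounds below are chosen only for illustration and are not part of the query above.

```sql
-- One row per calendar day between two illustrative bounds (both inclusive).
SELECT the_day::date AS the_day
FROM   generate_series(timestamp '2020-04-01'
                     , timestamp '2020-04-05'
                     , interval  '1 day') AS the_day;
-- Returns 5 rows: 2020-04-01 through 2020-04-05.
```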
② The LEFT JOIN you had in mind. You may actually want one row per (customer_id, day). See:
Fill in missing dates within groups
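A possible sketch of that one-row-per-(customer_id, day) variant, under some assumptions: the list of customers is derived from transactions itself (the question shows no separate customers table), and the date range is shortened to match the test data. Because sum() ignores NULL, days without transactions carry the previous running balance forward.

```sql
SELECT c.customer_id
     , d.the_day
     , sum(t.tsla_amount) OVER w AS tsla_running_amount
     , sum(t.goog_amount) OVER w AS goog_running_amount
FROM  (SELECT DISTINCT customer_id FROM transactions) c  -- assumption: no customers table
CROSS  JOIN (                                            -- every customer gets every day
   SELECT the_day::date AS the_day
   FROM   generate_series(timestamp '2020-04-01'
                        , timestamp '2020-04-05'
                        , interval  '1 day') the_day
   ) d
LEFT   JOIN (                                            -- daily sums per customer
   SELECT customer_id
        , created_at::date AS the_day
        , sum(amount) FILTER (WHERE stock_ticker = 'tsla') AS tsla_amount
        , sum(amount) FILTER (WHERE stock_ticker = 'goog') AS goog_amount
   FROM   transactions
   WHERE  created_at >= timestamp '2020-04-01'
   GROUP  BY customer_id, created_at::date
   ) t USING (customer_id, the_day)
WINDOW w AS (PARTITION BY c.customer_id ORDER BY d.the_day)
ORDER  BY c.customer_id, d.the_day;
-- With the test setup: tsla_running_amount is 10, 10, 30, 30, 30
-- and goog_running_amount is 5, 5, 13, 13, 13 for 2020-04-01 .. 2020-04-05.
```

Days before a customer's first transaction show NULL rather than 0; wrap the window sums in COALESCE if 0 is preferred.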
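③ While you GROUP BY customer_id, your window definition also needs PARTITION BY customer_id.
Drop rows unbounded preceding. The default window frame should be just fine. The manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.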
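To see what the frame mode changes, here is a small sketch with made-up values (the VALUES list and column names are illustrative, not taken from the question). Rows with the same ORDER BY value are peers: the default RANGE frame includes all peers of the current row, while ROWS stops at the current row.

```sql
SELECT the_day, amount
     , sum(amount) OVER (ORDER BY the_day ROWS UNBOUNDED PRECEDING) AS rows_frame
     , sum(amount) OVER (ORDER BY the_day)                          AS default_frame
FROM  (VALUES (date '2020-04-01', 10)
            , (date '2020-04-01',  5)   -- peer of the previous row
            , (date '2020-04-03', 20)) v(the_day, amount);
-- default_frame: 15 for both 2020-04-01 rows, 35 for 2020-04-03.
-- rows_frame:    differs between the two peers (their relative order is arbitrary), then 35.
```

Once the query groups to one row per (customer_id, day), there are no peers within a partition, so both frame modes give the same result and the explicit clause can go.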
Use the aggregate FILTER clause instead of CASE expressions. Shorter, faster, cleaner. See:
Return counts for multiple ranges in a single SELECT statement
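A side-by-side sketch of the two spellings against the test table from the question; both columns should return the same values. The aggregate FILTER clause requires PostgreSQL 9.4 or later.

```sql
SELECT created_at::date AS the_day
     , sum(CASE WHEN stock_ticker = 'tsla' THEN amount END) AS tsla_case    -- original style
     , sum(amount) FILTER (WHERE stock_ticker = 'tsla')     AS tsla_filter  -- FILTER clause
FROM   transactions
GROUP  BY created_at::date
ORDER  BY the_day;
-- With the test setup: 10 and 10 for 2020-04-01, 20 and 20 for 2020-04-03.
```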
Finally, wrap the aggregated result in another sum() used as a window function. See:
Get the distinct sum of a joined table column
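The nested sum(sum(...)) works because window functions are evaluated after aggregation. As a sketch, the same result can be written in two explicit steps, with a subquery (named sub here purely for illustration) computing the daily sum and the outer query running the window function over it:

```sql
SELECT customer_id
     , the_day
     , sum(daily_tsla) OVER (PARTITION BY customer_id
                             ORDER BY the_day) AS tsla_running_amount
FROM  (
   -- Step 1: plain aggregate, one row per (customer_id, day)
   SELECT customer_id
        , created_at::date AS the_day
        , sum(amount) FILTER (WHERE stock_ticker = 'tsla') AS daily_tsla
   FROM   transactions
   GROUP  BY customer_id, created_at::date
   ) sub
ORDER  BY customer_id, the_day;
-- With the test setup: 10 for 2020-04-01, 30 for 2020-04-03.
```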
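The separate WINDOW clause avoids spelling out the same definition repeatedly. It has no effect on performance.
④ The WHERE clause is logically redundant, but assuming there are also older rows, it excludes irrelevant rows early, which improves performance.
Assuming there are no or hardly any future timestamps, so there is no upper bound.
⑤ You cannot have t.stock_ticker, t.amount in GROUP BY if you want one row per (customer_id, the_day).
⑥ Use the same column alias (the_day in my example) to allow the simple USING (the_day) in the JOIN clause, and SELECT * at the top level.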
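A small sketch of why the shared alias in note ⑥ matters, using made-up VALUES lists: a join with USING merges the join column, so SELECT * returns the_day only once. The same join written with ON d.the_day = t.the_day would return the_day twice.

```sql
SELECT *
FROM  (VALUES (date '2020-04-01')
            , (date '2020-04-02')) d(the_day)
LEFT   JOIN (VALUES (date '2020-04-01', 10)) t(the_day, amount) USING (the_day)
ORDER  BY the_day;
-- Output columns: the_day, amount
-- Rows: (2020-04-01, 10), (2020-04-02, NULL)
```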
⑦ Not sure about the final sort order. This one seems more useful to me. Adapt to your preference.

Source: https://dba.stackexchange.com/questions/265956
