目前行日期過去 12 個月的總銷售額
我需要根據給定月份的每一行計算給定 client_id 過去 12 個月的銷售額總和。
這是按客戶逐月匯總銷售額的初始表(在此處針對特定客戶進行過濾
511656A75
):CREATE TEMP TABLE foo AS SELECT idclient, month_transac, sales FROM ( VALUES ( '511656A75', '2010-06-01', 68.57 ), ( '511656A75', '2010-07-01', 88.63 ), ( '511656A75', '2010-08-01', 94.91 ), ( '511656A75', '2010-09-01', 70.66 ), ( '511656A75', '2010-10-01', 28.84 ), ( '511656A75', '2015-10-01', 85.00 ), ( '511656A75', '2015-12-01', 114.42 ), ( '511656A75', '2016-01-01', 137.08 ), ( '511656A75', '2016-03-01', 172.92 ), ( '511656A75', '2016-04-01', 125.00 ), ( '511656A75', '2016-05-01', 127.08 ), ( '511656A75', '2016-06-01', 104.17 ), ( '511656A75', '2016-07-01', 98.22 ), ( '511656A75', '2016-08-01', 37.08 ), ( '511656A75', '2016-10-01', 108.33 ), ( '511656A75', '2016-11-01', 104.17 ), ( '511656A75', '2017-01-01', 201.67 ) ) AS t(idclient, month_transac, sales);
請注意,有些月份沒有任何銷售(沒有行),所以我想我不能使用
WINDOW
函式(例如前面的 12 行)。使用這個很好的答案來解決類似的問題(Rolling sum / count / average over date interval)我已經完成了這個查詢:
SELECT t1.idclient , t1.month_transac , t1.sales , SUM(t2.sales) as sales_ttm FROM temp_sales_sample_month_aggr t1 LEFT JOIN temp_sales_sample_month_aggr t2 USING (idclient) WHERE t1.idclient = '511656A75' -- for example only AND t2.month_transac >= (t1.month_transac - interval '12 months') AND t2.month_transac < t1.month_transac GROUP BY 1, 2, 3 ORDER BY 2 ;
結果還可以:
sales_ttm
是過去 12 個月的銷售額總和,不包括該行月份的銷售額(即 2017 年 1 月的最後一行總和 2016 年的所有銷售額)。idclient | month_transac | sales | sales_ttm -----------+---------------+--------+--------- 511656A75 | 2010-07-01 | 88.63 | 68.57 511656A75 | 2010-08-01 | 94.91 | 157.20 [...] 511656A75 | 2015-12-01 | 114.42 | 824.83 511656A75 | 2016-01-01 | 137.08 | 892.17 511656A75 | 2016-03-01 | 172.92 | 752.75 511656A75 | 2016-04-01 | 125.00 | 925.67 511656A75 | 2016-05-01 | 127.08 | 1028.17 511656A75 | 2016-06-01 | 104.17 | 1155.25 511656A75 | 2016-07-01 | 98.22 | 1073.59 511656A75 | 2016-08-01 | 37.08 | 1171.81 511656A75 | 2016-10-01 | 108.33 | 1000.97 511656A75 | 2016-11-01 | 104.17 | 1024.30 511656A75 | 2017-01-01 | 201.67 | 1014.05
問題是第一個月(這裡是 2010 年 6 月 - 請參閱初始表中的第一行值)不在結果集中,因為它沒有過去的銷售,因此 LEFT JOIN 沒有它的行。
預期/通緝:
idclient | month_transac | sales | sales_ttm -----------+---------------+--------+--------- 511656A75 | 2010-06-01 | 68.57 | 0.00 511656A75 | 2010-07-01 | 88.63 | 68.57 511656A75 | 2010-08-01 | 94.91 | 157.20 511656A75 | 2010-09-01 | 70.66 | 252.11 [...]
我可以添加該行的銷售額(用 a
t2.month_transac <= t1.month_transac
然後減去它),但我想我可以找到一種更優雅的方式。我還嘗試使用
LATERAL
join(正如 Erwin 在他的 anwser 中建議的那樣(“執行具有範圍條件的自連接應該更有效,因為 Postgres 9.1 還沒有 LATERAL 連接”),但我猜我沒有掌握它的工作方式,因為我只設法得到錯誤。
- 你確認
WINDOW
功能應該被排除嗎?- 有沒有辦法使用“簡單”LEFY JOIN 來獲取所有行
t1
?- 這種情況可能
LATERAL
有用,如何?- 有哪些優化方法?
使用 PostgreSQL 9.6.2、Windows 10 或 Ubuntu 16.04
績效評估
所以到目前為止,我們有 3 種可能的解決方案;讓我們看看哪個表現更好我檢查了結果表是否相同(它們是)。對 270k 行的表進行測試,知道它是來自 1% 的所有客戶端樣本的結果表
初步方法 -
LEFT JOIN
和GROUP BY
它是問題中建議查詢的更正版本,即在總和中包括目前月份,並從總和中減去月份的值,以便獲得所有行。
SELECT t1.idclient , t1.month_transac , t1.sales , SUM(t2.sales) - t1.sales as sales_ttm FROM temp_sales_sample_month_aggr t1 LEFT JOIN temp_sales_sample_month_aggr t2 USING (idclient) WHERE t2.month_transac >= (t1.month_transac - interval '12 months') AND t2.month_transac <= t1.month_transac GROUP BY 1, 2, 3 ORDER BY 2 ;
查詢性能:
Planning time: 3.615 ms Execution time: 1315.636 ms
@joanolo 方法 - 子查詢
SELECT t1.idclient , t1.month_transac , t1.sales , (SELECT coalesce(SUM(t2.sales), 0) FROM temp_sales_sample_month_aggr t2 WHERE t2.idclient = t1.idclient AND t2.month_transac >= (t1.month_transac - interval '12 months') AND t2.month_transac < t1.month_transac ) AS sales_ttm FROM temp_sales_sample_month_aggr t1 GROUP BY t1.idclient, t1.month_transac, t1.sales ORDER BY t1.month_transac ;
查詢性能:
Planning time: 0.350 ms Execution time: 3163.354 ms
我猜它有更多的行要處理子查詢
LEFT JOIN LATERAL
方法我終於設法讓它工作了。
SELECT t1.idclient , t1.month_transac , t1.sales , COALESCE(lat.sales_ttm, 0.0) FROM temp_sales_sample_month_aggr t1 LEFT JOIN LATERAL ( SELECT SUM(t2.sales) as sales_ttm FROM temp_sales_sample_month_aggr t2 WHERE t1.idclient = t2.idclient AND t2.month_transac >= (t1.month_transac - interval '12 months') AND t2.month_transac < t1.month_transac ) lat ON TRUE ORDER BY 2 ;
查詢性能:
Planning time: 0.468 ms Execution time: 2773.754 ms
所以我想 LATERAL 在這裡沒有幫助,與更簡單的相比
LEFT JOIN
像這樣的東西應該工作..
-- IN A CTE -- Grab the idclient, and the monthly range needed -- We need the range because you can't sum over NULL (yet, afaik). WITH idclient_month AS ( SELECT idclient, month_transac FROM ( SELECT idclient, min(month_transac), max(month_transac) FROM foo GROUP BY idclient ) AS t CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month') AS gs(month_transac) ) -- If we move this where clause down the rows get filtered /before/ the window function SELECT * FROM ( SELECT idclient, month_transac, monthly_sales, sum(monthly_sales) OVER ( PARTITION BY idclient ORDER BY month_transac ROWS 12 PRECEDING ) - monthly_sales AS sales_ttm -- Here, we sum up the sales by idclient, and month -- We coalesce to 0 so we can use this in a window function FROM ( SELECT idclient, month_transac, coalesce(sum(sales), 0) AS monthly_sales FROM foo RIGHT OUTER JOIN idclient_month USING (idclient,month_transac) GROUP BY idclient, month_transac ORDER BY idclient, month_transac ) AS t ) AS g WHERE g.monthly_sales > 0;
在這裡,我們
- 計算 CTE 中 idclient 的日期範圍。
SELECT idclient, month_transac FROM ( SELECT idclient, min(month_transac), max(month_transac) FROM foo GROUP BY idclient ) AS t CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month') AS gs(month_transac) idclient | month_transac -----------+------------------------ 511656A75 | 2010-06-01 00:00:00-05 511656A75 | 2010-07-01 00:00:00-05 511656A75 | 2010-08-01 00:00:00-05 511656A75 | 2010-09-01 00:00:00-05 511656A75 | 2010-10-01 00:00:00-05 511656A75 | 2010-11-01 00:00:00-05 511656A75 | 2010-12-01 00:00:00-06 511656A75 | 2011-01-01 00:00:00-06 [....]
RIGHT OUTER
該 CTE 到我們的樣本數據集。我們這樣做是為了增加我們的樣本數據集,並且我們在需要的地方有monthly_sales = 0 的條目。- 使用使用 windows over 的視窗函式
ROWS 12 PRECEDING
。那是關鍵。那是過去的12個月。視窗函式不能對為空的行進行操作,所以我們在進行這一步之前將它們設置為 0。- 只選擇
monthly_sales > 0
. 我們必須在視窗函式之後執行此操作,以免對可用於計算的內容(視窗)過多。輸出,
idclient | month_transac | monthly_sales | sales_ttm -----------+------------------------+---------------+----------- 511656A75 | 2010-06-01 00:00:00-05 | 68.57 | 0.00 511656A75 | 2010-07-01 00:00:00-05 | 88.63 | 68.57 511656A75 | 2010-08-01 00:00:00-05 | 94.91 | 157.20 511656A75 | 2010-09-01 00:00:00-05 | 70.66 | 252.11 511656A75 | 2010-10-01 00:00:00-05 | 28.84 | 322.77 511656A75 | 2015-10-01 00:00:00-05 | 85.00 | 0.00 511656A75 | 2015-12-01 00:00:00-06 | 114.42 | 85.00 511656A75 | 2016-01-01 00:00:00-06 | 137.08 | 199.42 511656A75 | 2016-03-01 00:00:00-06 | 172.92 | 336.50 511656A75 | 2016-04-01 00:00:00-05 | 125.00 | 509.42 511656A75 | 2016-05-01 00:00:00-05 | 127.08 | 634.42 511656A75 | 2016-06-01 00:00:00-05 | 104.17 | 761.50 511656A75 | 2016-07-01 00:00:00-05 | 98.22 | 865.67 511656A75 | 2016-08-01 00:00:00-05 | 37.08 | 963.89 511656A75 | 2016-10-01 00:00:00-05 | 108.33 | 1000.97 511656A75 | 2016-11-01 00:00:00-05 | 104.17 | 1024.30 511656A75 | 2017-01-01 00:00:00-06 | 201.67 | 1014.05 (17 rows)
您還有另一種選擇,它可能不如
WINDOW
函式最佳,但可以在更多版本的 PostgreSQL 中工作(甚至在其他數據庫中,如MySQL,稍作修改):SELECT t1.idclient , t1.month_transac::date /* to_char(t1.month_transac::date, 'YYYY-MM-DD') */ , t1.sales , (SELECT coalesce(SUM(t2.sales), 0) FROM temp_sales_sample_month_aggr t2 WHERE t2.idclient = t1.idclient AND t2.month_transac >= (t1.month_transac - interval '12 months') AND t2.month_transac < t1.month_transac ) AS sales_ttm FROM temp_sales_sample_month_aggr t1 GROUP BY t1.idclient, t1.month_transac, t1.sales ORDER BY t1.month_transac ;
這就是你得到的:
| idclient | to_char | sales | sales_ttm | |-----------|------------|--------|-----------| | 511656A75 | 2010-06-01 | 68.57 | 0 | | 511656A75 | 2010-07-01 | 88.63 | 68.57 | | 511656A75 | 2010-08-01 | 94.91 | 157.2 | | 511656A75 | 2010-09-01 | 70.66 | 252.11 | | 511656A75 | 2010-10-01 | 28.84 | 322.77 | | 511656A75 | 2015-10-01 | 85 | 0 | | 511656A75 | 2015-12-01 | 114.42 | 85 | | 511656A75 | 2016-01-01 | 137.08 | 199.42 | | 511656A75 | 2016-03-01 | 172.92 | 336.5 | | 511656A75 | 2016-04-01 | 125 | 509.42 | | 511656A75 | 2016-05-01 | 127.08 | 634.42 | | 511656A75 | 2016-06-01 | 104.17 | 761.5 | | 511656A75 | 2016-07-01 | 98.22 | 865.67 | | 511656A75 | 2016-08-01 | 37.08 | 963.89 | | 511656A75 | 2016-10-01 | 108.33 | 1000.97 | | 511656A75 | 2016-11-01 | 104.17 | 1024.3 | | 511656A75 | 2017-01-01 | 201.67 | 1014.05 |
(我使用了您的範例輸入數據……這似乎與後面的範例不一致。)
您可以在SQLFiddle檢查它。