Postgresql

目前行日期過去 12 個月的總銷售額

  • February 22, 2017

我需要根據給定月份的每一行計算給定 client_id 過去 12 個月的銷售額總和。

這是按客戶逐月匯總銷售額的初始表(在此處針對特定客戶進行過濾511656A75):

CREATE TEMP TABLE foo AS
SELECT idclient, month_transac, sales
FROM ( VALUES
 ( '511656A75', '2010-06-01',  68.57 ),
 ( '511656A75', '2010-07-01',  88.63 ),
 ( '511656A75', '2010-08-01',  94.91 ),
 ( '511656A75', '2010-09-01',  70.66 ),
 ( '511656A75', '2010-10-01',  28.84 ),
 ( '511656A75', '2015-10-01',  85.00 ),
 ( '511656A75', '2015-12-01', 114.42 ),
 ( '511656A75', '2016-01-01', 137.08 ),
 ( '511656A75', '2016-03-01', 172.92 ),
 ( '511656A75', '2016-04-01', 125.00 ),
 ( '511656A75', '2016-05-01', 127.08 ),
 ( '511656A75', '2016-06-01', 104.17 ),
 ( '511656A75', '2016-07-01',  98.22 ),
 ( '511656A75', '2016-08-01',  37.08 ),
 ( '511656A75', '2016-10-01', 108.33 ),
 ( '511656A75', '2016-11-01', 104.17 ),
 ( '511656A75', '2017-01-01', 201.67 )
) AS t(idclient, month_transac, sales);

請注意,有些月份沒有任何銷售(沒有行),所以我想我不能使用WINDOW函式(例如前面的 12 行)。

使用這個很好的答案來解決類似的問題(Rolling sum / count / average over date interval)我已經完成了這個查詢:

SELECT t1.idclient
   , t1.month_transac
   , t1.sales
   , SUM(t2.sales) as sales_ttm 
FROM temp_sales_sample_month_aggr t1
LEFT JOIN  temp_sales_sample_month_aggr t2 USING (idclient)
   WHERE 
       t1.idclient = '511656A75' -- for example only
       AND t2.month_transac >= (t1.month_transac - interval '12 months') 
       AND t2.month_transac < t1.month_transac 
GROUP BY 1, 2, 3
ORDER BY 2
;

結果還可以:sales_ttm是過去 12 個月的銷售額總和,不包括該行月份的銷售額(即 2017 年 1 月的最後一行總和 2016 年的所有銷售額)。

idclient  | month_transac | sales  | sales_ttm
-----------+---------------+--------+---------
511656A75 | 2010-07-01    |  88.63 |   68.57
511656A75 | 2010-08-01    |  94.91 |  157.20
[...]
511656A75 | 2015-12-01    | 114.42 |  824.83
511656A75 | 2016-01-01    | 137.08 |  892.17
511656A75 | 2016-03-01    | 172.92 |  752.75
511656A75 | 2016-04-01    | 125.00 |  925.67
511656A75 | 2016-05-01    | 127.08 | 1028.17
511656A75 | 2016-06-01    | 104.17 | 1155.25
511656A75 | 2016-07-01    |  98.22 | 1073.59
511656A75 | 2016-08-01    |  37.08 | 1171.81
511656A75 | 2016-10-01    | 108.33 | 1000.97
511656A75 | 2016-11-01    | 104.17 | 1024.30
511656A75 | 2017-01-01    | 201.67 | 1014.05

問題是第一個月(這裡是 2010 年 6 月 - 請參閱初始表中的第一行值)不在結果集中,因為它沒有過去的銷售,因此 LEFT JOIN 沒有它的行。

預期/通緝:

idclient  | month_transac | sales  | sales_ttm
-----------+---------------+--------+---------
511656A75 | 2010-06-01    |  68.57 |    0.00
511656A75 | 2010-07-01    |  88.63 |   68.57
511656A75 | 2010-08-01    |  94.91 |  157.20
511656A75 | 2010-09-01    |  70.66 |  252.11
[...]

我可以添加該行的銷售額(用 at2.month_transac <= t1.month_transac然後減去它),但我想我可以找到一種更優雅的方式。

我還嘗試使用LATERALjoin(正如 Erwin 在他的 anwser 中建議的那樣(“執行具有範圍條件的自連接應該更有效,因為 Postgres 9.1 還沒有 LATERAL 連接”),但我猜我沒有掌握它的工作方式,因為我只設法得到錯誤。

  • 你確認WINDOW功能應該被排除嗎?
  • 有沒有辦法使用“簡單”LEFY JOIN 來獲取所有行t1
  • 這種情況可能LATERAL有用,如何?
  • 有哪些優化方法?

使用 PostgreSQL 9.6.2、Windows 10 或 Ubuntu 16.04


績效評估

所以到目前為止,我們有 3 種可能的解決方案;讓我們看看哪個表現更好我檢查了結果表是否相同(它們是)。對 270k 行的表進行測試,知道它是來自 1% 的所有客戶端樣本的結果表

初步方法 -LEFT JOINGROUP BY

它是問題中建議查詢的更正版本,即在總和中包括目前月份,並從總和中減去月份的值,以便獲得所有行。

SELECT t1.idclient
   , t1.month_transac
   , t1.sales
   , SUM(t2.sales) - t1.sales as sales_ttm 
FROM temp_sales_sample_month_aggr t1
LEFT JOIN  temp_sales_sample_month_aggr t2 USING (idclient)
   WHERE 
       t2.month_transac >= (t1.month_transac - interval '12 months') AND
       t2.month_transac <= t1.month_transac 
GROUP BY 1, 2, 3
ORDER BY 2
;

查詢性能:

Planning time:     3.615 ms
Execution time: 1315.636 ms

@joanolo 方法 - 子查詢

SELECT 
     t1.idclient
   , t1.month_transac
   , t1.sales
   , (SELECT 
           coalesce(SUM(t2.sales), 0) 
      FROM 
           temp_sales_sample_month_aggr t2
      WHERE 
           t2.idclient = t1.idclient 
           AND t2.month_transac >= (t1.month_transac - interval '12 months') 
           AND t2.month_transac < t1.month_transac
     ) AS sales_ttm 
FROM 
   temp_sales_sample_month_aggr t1
GROUP BY 
   t1.idclient, t1.month_transac, t1.sales
ORDER BY 
   t1.month_transac ;

查詢性能:

Planning time:     0.350 ms
Execution time: 3163.354 ms

我猜它有更多的行要處理子查詢

LEFT JOIN LATERAL方法

我終於設法讓它工作了。

SELECT t1.idclient
   , t1.month_transac
   , t1.sales
   , COALESCE(lat.sales_ttm, 0.0)
FROM temp_sales_sample_month_aggr t1
LEFT JOIN LATERAL (
   SELECT SUM(t2.sales) as sales_ttm
   FROM temp_sales_sample_month_aggr t2
   WHERE 
       t1.idclient = t2.idclient AND
       t2.month_transac >= (t1.month_transac - interval '12 months') AND
       t2.month_transac < t1.month_transac 
) lat ON TRUE
ORDER BY 2
;

查詢性能:

Planning time:     0.468 ms
Execution time: 2773.754 ms

所以我想 LATERAL 在這裡沒有幫助,與更簡單的相比LEFT JOIN

像這樣的東西應該​​工作..

-- IN A CTE
-- Grab the idclient, and the monthly range needed
-- We need the range because you can't sum over NULL (yet, afaik).
WITH idclient_month AS (
 SELECT idclient, month_transac
 FROM (
   SELECT idclient, min(month_transac), max(month_transac)
   FROM foo
   GROUP BY idclient
 ) AS t
 CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month')
   AS gs(month_transac)
)
-- If we move this where clause down the rows get filtered /before/ the window function
SELECT *
FROM (

 SELECT
   idclient,
   month_transac,
   monthly_sales,
   sum(monthly_sales) OVER (
     PARTITION BY idclient
     ORDER BY month_transac
     ROWS 12 PRECEDING
   )
     - monthly_sales
     AS sales_ttm

 -- Here, we sum up the sales by idclient, and month
 -- We coalesce to 0 so we can use this in a window function
 FROM (
   SELECT idclient, month_transac, coalesce(sum(sales), 0) AS monthly_sales
   FROM foo
   RIGHT OUTER JOIN idclient_month
     USING (idclient,month_transac)
   GROUP BY idclient, month_transac
   ORDER BY idclient, month_transac
 ) AS t

) AS g
WHERE g.monthly_sales > 0;

在這裡,我們

  1. 計算 CTE 中 idclient 的日期範圍。
SELECT idclient, month_transac
FROM (
 SELECT idclient, min(month_transac), max(month_transac)
 FROM foo
 GROUP BY idclient
) AS t
CROSS JOIN LATERAL generate_series(min::date, max::date, '1 month')
 AS gs(month_transac)
idclient  |     month_transac      
-----------+------------------------
511656A75 | 2010-06-01 00:00:00-05
511656A75 | 2010-07-01 00:00:00-05
511656A75 | 2010-08-01 00:00:00-05
511656A75 | 2010-09-01 00:00:00-05
511656A75 | 2010-10-01 00:00:00-05
511656A75 | 2010-11-01 00:00:00-05
511656A75 | 2010-12-01 00:00:00-06
511656A75 | 2011-01-01 00:00:00-06
[....]
  1. RIGHT OUTER該 CTE 到我們的樣本數據集。我們這樣做是為了增加我們的樣本數據集,並且我們在需要的地方有monthly_sales = 0 的條目。
  2. 使用使用 windows over 的視窗函式ROWS 12 PRECEDING。那是關鍵。那是過去的12個月。視窗函式不能對為空的行進行操作,所以我們在進行這一步之前將它們設置為 0。
  3. 只選擇monthly_sales > 0. 我們必須在視窗函式之後執行此操作,以免對可用於計算的內容(視窗)過多。

輸出,

idclient  |     month_transac      | monthly_sales | sales_ttm 
-----------+------------------------+---------------+-----------
511656A75 | 2010-06-01 00:00:00-05 |         68.57 |      0.00
511656A75 | 2010-07-01 00:00:00-05 |         88.63 |     68.57
511656A75 | 2010-08-01 00:00:00-05 |         94.91 |    157.20
511656A75 | 2010-09-01 00:00:00-05 |         70.66 |    252.11
511656A75 | 2010-10-01 00:00:00-05 |         28.84 |    322.77
511656A75 | 2015-10-01 00:00:00-05 |         85.00 |      0.00
511656A75 | 2015-12-01 00:00:00-06 |        114.42 |     85.00
511656A75 | 2016-01-01 00:00:00-06 |        137.08 |    199.42
511656A75 | 2016-03-01 00:00:00-06 |        172.92 |    336.50
511656A75 | 2016-04-01 00:00:00-05 |        125.00 |    509.42
511656A75 | 2016-05-01 00:00:00-05 |        127.08 |    634.42
511656A75 | 2016-06-01 00:00:00-05 |        104.17 |    761.50
511656A75 | 2016-07-01 00:00:00-05 |         98.22 |    865.67
511656A75 | 2016-08-01 00:00:00-05 |         37.08 |    963.89
511656A75 | 2016-10-01 00:00:00-05 |        108.33 |   1000.97
511656A75 | 2016-11-01 00:00:00-05 |        104.17 |   1024.30
511656A75 | 2017-01-01 00:00:00-06 |        201.67 |   1014.05
(17 rows)

您還有另一種選擇,它可能不如WINDOW函式最佳,但可以在更多版本的 PostgreSQL 中工作(甚至在其他數據庫中,如MySQL,稍作修改):

SELECT 
     t1.idclient
   , t1.month_transac::date   /* to_char(t1.month_transac::date, 'YYYY-MM-DD') */
   , t1.sales
   , (SELECT 
             coalesce(SUM(t2.sales), 0) 
      FROM 
             temp_sales_sample_month_aggr t2
      WHERE 
             t2.idclient = t1.idclient 
               AND t2.month_transac >= (t1.month_transac - interval '12 months') 
               AND t2.month_transac < t1.month_transac
     ) AS sales_ttm 
FROM 
   temp_sales_sample_month_aggr t1
GROUP BY 
   t1.idclient, t1.month_transac, t1.sales
ORDER BY 
   t1.month_transac ;

這就是你得到的:

|  idclient |    to_char |  sales | sales_ttm |
|-----------|------------|--------|-----------|
| 511656A75 | 2010-06-01 |  68.57 |         0 |
| 511656A75 | 2010-07-01 |  88.63 |     68.57 |
| 511656A75 | 2010-08-01 |  94.91 |     157.2 |
| 511656A75 | 2010-09-01 |  70.66 |    252.11 |
| 511656A75 | 2010-10-01 |  28.84 |    322.77 |
| 511656A75 | 2015-10-01 |     85 |         0 |
| 511656A75 | 2015-12-01 | 114.42 |        85 |
| 511656A75 | 2016-01-01 | 137.08 |    199.42 |
| 511656A75 | 2016-03-01 | 172.92 |     336.5 |
| 511656A75 | 2016-04-01 |    125 |    509.42 |
| 511656A75 | 2016-05-01 | 127.08 |    634.42 |
| 511656A75 | 2016-06-01 | 104.17 |     761.5 |
| 511656A75 | 2016-07-01 |  98.22 |    865.67 |
| 511656A75 | 2016-08-01 |  37.08 |    963.89 |
| 511656A75 | 2016-10-01 | 108.33 |   1000.97 |
| 511656A75 | 2016-11-01 | 104.17 |    1024.3 |
| 511656A75 | 2017-01-01 | 201.67 |   1014.05 |

(我使用了您的範例輸入數據……這似乎與後面的範例不一致。)

您可以在SQLFiddle檢查它。

引用自:https://dba.stackexchange.com/questions/164838