PostgreSQL

Calculating a rolling sum over 7 consecutive days in PostgreSQL

  • March 2, 2016

I need to get a 7-day rolling sum for each row (one row per day).

For example:

| Date       | Count | 7-Day Rolling Sum |
------------------------------------------
| 2016-02-01 | 1     | 1
| 2016-02-02 | 1     | 2
| 2016-02-03 | 2     | 4
| 2016-02-04 | 2     | 6
| 2016-02-05 | 2     | 8
| 2016-02-06 | 2     | 10
| 2016-02-07 | 2     | 12
| 2016-02-08 | 2     | 13 --> here we start summing from 02-02
| 2016-02-09 | 2     | 14 --> here we start summing from 02-03
| 2016-02-10 | 5     | 17 --> here we start summing from 02-04
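To make the expected output concrete, here is a small Python sketch that recomputes the "7-Day Rolling Sum" column from the "Count" column. Python is used only to check the arithmetic (it is not part of the PostgreSQL solution), and it assumes exactly one row per day, as in the table; the helper name rolling_sum is made up for illustration.

```python
from datetime import date, timedelta

# Daily counts copied from the example table (one row per day, no gaps).
start = date(2016, 2, 1)
counts = [1, 1, 2, 2, 2, 2, 2, 2, 2, 5]
days = [start + timedelta(days=i) for i in range(len(counts))]

def rolling_sum(values, window=7):
    """Hypothetical helper: current value plus up to window-1 preceding values."""
    result = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)  # the frame never starts before the first row
        result.append(sum(values[lo:i + 1]))
    return result

sums = rolling_sum(counts)
for day, count, total in zip(days, counts, sums):
    print(day, count, total)
```

The last three rows come out as 13, 14 and 17, matching the table: from 2016-02-08 on, the oldest day drops out of the window as each new day enters.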

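I need a single query that returns each row with its 7-day rolling sum, together with the date of the last day of the summed range. For example: day=2016-02-10, sum 17.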

This is what I have so far, but it doesn't quite work:

DO
$do$
DECLARE 
   curr_date date;
   num bigint;
BEGIN
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d)
LOOP 
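   -- Note: a bare SELECT inside PL/pgSQL raises "query has no destination
   -- for result data"; the result would need SELECT ... INTO (or PERFORM).
   SELECT curr_date, SUM(count)
   -- Note: curr_date-8 .. curr_date-1 spans 8 days and excludes curr_date itself.
   FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
   LEFT JOIN m.ping AS p ON p.date = d
   LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
   LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
   -- Note: filtering pt/pf in WHERE drops days with no matching ping,
   -- so these LEFT JOINs behave like inner joins.
   WHERE
       pt.url_slug = 'active' AND
       pf.url_slug = 'weekly';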
END LOOP;
END
$do$;

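I'm using PostgreSQL 9.4.5. There can be multiple rows with the same date. If there is a gap (a missing day), the window should still cover 7 consecutive calendar days.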
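The two requirements above (duplicate dates are summed together, and a gap still counts as a calendar day) can be pinned down with a small Python sketch. This only states the intended result and is not a PostgreSQL implementation; the function name rolling_7_day_sums and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_7_day_sums(rows, first_day, last_day):
    """Hypothetical helper. rows: iterable of (day, count); several rows may share a day.

    Returns (day, total) for every calendar day in [first_day, last_day],
    where total covers day-6 .. day inclusive. Missing days contribute 0
    instead of being skipped, so the window is always 7 calendar days wide.
    """
    per_day = defaultdict(int)
    for day, count in rows:
        per_day[day] += count  # collapse duplicate dates first

    result = []
    day = first_day
    while day <= last_day:
        window_start = day - timedelta(days=6)
        total = sum(c for d, c in per_day.items() if window_start <= d <= day)
        result.append((day, total))
        day += timedelta(days=1)
    return result

# Two rows on 02-01, nothing on 02-02 or 02-04 .. 02-09.
rows = [(date(2016, 2, 1), 1), (date(2016, 2, 1), 2),
        (date(2016, 2, 3), 4), (date(2016, 2, 10), 5)]
result = rolling_7_day_sums(rows, date(2016, 2, 1), date(2016, 2, 10))
for day, total in result:
    print(day, total)
```

For 2016-02-10 the window is 02-04 .. 02-10, so only the 5 from that day is counted, even though the previous row with data (02-03) is just one row earlier in the input.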

So far the cleanest solution is to use sum() as a window function with a ROWS BETWEEN frame:

with days as (
       SELECT date_trunc('day', d)::date as day
       FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
   counts as (
       select 
           days.day,
           sum((random()*5)::integer) num
       FROM days
       -- left join other tables here to get counts, I'm using random
       group by days.day
   )
select
   day,
   num,
   sum(num) over (order by day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from counts
order by day;

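The important part is generating the time frame in the days CTE and joining to it, so that days without any data are not skipped. Because ROWS BETWEEN 6 PRECEDING AND CURRENT ROW counts rows rather than dates, the frame only spans 7 calendar days when every day has exactly one row.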
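A small Python sketch can show why the days CTE matters. It mimics the ROWS frame on a plain list (a stand-in for the SQL, not PostgreSQL itself) and compares sparse input, where only days with data have rows, against densified input with one row per calendar day. The helper name rows_frame_sum and the sample data are hypothetical.

```python
from datetime import date, timedelta

def rows_frame_sum(values, preceding=6):
    """Hypothetical helper mimicking
    SUM(x) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW):
    the frame is the current row plus up to 6 earlier *rows*, whatever their dates."""
    return [sum(values[max(0, i - preceding):i + 1]) for i in range(len(values))]

# Sparse input: only days that actually have data, with large gaps between them.
sparse = [(date(2016, 2, 1), 1), (date(2016, 2, 5), 2), (date(2016, 2, 20), 3)]

# Without densifying, the 02-20 row still "sees" 02-01 and 02-05,
# even though they are far more than 6 days earlier.
naive = rows_frame_sum([n for _, n in sparse])

# Densify: one row per calendar day (the role of the generate_series() days CTE
# plus the LEFT JOIN). 0 stands in for the NULL that the LEFT JOIN would produce.
lookup = dict(sparse)
first, last = sparse[0][0], sparse[-1][0]
all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
dense = rows_frame_sum([lookup.get(d, 0) for d in all_days])

print(naive[-1])  # 6: wrongly includes 02-01 and 02-05
print(dense[-1])  # 3: only 02-14 .. 02-20
```

With the dense series, "6 preceding rows" and "6 preceding days" mean the same thing, which is exactly what the LEFT JOIN against the generated days guarantees.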

Example

For example, if I create some test data with 20 records spread over the last 14 days:

SELECT (current_date - ((random()*14)::integer::text || 'days')::interval)::date as day, (random()*7)::integer as num
into test_data from generate_series(1, 20);

and add one value before that period:

insert into test_data values ((current_date - '25 days'::interval), 5);

and then run the query above, joined to test_data instead of using random():

with days as (
       SELECT date_trunc('day', d)::date as day
       FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
   counts as (
       select 
           days.day,
           sum(t.num) num
       FROM days
       left join test_data t on t.day = days.day
       group by days.day
   )
select
   day,
   num,
   sum(num) over (order by day rows between 6 preceding and current row)
from counts
order by day;

I get results for the whole month:

   day     | num | sum 
------------+-----+-----
2016-01-31 |     |    
2016-02-01 |     |    
2016-02-02 |     |    
2016-02-03 |     |    
2016-02-04 |     |    
2016-02-05 |     |    
2016-02-06 |   5 |   5
2016-02-07 |     |   5
2016-02-08 |     |   5
2016-02-09 |     |   5
2016-02-10 |     |   5
2016-02-11 |     |   5
2016-02-12 |     |   5
2016-02-13 |     |    
2016-02-14 |     |    
2016-02-15 |     |    
2016-02-16 |     |    
2016-02-17 |     |    
2016-02-18 |   2 |   2
2016-02-19 |   5 |   7
2016-02-20 |     |   7
2016-02-21 |   4 |  11
2016-02-22 |  15 |  26
2016-02-23 |   1 |  27
2016-02-24 |   1 |  28
2016-02-25 |   2 |  28
2016-02-26 |   4 |  27
2016-02-27 |   9 |  36
2016-02-28 |   5 |  37
2016-02-29 |  11 |  33
2016-03-01 |   5 |  37
(31 rows)
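The blank cells follow from SQL's SUM() semantics: NULLs are ignored, and a frame containing only NULLs yields NULL. The Python sketch below (a stand-in for the SQL, with None playing the role of NULL; the helper names sql_sum and rolling are hypothetical) recomputes the sum column from the num column of the output above.

```python
def sql_sum(frame):
    """Hypothetical helper: like SQL SUM(), skip None and return None if nothing is left."""
    present = [v for v in frame if v is not None]
    return sum(present) if present else None

def rolling(values, preceding=6):
    """Current row plus up to 6 preceding rows, like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW."""
    return [sql_sum(values[max(0, i - preceding):i + 1]) for i in range(len(values))]

# num column from the output, 2016-01-31 .. 2016-03-01 (31 rows); None = blank cell.
num = [None] * 6 + [5] + [None] * 11 + [2, 5, None, 4, 15, 1, 1, 2, 4, 9, 5, 11, 5]
sums = rolling(num)
print(sums)
```

If blank rows are unwanted, one option is coalesce(sum(t.num), 0) in the counts CTE, so that days without data contribute 0 instead of NULL.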

Source: https://dba.stackexchange.com/questions/130949