PostgreSQL
Group data by year based on two timestamp columns
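My data table has the following columns: id INTEGER, name TEXT, created TIMESTAMP, deleted TIMESTAMP

I want to generate a report that counts, for each year, how many rows of each name were active (a name can appear multiple times in the table). The deleted timestamp may be NULL if the row is still active. So far I have managed to do this by manually typing the years into a long chain of UNION ALL statements (see below). I am sure there is a better way! I also have more queries like this to run. I tried to create a PL/pgSQL function, but could not figure out how to pass the year as a variable or how to get the right output. I would be happy with either a single statement or a PL/pgSQL function that achieves this.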

(select '2016' yr, name, count(*)
 from data
 where (((deleted - '2016-01-01'::timestamp) > '0 secs') or (deleted is null))
   and (created - '2016-01-01'::timestamp) <= '0 secs'
 group by name
 order by count desc)
union all
(select '2015' yr, name, count(*)
 from data
 where (((deleted - '2015-01-01'::timestamp) > '0 secs') or (deleted is null))
   and (created - '2015-01-01'::timestamp) <= '0 secs'
 group by name
 order by count desc)
etc..

I got the list of years using:

select distinct date_part('year',created) from data order by date_part('year',created);

and then typed them into the long union statement by hand. (In my case that is 2007-2016!)

With generate_series() in a LATERAL join (Postgres 9.3+) and date_trunc(), this can be short and simple:

SELECT EXTRACT(YEAR FROM yr)::text AS year, name, count(*) AS ct
FROM   data
     , generate_series(date_trunc('year', created)  -- LATERAL join
                     , COALESCE(deleted, localtimestamp)
                     , interval '1 year') yr
GROUP  BY yr, name
ORDER  BY yr, ct DESC;

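That's all. It returns results for all years between the earliest created and the current year. The trick is to generate one row for every year that a base row overlaps before aggregating.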
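
To see what the series produces for a single row, here is a minimal sketch with made-up timestamps (the literal values are purely illustrative and do not come from the question's data):

```sql
-- One hypothetical row: created 2014-06-15, deleted 2016-03-01.
-- The series starts at the truncated year of "created" and advances
-- one year at a time while it does not exceed "deleted".
SELECT yr
FROM   generate_series(date_trunc('year', timestamp '2014-06-15')
                     , timestamp '2016-03-01'
                     , interval '1 year') AS yr;
```

This should return three rows (2014-01-01, 2015-01-01 and 2016-01-01), so such a row would be counted once in each of 2014, 2015 and 2016.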

Alternative:

SELECT EXTRACT(YEAR FROM yr) AS year, d.name, count(*) AS count
FROM   generate_series((SELECT date_trunc('year', min(created)) FROM data)
                     , localtimestamp
                     , interval '1 year') yr
JOIN   data d ON d.created < yr::timestamp + interval '1 year'
             AND (d.deleted > yr::timestamp OR d.deleted IS NULL)
GROUP  BY yr, d.name
ORDER  BY count(*) DESC;

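This produces the whole range of years before joining. It may be more convenient when you want numbers for a hand-picked selection of years.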
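
For a hand-picked selection, one possible variation is to swap generate_series() for an explicit VALUES list. This is only a sketch: the three years below are arbitrary examples, and the alias y(yr) is a name introduced here for illustration.

```sql
-- Same join condition as the alternative query, but only for chosen years.
-- Each VALUES entry is the first instant of a year of interest.
SELECT EXTRACT(YEAR FROM y.yr) AS year, d.name, count(*) AS count
FROM  (VALUES (timestamp '2010-01-01')
            , (timestamp '2015-01-01')
            , (timestamp '2016-01-01')) AS y(yr)
JOIN   data d ON d.created < y.yr + interval '1 year'
             AND (d.deleted > y.yr OR d.deleted IS NULL)
GROUP  BY y.yr, d.name
ORDER  BY y.yr, count(*) DESC;
```

Because the years are listed explicitly, adding or removing one only means editing the VALUES list.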
If you need to optimize performance for a small selection of years from a bigger table, a GiST index on a tsrange expression would be an option.
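
A rough sketch of what such an index and a matching query could look like follows. The index name data_active_range_idx is made up for this example, and the sketch assumes created is never later than deleted (otherwise tsrange() raises an error). A NULL deleted yields a range with no upper bound, which fits rows that are still active.

```sql
-- Expression index over the active period of each row.
-- tsrange(created, deleted) is half-open by default: [created, deleted).
CREATE INDEX data_active_range_idx ON data
USING  gist (tsrange(created, deleted));

-- Count names whose active period overlaps the year 2015.
-- The && operator tests for range overlap and can use the GiST index.
SELECT name, count(*) AS ct
FROM   data
WHERE  tsrange(created, deleted)
    && tsrange(timestamp '2015-01-01', timestamp '2016-01-01')
GROUP  BY name
ORDER  BY ct DESC;
```

Note that a row with created equal to deleted produces an empty range and never matches &&, so results can differ slightly from the join-based queries for such rows.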