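PostgreSQL
Getting the increment of a value per hour of the day
I query the YouTube Data API for a list of the most popular videos on a channel, and then fetch their statistics four times an hour (every 15 minutes, via cron). The data is stored in Postgres, but dumping it and loading it into another SQL database would not be a problem. Right now I have the following table: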
video_id | views_count | likes_count | timestamp
---------+-------------+-------------+---------------------
foo      |         100 |           1 | 2018-12-01 12:01:03
foo      |         101 |           1 | 2018-12-01 12:16:06
foo      |         105 |           1 | 2018-12-01 12:31:01
bar      |         199 |           0 | 2018-12-01 12:01:02
bar      |         200 |           0 | 2018-12-01 12:16:08
bar      |         301 |           5 | 2018-12-01 12:31:02
...      |         ...
UPD: here is the schema (paste it into sqlfiddle):
CREATE TABLE video_statistics (
    video_id    TEXT        not null,
    views_count INTEGER     not null,
    likes_count INTEGER     not null,
    timestamp   TIMESTAMPTZ not null
);
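To reproduce the example, the sample rows from the table above can be loaded with a statement like the following. This is only a convenience sketch: the values are copied from that table, and because the timestamps carry no explicit offset they are interpreted in the session time zone.

```sql
-- Sample rows copied from the table in the question.
-- "timestamp" is quoted only to keep the column name visually distinct from the type name.
INSERT INTO video_statistics (video_id, views_count, likes_count, "timestamp")
VALUES ('foo', 100, 1, '2018-12-01 12:01:03'),
       ('foo', 101, 1, '2018-12-01 12:16:06'),
       ('foo', 105, 1, '2018-12-01 12:31:01'),
       ('bar', 199, 0, '2018-12-01 12:01:02'),
       ('bar', 200, 0, '2018-12-01 12:16:08'),
       ('bar', 301, 5, '2018-12-01 12:31:02');
```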
How should I query this data to get the increments in the views_count and likes_count columns, grouped by hour of the day for each video? To clarify what I want to get:

hour_of_day | video_id | views_increment | likes_increment
------------+----------+-----------------+-----------------
...         | ...
11          | foo      |               4 |               0
12          | foo      |               5 |               1
...         | ...
11          | bar      |              73 |               0
12          | bar      |             102 |               5
...         | ...
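In other words, this is a "best time to publish a video" based on historical data, taking weeks and months of data into account. Should I dump the data into a time-series database, or some other database better suited to this, and query it there? Or should I just compute this in code?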
One possibility is to first number the records with row_number() to get the first and the last value for each video, day and hour. Then join the two sets of first and last values and take the respective differences. Group the result by video and hour, and take the sum or the average over the days for each video.

    SELECT first.video_id,
           first.timestamp_hour,
           sum(last.views_count - first.views_count) views_count_diff_sum,
           sum(last.likes_count - first.likes_count) likes_count_diff_sum,
           avg(last.views_count - first.views_count) views_count_diff_avg,
           avg(last.likes_count - first.likes_count) likes_count_diff_avg
    FROM (-- first sample of each video, day and hour
          SELECT video_id, timestamp_day, timestamp_hour, views_count, likes_count
          FROM (SELECT video_id,
                       timestamp::date timestamp_day,
                       date_part('hour', timestamp) timestamp_hour,
                       views_count,
                       likes_count,
                       row_number() OVER (PARTITION BY video_id,
                                                       timestamp::date,
                                                       date_part('hour', timestamp)
                                          ORDER BY timestamp ASC) rn
                FROM video_statistics) ranked
          WHERE rn = 1) first
         INNER JOIN (-- last sample of each video, day and hour
                     SELECT video_id, timestamp_day, timestamp_hour, views_count, likes_count
                     FROM (SELECT video_id,
                                  timestamp::date timestamp_day,
                                  date_part('hour', timestamp) timestamp_hour,
                                  views_count,
                                  likes_count,
                                  row_number() OVER (PARTITION BY video_id,
                                                                  timestamp::date,
                                                                  date_part('hour', timestamp)
                                                     ORDER BY timestamp DESC) rn
                           FROM video_statistics) ranked
                     WHERE rn = 1) last
                 ON last.video_id = first.video_id
                    AND last.timestamp_day = first.timestamp_day
                    AND last.timestamp_hour = first.timestamp_hour
    GROUP BY first.video_id,
             first.timestamp_hour;
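
Note that taking the difference between the first and the last sample of an hour leaves out whatever a counter gains between the last sample of one hour and the first sample of the next. A possible alternative, sketched here on the assumption that the table is video_statistics as defined in the question, is to compute the delta between consecutive samples with lag() and attribute each delta to the hour of the later sample. The aliases (deltas, views_delta, likes_delta, hour_of_day) are illustrative, and the hour is evaluated in the session time zone because the column is timestamptz.

```sql
-- Sketch: per-sample deltas via lag(), summed per video and hour of day.
-- Assumes the video_statistics table from the question; aliases are illustrative.
SELECT video_id,
       hour_of_day,
       sum(views_delta) AS views_increment,
       sum(likes_delta) AS likes_increment
FROM (SELECT video_id,
             -- hour of the later sample, in the session time zone
             date_part('hour', "timestamp") AS hour_of_day,
             -- difference to the previous sample of the same video
             views_count - lag(views_count) OVER w AS views_delta,
             likes_count - lag(likes_count) OVER w AS likes_delta
      FROM video_statistics
      WINDOW w AS (PARTITION BY video_id ORDER BY "timestamp")) AS deltas
WHERE views_delta IS NOT NULL  -- each video's first sample has no predecessor
GROUP BY video_id, hour_of_day
ORDER BY video_id, hour_of_day;
```

For the six sample rows in the question, loaded and queried in the same session time zone, this gives 5 views and 0 likes for foo and 102 views and 5 likes for bar in hour 12. The query sums over all days; a per-day average would need an additional grouping by date before averaging.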