PostgreSQL

Getting value increments per hour of day

  • January 21, 2019

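I query the YouTube Data API for the list of the most popular videos on a channel, and then fetch their statistics 4 times an hour (every 15 minutes, via cron). The data is stored in Postgres, but dumping it and loading it into another SQL database would not be a problem. Right now I have the following table: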

video_id| views_count | likes_count | timestamp 
---------+-------------+-------------+---------------------
    foo | 100         | 1           | 2018-12-01 12:01:03
    foo | 101         | 1           | 2018-12-01 12:16:06
    foo | 105         | 1           | 2018-12-01 12:31:01
    bar | 199         | 0           | 2018-12-01 12:01:02
    bar | 200         | 0           | 2018-12-01 12:16:08
    bar | 301         | 5           | 2018-12-01 12:31:02
    ... | ...

UPD: here is the schema (paste it into sqlfiddle):

CREATE TABLE video_statistics
(
 video_id TEXT not null,
 views_count INTEGER not null,
 likes_count INTEGER not null,
 timestamp TIMESTAMPTZ not null
);

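How should I query this data to get the increments in the views_count and likes_count columns for each hour of the day, grouped by video? To clarify what I want to get: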

hour_of_day|video_id|views_increment|likes_increment
-----------+--------+---------------+---------------
    ...   | ...
    11    | foo    | 4             | 0
    12    | foo    | 5             | 1
    ...   | ...
    11    | bar    | 73            | 0
    12    | bar    | 102           | 5
    ...   | ...

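In other words, this is a "best time to publish a video" based on historical data, taking weeks and months of data into account. Should I dump the data into a time-series database, or some other database better suited to this case, and query it there? Or should I just compute this in code?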

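One possibility is to first number the records with row_number() to get the first and the last value for each video, day, and hour. Then join the two sets of first and last values to get the respective differences. Group the result by video and hour, and take the sum or average across days for each video.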

-- For each video, day, and hour, pick the first and the last sample, take
-- their difference, then aggregate the differences per video and hour of day.
SELECT first.video_id,
       first.timestamp_hour,
       sum(last.views_count - first.views_count) AS views_count_diff_sum,
       sum(last.likes_count - first.likes_count) AS likes_count_diff_sum,
       avg(last.views_count - first.views_count) AS views_count_diff_avg,
       avg(last.likes_count - first.likes_count) AS likes_count_diff_avg
       FROM (-- earliest sample per video, day, and hour
             SELECT video_id,
                    timestamp_day,
                    timestamp_hour,
                    views_count,
                    likes_count
             FROM (SELECT video_id,
                          timestamp::date AS timestamp_day,
                          date_part('hour', timestamp) AS timestamp_hour,
                          views_count,
                          likes_count,
                          row_number() OVER (PARTITION BY video_id,
                                                          timestamp::date,
                                                          date_part('hour', timestamp)
                                             ORDER BY timestamp ASC) AS rn
                          FROM video_statistics) first
             WHERE rn = 1) first
            INNER JOIN (-- latest sample per video, day, and hour
                        SELECT video_id,
                               timestamp_day,
                               timestamp_hour,
                               views_count,
                               likes_count
                               FROM (SELECT video_id,
                                            timestamp::date AS timestamp_day,
                                            date_part('hour', timestamp) AS timestamp_hour,
                                            views_count,
                                            likes_count,
                                            row_number() OVER (PARTITION BY video_id,
                                                                            timestamp::date,
                                                                            date_part('hour', timestamp)
                                                               ORDER BY timestamp DESC) AS rn
                                            FROM video_statistics) last
                               WHERE rn = 1) last
                       ON last.video_id = first.video_id
                          AND last.timestamp_day = first.timestamp_day
                          AND last.timestamp_hour = first.timestamp_hour
       -- one output row per video and hour of day, aggregated over all days
       GROUP BY first.video_id,
                first.timestamp_hour;
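
Note that the query above only measures growth between the first and the last sample inside each hour, so growth that happens between the last sample of one hour and the first sample of the next is not counted. If that matters, a different technique may be worth trying: compute the difference between each sample and the previous one with lag(), then sum those differences per video and hour of day. The following is only a sketch under assumptions: it uses the video_statistics table from the question, and the names deltas, views_delta, likes_delta and the window name w are made up for illustration.

```sql
-- Sketch (not the approach from the answer above): per-sample increments
-- via lag(), summed per video and hour of day.
SELECT video_id,
       date_part('hour', "timestamp") AS hour_of_day,
       sum(views_delta) AS views_increment,
       sum(likes_delta) AS likes_increment
FROM (SELECT video_id,
             "timestamp",
             -- difference to the previous sample of the same video
             views_count - lag(views_count) OVER w AS views_delta,
             likes_count - lag(likes_count) OVER w AS likes_delta
      FROM video_statistics
      WINDOW w AS (PARTITION BY video_id ORDER BY "timestamp")) deltas
-- the first sample of each video has no predecessor, so its delta is NULL
WHERE views_delta IS NOT NULL
GROUP BY video_id,
         date_part('hour', "timestamp")
ORDER BY video_id,
         hour_of_day;
```

Each increment is attributed to the hour of the later sample, so if a cron run is missed, the growth over the longer gap lands in the hour of the next successful sample. Because the column is a TIMESTAMPTZ, the hour returned by date_part() depends on the session's time zone setting.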

Source: https://dba.stackexchange.com/questions/227515