PostgreSQL

Finding missing timestamps grouped by key: finding the gaps in my data

  • September 18, 2018

I have a table that contains a timestamp, some data, and an identifying key for the data source:

create table test_data (
   id serial primary key,
   key text,
   timestamp timestamp with time zone    
);
INSERT INTO test_data
   (key, timestamp)
VALUES
   ('Source_A', '2018-03-15 01:07:06.603029+00'),
   ('Source_B', '2018-03-15 10:00:01.603029+00'),
   ('Source_A', '2018-03-15 11:05:06.603029+00'),
   ('Source_B', '2018-03-15 15:09:06.603029+00'),
   ('Source_B', '2018-03-15 16:09:06.603029+00');

I want to find the number of missing hours in the data, grouped by data source. I have this code, which works for a single group:

SELECT 
COUNT(hours)-1 AS missing_hours, 
'Source_A' AS key
FROM GENERATE_SERIES('2018-03-15', '2018-03-16', INTERVAL '1 hour') AS hours
 WHERE hours NOT IN 
 ( SELECT TO_TIMESTAMP(FLOOR((EXTRACT('epoch' FROM timestamp) / 3600 )) * 3600) AS time_bit 
  FROM test_data
  WHERE key = 'Source_A'
  GROUP BY time_bit)

Running this gives me:

missing_hours,  key
22,             Source_A

I'm struggling to work out how to group by key and get the missing hours for every data source:

missing_hours,  key
22,             Source_A
21,             Source_B 

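Any ideas? This will run on monthly partitioned tables with about 50 million rows each, so I don't want it to be too expensive. The single-key query takes about 2 seconds to run.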

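One approach is to count the distinct hours that have data for each key, then subtract that count from the total number of hours in the given period.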

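WITH period AS (
  -- GENERATE_SERIES includes both endpoints (25 rows for one day),
  -- so subtract 1 to get 24 hours, the same adjustment as COUNT(hours)-1 in the question
  SELECT COUNT(*) - 1 AS total_hours
  FROM GENERATE_SERIES('2018-03-15', '2018-03-16', INTERVAL '1 hour') gs
),
key_counts AS (
  -- number of distinct hours with at least one row, per key
  SELECT key, COUNT(*) AS hours
  FROM (
    SELECT DISTINCT key, date_trunc('hour', timestamp)
    FROM test_data
    --apply period limit here
  ) kq
  GROUP BY key
)

SELECT key, total_hours - hours AS missing_hours
FROM
  period,
  key_counts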

Source: https://dba.stackexchange.com/questions/217865