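PostgreSQL
Finding missing timestamps grouped by key: finding the gaps in my data
I have a table that contains a timestamp, some data, and a key identifying the data source: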
    CREATE TABLE test_data (
        id serial PRIMARY KEY,
        key text,
        timestamp timestamp with time zone
    );

    INSERT INTO test_data (key, timestamp) VALUES
        ('Source_A', '2018-03-15 01:07:06.603029+00'),
        ('Source_B', '2018-03-15 10:00:01.603029+00'),
        ('Source_A', '2018-03-15 11:05:06.603029+00'),
        ('Source_B', '2018-03-15 15:09:06.603029+00'),
        ('Source_B', '2018-03-15 16:09:06.603029+00');
I want to find the number of missing hours in the data, grouped by data source. I have this code, which works for a single group:
    SELECT COUNT(hours) - 1 AS missing_hours, 'Source_A' AS key
    FROM GENERATE_SERIES('2018-03-15', '2018-03-16', INTERVAL '1 hour') AS hours
    WHERE hours NOT IN (
        SELECT TO_TIMESTAMP(FLOOR((EXTRACT('epoch' FROM timestamp) / 3600)) * 3600) AS time_bit
        FROM test_data
        WHERE key = 'Source_A'
        GROUP BY time_bit
    )
Running this gives me:
    missing_hours, key
    22, Source_A
I am struggling to work out how to group by key so that I get the missing hours for every data source:
    missing_hours, key
    22, Source_A
    21, Source_B
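Any ideas? This will run on monthly partitioned tables with roughly 50 million rows each, so I don't want it to be too expensive. The single-key query takes about 2 seconds to run.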
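One approach is to count the distinct hours that have data for each key, then subtract that count from the total number of hours in the given period.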
    WITH period AS (
        -- generate_series includes both endpoints (25 values for one day),
        -- so subtract 1 to get the number of hours in the period
        SELECT COUNT(*) - 1 AS total_hours
        FROM GENERATE_SERIES('2018-03-15', '2018-03-16', INTERVAL '1 hour') gs
    ),
    key_counts AS (
        SELECT key, COUNT(*) AS hours
        FROM (
            SELECT DISTINCT key, date_trunc('hour', timestamp)
            FROM test_data
            -- apply period limit here
        ) kq
        GROUP BY key
    )
    SELECT key, total_hours - hours AS missing_hours
    FROM period
    CROSS JOIN key_counts;
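If you also need a row for keys that have no data at all in the period, or want to see which hours are missing rather than just how many, an anti-join over every (key, hour) pair is one possible variant. This is only a sketch against the question's test_data table: the CTE names (hours, keys, observed) and the explicit UTC bounds are my own assumptions, and it assumes the session time zone has a whole-hour offset so that date_trunc('hour', ...) lines up with the generated series.

```sql
-- Sketch only: period bounds and CTE names are assumptions for illustration.
WITH hours AS (
    -- 24 hourly buckets, 00:00 through 23:00, so no "- 1" correction is needed
    SELECT gs AS hour_start
    FROM generate_series(
        timestamptz '2018-03-15 00:00+00',
        timestamptz '2018-03-15 23:00+00',
        interval '1 hour'
    ) AS gs
),
keys AS (
    -- every data source seen in the table, even if it has no rows in the period
    SELECT DISTINCT key FROM test_data
),
observed AS (
    -- one row per (key, hour) that has at least one reading in the period
    SELECT DISTINCT key, date_trunc('hour', timestamp) AS hour_start
    FROM test_data
    WHERE timestamp >= timestamptz '2018-03-15 00:00+00'
      AND timestamp <  timestamptz '2018-03-16 00:00+00'
)
SELECT k.key, count(*) AS missing_hours
FROM keys k
CROSS JOIN hours h
LEFT JOIN observed o
       ON o.key = k.key
      AND o.hour_start = h.hour_start
WHERE o.key IS NULL          -- keep only (key, hour) pairs with no reading
GROUP BY k.key
ORDER BY k.key;
```

With the sample rows from the question and a UTC session, this should return 22 for Source_A and 21 for Source_B. Dropping the GROUP BY and selecting k.key, h.hour_start instead lists the missing hours themselves. The extra DISTINCT key scan is not free on a 50-million-row partition, so if the keys are already available in a separate lookup table, reading them from there would likely be cheaper.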