PostgreSQL
Nested aggregate functions over different dimensions
I have a table in which I periodically record the throughput of every network interface on a set of servers:
    create table net (
        ts         timestamptz not null,
        host       text not null,
        interface  text not null,
        recv_bytes bigint not null
    );
    create index on net (ts);

    insert into net (ts, host, interface, recv_bytes) values
        ('2017-01-01 00:00:00+00', 'a', 'eth0', 500),
        ('2017-01-01 00:00:00+00', 'b', 'eth1', 2000),
        ('2017-01-01 00:00:01+00', 'b', 'eth0', 1000),  -- measurements arrive with some jitter, +1s here
        ('2017-01-01 00:00:02+00', 'c', 'eth0', 100),   -- only present in this interval
        ('2017-01-01 00:00:04+00', 'b', 'eth1', 1900),
        ('2017-01-01 00:00:05+00', 'a', 'eth0', 550),
        ('2017-01-01 00:00:05+00', 'b', 'eth0', 1200),
        ('2017-01-01 00:00:10+00', 'a', 'eth0', 600),
        ('2017-01-01 00:00:10+00', 'b', 'eth0', 1500),
        ('2017-01-01 00:00:11+00', 'b', 'eth1', 1900),
        ('2017-01-01 00:00:15+00', 'a', 'eth0', 600),
        ('2017-01-01 00:00:15+00', 'b', 'eth1', 1400),
        ('2017-01-01 00:00:16+00', 'b', 'eth0', 1400),
        ('2017-01-01 00:00:16+00', 'b', 'eth1', 1700);  -- (b,eth1) appears 3 times in this interval
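For a given time range, I want the total throughput per host, summed over all of its interfaces, with each interface's readings first averaged over 10-second intervals. If a (host, interface) pair has no data in a given interval, it can simply be omitted. This is what I came up with: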
    select ts_interval, host, sum(recv_bytes) as recv_bytes
    from (
        select to_timestamp(floor(extract(epoch from ts) / 10) * 10) as ts_interval,
               host, interface, avg(recv_bytes) as recv_bytes
        from net
        group by ts_interval, host, interface
    ) avg_net
    where ts_interval >= '2017-01-01 00:00:00+00'
      and ts_interval <  '2017-01-01 00:00:20+00'
    group by ts_interval, host
    order by ts_interval, host;
Result:
          ts_interval       | host |      recv_bytes
    ------------------------+------+-----------------------
     2017-01-01 00:00:00+00 | a    |  525.0000000000000000
     2017-01-01 00:00:00+00 | b    | 3050.0000000000000000
     2017-01-01 00:00:00+00 | c    |  100.0000000000000000
     2017-01-01 00:00:10+00 | a    |  600.0000000000000000
     2017-01-01 00:00:10+00 | b    | 3116.6666666666666667
This query seems overly verbose. The essence of what I'm trying to do is to sum over one dimension and average over another. In pseudo-SQL:
    select to_timestamp(floor(extract(epoch from ts) / 10) * 10) as ts_interval,
           host,
           sum(avg(recv_bytes OVER ts_interval) OVER host) as recv_bytes
    from net
    where ts >= '2017-01-01 00:00:00+00'
      and ts <  '2017-01-01 00:00:20+00'
    group by ts_interval, host
    order by ts_interval, host;
Is there a more concise way to do this?
Query
You can use generate_series() to generate a time raster to join to:

    SELECT g AS ts_interval, host, sum(recv_bytes) AS recv_bytes
    FROM  (
       SELECT g, host, interface, avg(recv_bytes) AS recv_bytes
       FROM   generate_series(timestamptz '2017-01-01 00:00:00+00'
                            , timestamptz '2017-01-01 00:00:10+00'  -- 00:00:10, not 00:00:20: only the lower bound of each interval
                            , interval '10 sec') g
       JOIN   net n ON ts >= g AND ts < g + interval '10 sec'
       GROUP  BY g, host, interface
       ) sub
    GROUP  BY g, host
    ORDER  BY g, host;
Same result. Not much shorter, but it can be faster if you have an index on (ts). Then again, the key is to have "sargable" predicates, which you can also get without generate_series(), like this:

    SELECT ts_interval, host, sum(recv_bytes) AS recv_bytes
    FROM  (
       SELECT to_timestamp(trunc(extract(epoch FROM ts) / 10) * 10) AS ts_interval
            , host, interface, avg(recv_bytes) AS recv_bytes
       FROM   net
       WHERE  ts >= '2017-01-01 00:00:00+00'
       AND    ts <  '2017-01-01 00:00:20+00'  -- just make sure the bounds match the 10-second raster
       GROUP  BY 1, 2, 3
       ) avg_net
    GROUP  BY 1, 2
    ORDER  BY 1, 2;
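As a rough illustration of what "sargable" means here, the two filters below can be compared with EXPLAIN. This is only a sketch: the plans you actually see depend on table size and statistics, and on a table as small as the sample the planner will likely choose a sequential scan either way.

```sql
-- Not sargable: the indexed column is wrapped in an expression,
-- so the plain index on (ts) cannot be used for this filter.
EXPLAIN
SELECT count(*)
FROM   net
WHERE  to_timestamp(trunc(extract(epoch FROM ts) / 10) * 10) >= '2017-01-01 00:00:00+00';

-- Sargable: the bare column is compared to constants,
-- so the planner is free to use the index on (ts).
EXPLAIN
SELECT count(*)
FROM   net
WHERE  ts >= '2017-01-01 00:00:00+00'
AND    ts <  '2017-01-01 00:00:20+00';
```

This is why the second query filters on ts in the inner SELECT rather than on the computed ts_interval in the outer one.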
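Minor points: for positive numbers only, trunc() is equivalent to floor() and a bit faster. I used ordinal references in GROUP BY and ORDER BY because you asked for "less verbose" code. But that's not the core of the problem here ...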
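A quick check of where the two functions agree and where they differ (the column aliases are just labels for this example):

```sql
SELECT trunc(1.5)  AS trunc_pos   -- 1
     , floor(1.5)  AS floor_pos   -- 1
     , trunc(-1.5) AS trunc_neg   -- -1 (rounds toward zero)
     , floor(-1.5) AS floor_neg;  -- -2 (rounds toward negative infinity)
```

Epoch values for timestamps after 1970 are positive, so either function produces the same 10-second buckets for the sample data.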
Core problem
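You can run a window function over an aggregate function in a single query level (though it's typically not faster than using a subquery).
But not the other way round: you cannot run an aggregate function over a window function in a single query level. It's simply not possible, because window functions are evaluated after aggregate functions in SQL.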
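To illustrate the first case, here is a sketch of the same result computed in a single query level, with a window function running over the aggregate. The DISTINCT is an addition for this example: grouping still yields one row per (interval, host, interface), so it collapses the duplicates per host. The bucket expression is repeated in PARTITION BY because output column names cannot be referenced there.

```sql
SELECT DISTINCT
       to_timestamp(trunc(extract(epoch FROM ts) / 10) * 10) AS ts_interval
     , host
       -- inner avg() is the aggregate (per interval, host, interface);
       -- outer sum() ... OVER is the window function (per interval, host)
     , sum(avg(recv_bytes)) OVER (
          PARTITION BY to_timestamp(trunc(extract(epoch FROM ts) / 10) * 10), host
       ) AS recv_bytes
FROM   net
WHERE  ts >= '2017-01-01 00:00:00+00'
AND    ts <  '2017-01-01 00:00:20+00'
GROUP  BY 1, host, interface
ORDER  BY 1, 2;

-- The reverse nesting raises an error, for example:
--   SELECT sum(avg(recv_bytes) OVER ()) FROM net;
```

This is hardly shorter than the subquery version, and as noted it is typically not faster either, so the subquery form remains the more readable choice.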
Related: