How to speed up querying the last value in a time series?
I have a time series table prices in a PostgreSQL 10 database. Here is a simplified test case to illustrate the problem:
CREATE TABLE prices (
    currency text        NOT NULL,
    side     boolean     NOT NULL,
    price    numeric     NOT NULL,
    ts       timestamptz NOT NULL
);
I want to quickly query the last value for each currency/side duo, since that gives me the current buy/sell price for each currency. My current solution is:
create index on prices (currency, side, ts desc);

select distinct on (currency, side) *
from prices
order by currency, side, ts desc;
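But this gives me very slow queries (about 500 ms) on a table with only about 30k rows.
The actual table has four columns I want to group by, not two. Here is what the actual table and query look like: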
create table prices (
    exchange          integer not null,
    pair              text not null,
    side              boolean not null,
    guaranteed_volume numeric not null,
    ts                timestamp with time zone not null,
    price             numeric not null,
    constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
    constraint prices_exchange_fkey foreign key (exchange)
        references exchanges (id) match simple
        on update no action on delete no action
);

create index prices_exchange_pair_side_guaranteed_volume_ts_idx
    on prices (exchange, pair, side, guaranteed_volume, ts desc);

create view last_prices as
select distinct on (exchange, pair, side, guaranteed_volume)
       exchange
     , pair
     , side
     , guaranteed_volume
     , price
     , ts
from   prices
order  by exchange
        , pair
        , side
        , guaranteed_volume
        , ts desc;
It currently has 34441 rows. Some useful debugging queries:
# explain (analyze,buffers) select * from last_prices;
                               QUERY PLAN
------------------------------------------------------------------------
 Unique  (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
   Buffers: shared hit=418
   ->  Sort  (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
         Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
         Sort Method: quicksort  Memory: 2984kB
         Buffers: shared hit=418
         ->  Seq Scan on prices  (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
               Buffers: shared hit=418
 Planning time: 0.911 ms
 Execution time: 460.190 ms
EXPLAIN ANALYZE with seqscan disabled:
# explain (analyze,buffers) select * from last_prices;
                               QUERY PLAN
------------------------------------------------------------------------
 Unique  (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
   Buffers: shared hit=15182
   ->  Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices  (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
         Buffers: shared hit=15182
 Planning time: 0.291 ms
 Execution time: 122.417 ms
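For readers who want to reproduce a plan like this one, a minimal sketch follows. It assumes the setting was changed per session with enable_seqscan, which the question does not state explicitly. Note that the setting only makes sequential scans look very expensive to the planner; it does not forbid them outright.

```sql
-- Assumption: seqscan was discouraged per session like this.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM last_prices;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```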
Added: the same query run directly rather than through the view:
# explain (analyze, buffers)
  select distinct on (exchange, pair, side, guaranteed_volume)
         exchange , pair , side , guaranteed_volume , price , ts
  from   prices
  order  by exchange , pair , side , guaranteed_volume , ts desc;
                               QUERY PLAN
------------------------------------------------------------------------
 Unique  (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
   Buffers: shared hit=418
   ->  Sort  (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
         Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
         Sort Method: quicksort  Memory: 2644kB
         Buffers: shared hit=418
         ->  Seq Scan on prices  (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
               Buffers: shared hit=418
 Planning time: 0.258 ms
 Execution time: 392.110 ms
"I want to quickly query the last value for each currency/side duo"
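DISTINCT ON is good for few rows per combination of interest. But your case obviously has many rows per distinct (currency, side). There, DISTINCT ON is a bad choice performance-wise. You will find a detailed assessment and an arsenal of solutions in two related answers on SO.

If you only need the latest timestamp ts, that column is both the sort criterion and the desired return value, and the case is very simple. Have a look at Evan's simple solution with max(ts). (Well, ideally you would have an index on (currency, side, ts desc NULLS LAST), because max(ts) ignores NULL values and matches that sort order better. But this hardly matters for columns defined NOT NULL.)

Typically, you need additional columns from each selected row (like the current price!) and/or you need to sort by multiple columns, so you have to do more.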
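To make the simple case concrete, here is a minimal sketch of a max(ts) query against the simplified test table. It is written from the description above and is not necessarily identical to the referenced answer; the alias last_ts is made up for illustration.

```sql
-- Index from the question; ts is NOT NULL, so NULLS LAST is not needed.
CREATE INDEX ON prices (currency, side, ts DESC);

-- Returns only the latest timestamp per (currency, side), not the price.
SELECT currency, side, max(ts) AS last_ts  -- last_ts: illustrative alias
FROM   prices
GROUP  BY currency, side;
```

This only answers "when was the last update". Fetching price from the same row requires one of the techniques that follow.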
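Ideally, you have another table listing all currencies, plus an FK constraint to enforce referential integrity and disallow non-existent currencies. Then use the query technique from the chapter "2a. LATERAL join" in the linked answer, extended to account for the added side.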
Based on your original simple test case:
SELECT c.currency, s.side, p.*
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
CROSS  JOIN LATERAL (
   SELECT ts, price         -- more columns?
   FROM   prices
   WHERE  currency = c.currency
   AND    side = s.side
   ORDER  BY ts DESC        -- ts is NOT NULL
   LIMIT  1
   ) p
ORDER  BY 1, 2;  -- optional, whatever you prefer
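You should see index scans on the index on (currency, side, ts DESC). If index-only scans are possible and you only need ts and price, it may pay to append price as the last column of the index.
dbfiddle here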
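A minimal sketch of such an index for the simplified test table follows. The index name is made up for illustration. Index-only scans also depend on the visibility map being reasonably current, which is why a VACUUM is included here.

```sql
-- Hypothetical name; price is appended as the last index column so the
-- LATERAL subquery can read ts and price from the index alone.
CREATE INDEX prices_currency_side_ts_price_idx
    ON prices (currency, side, ts DESC, price);

-- Updates the visibility map and the planner statistics.
VACUUM ANALYZE prices;
```

Whether the wider index is worth its extra size and write cost depends on the workload, so it is best checked with EXPLAIN (ANALYZE, BUFFERS) on real data.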
Whether you save this query in a VIEW or not does not affect performance.
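The same technique might carry over to the actual four-column table roughly as sketched below. The question only shows a lookup table for exchange, so this assumes a hypothetical table price_series listing every (exchange, pair, side, guaranteed_volume) combination of interest. That table and its name are not part of the original schema.

```sql
-- Hypothetical lookup table: one row per combination to report.
CREATE TABLE price_series (
    exchange          integer NOT NULL,
    pair              text    NOT NULL,
    side              boolean NOT NULL,
    guaranteed_volume numeric NOT NULL,
    PRIMARY KEY (exchange, pair, side, guaranteed_volume)
);

SELECT s.exchange, s.pair, s.side, s.guaranteed_volume, p.ts, p.price
FROM   price_series s
CROSS  JOIN LATERAL (
   SELECT pr.ts, pr.price
   FROM   prices pr
   WHERE  pr.exchange          = s.exchange
   AND    pr.pair              = s.pair
   AND    pr.side              = s.side
   AND    pr.guaranteed_volume = s.guaranteed_volume
   ORDER  BY pr.ts DESC        -- ts is NOT NULL
   LIMIT  1
   ) p
ORDER  BY 1, 2, 3, 4;          -- optional
```

Each subquery can use the existing index on (exchange, pair, side, guaranteed_volume, ts desc). Combinations without any row in prices are dropped by the CROSS JOIN LATERAL; LEFT JOIN LATERAL ... ON true would keep them with NULL values.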