PostgreSQL

How can I speed up querying the last value in a time series?

  • February 28, 2021

I have a time-series table, prices, in a PostgreSQL 10 database.

Here is a simplified test case to illustrate the problem:

CREATE TABLE prices (
   currency text NOT NULL,
   side     boolean NOT NULL,
   price    numeric NOT NULL,
   ts       timestamptz NOT NULL
);

I want to quickly query the last value for each currency/side pair, because that gives me the current buy/sell price for each currency.

My current solution is:

create index on prices (currency, side, ts desc);

select distinct on (currency, side) *
from prices
order by currency, side, ts desc;

But this gives me a very slow query (about 500 ms) on a table with only about 30k rows.

The actual table has four columns I need to group by, not two. Here is what the actual table and query look like:

create table prices (
   exchange integer not null,
   pair text not null,
   side boolean not null,
   guaranteed_volume numeric not null,
   ts timestamp with time zone not null,
   price numeric not null,
   constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
   constraint prices_exchange_fkey foreign key (exchange)
       references exchanges (id) match simple
       on update no action
       on delete no action
);

create index prices_exchange_pair_side_guaranteed_volume_ts_idx
     on prices (exchange, pair, side, guaranteed_volume, ts desc);

create view last_prices as
select distinct on (exchange, pair, side, guaranteed_volume)
      exchange
    , pair
    , side
    , guaranteed_volume
    , price
    , ts
 from prices
order by exchange
       , pair
       , side
       , guaranteed_volume
       , ts desc;

There are currently 34441 rows. Some useful debugging queries:

# explain (analyze,buffers) select * from last_prices;
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
  Buffers: shared hit=418
  ->  Sort  (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
        Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
        Sort Method: quicksort  Memory: 2984kB
        Buffers: shared hit=418
        ->  Seq Scan on prices  (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
              Buffers: shared hit=418
Planning time: 0.911 ms
Execution time: 460.190 ms

EXPLAIN ANALYZE with seqscan disabled:

# explain (analyze,buffers) select * from last_prices;
                                                                                 QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
  Buffers: shared hit=15182
  ->  Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices  (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
        Buffers: shared hit=15182
Planning time: 0.291 ms
Execution time: 122.417 ms
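
For readers who want to reproduce the comparison: the post does not show how sequential scans were disabled. A minimal sketch, assuming it was done with the session-level enable_seqscan planner setting, could look like this:

```sql
-- Assumption: seqscan was disabled per session via the planner setting.
SET enable_seqscan = off;   -- strongly discourages sequential scans

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM last_prices;

RESET enable_seqscan;       -- restore the default for the rest of the session
```

This setting is meant for diagnosis only; it does not forbid sequential scans, it just makes the planner treat them as very expensive.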

Adding a query that runs the view's underlying query directly:

# explain (analyze, buffers)
select distinct on (exchange, pair, side, guaranteed_volume)
      exchange
    , pair
    , side
    , guaranteed_volume
    , price
    , ts
 from prices
order by exchange
       , pair
       , side
       , guaranteed_volume
       , ts desc;
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
  Buffers: shared hit=418
  ->  Sort  (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
        Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
        Sort Method: quicksort  Memory: 2644kB
        Buffers: shared hit=418
        ->  Seq Scan on prices  (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
              Buffers: shared hit=418
Planning time: 0.258 ms
Execution time: 392.110 ms

"I want to quickly query the last value for each currency/side pair"

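DISTINCT ON is good when there are few rows per combination of interest. But your case clearly has many rows per distinct (currency, side), and for that DISTINCT ON is a poor choice in terms of performance. You will find a detailed assessment and an arsenal of solutions in two related answers on SO.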

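If all you need is the latest timestamp ts, so that this column is both the sort criterion and the desired return value, the case is very simple. See Evan's simple solution with max(ts).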

(Well, ideally you would have an index on (currency, side, ts DESC NULLS LAST), since max(ts) ignores NULL values and matches that sort order better. But this hardly matters for a column defined NOT NULL.)
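
The max(ts) approach is not spelled out in this answer. A minimal sketch of what such a query might look like against the simplified test table (an illustration, not necessarily Evan's exact query):

```sql
-- Latest timestamp per (currency, side).
-- Returns only the grouping columns and the timestamp, no other row data.
SELECT currency, side, max(ts) AS last_ts
FROM   prices
GROUP  BY currency, side;
```

Because only ts comes back, this form stops being sufficient as soon as the price from the same row is needed.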

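Typically, you need additional columns from each selected row (like the current price!) and/or you need to sort by multiple columns, so you have to do more.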

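Ideally, you have another table listing all currencies, plus an FK constraint to enforce referential integrity and disallow non-existent currencies. Then use the query technique from the chapter "2a. LATERAL join" in the linked answer, extended to account for the added side: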

Based on your original simple test case:

SELECT c.currency, s.side, p.*
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
CROSS  JOIN LATERAL (
  SELECT ts, price              -- more columns?
  FROM   prices
  WHERE  currency = c.currency
  AND    side = s.side
  ORDER  BY ts DESC             -- ts is NOT NULL
  LIMIT  1
  ) p
ORDER  BY 1, 2;  -- optional, whatever you prefer;

You should see index scans on the (currency, side, ts DESC) index.

If index-only scans are possible and you only need ts and price, it may pay to add price as the last column of the index.
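
A sketch of such an index for the simplified test table follows; the index name is made up for illustration. On PostgreSQL 10 the extra column has to be appended as a regular key column:

```sql
-- Hypothetical index name; price is appended so that the LATERAL subquery
-- (which reads only ts and price) can be answered from the index alone.
CREATE INDEX prices_currency_side_ts_price_idx
    ON prices (currency, side, ts DESC, price);

-- Index-only scans depend on the visibility map, which VACUUM maintains.
VACUUM ANALYZE prices;
```

Whether the planner actually picks an index-only scan depends on how recently the table was vacuumed relative to its write activity, so check with EXPLAIN (ANALYZE, BUFFERS).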

dbfiddle here

Whether or not you save this query in a VIEW does not affect performance.
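
For illustration, the LATERAL query for the simplified test case could be wrapped in a view like this. The view name last_prices_lateral is an assumption, and the currency lookup table is the one the query above already relies on:

```sql
-- Hypothetical view name; same technique as the query above, without ORDER BY
-- so that callers can choose their own ordering.
CREATE VIEW last_prices_lateral AS
SELECT c.currency, s.side, p.ts, p.price
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)   -- account for side
CROSS  JOIN LATERAL (
   SELECT ts, price
   FROM   prices
   WHERE  currency = c.currency
   AND    side = s.side
   ORDER  BY ts DESC                            -- ts is NOT NULL
   LIMIT  1
   ) p;
```

A caller would then run something like SELECT * FROM last_prices_lateral ORDER BY currency, side;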

Source: https://dba.stackexchange.com/questions/202248