How can I speed up querying the last value in a time series?

February 28, 2021

I have a time series table prices in a PostgreSQL 10 database.

Here is a simplified test case to illustrate the problem:

CREATE TABLE prices (
   currency text NOT NULL,
   side     boolean NOT NULL,
   price    numeric NOT NULL,
   ts       timestamptz NOT NULL
);

I want to quickly query the last value for each currency/side duo, since that gives me the current buy/sell price for each currency.

My current solution is:

create index on prices (currency, side, ts desc);

select distinct on (currency, side) *
from prices
order by currency, side, ts desc;

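But this gives me very slow queries (about 500 ms) on a table with only about 30k rows.

The actual table has four columns I want to group by, not two. Here is what the actual table and query look like: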

create table prices (
   exchange integer not null,
   pair text not null,
   side boolean not null,
   guaranteed_volume numeric not null,
   ts timestamp with time zone not null,
   price numeric not null,
   constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
   constraint prices_exchange_fkey foreign key (exchange)
       references exchanges (id) match simple
       on update no action
       on delete no action
);

create index prices_exchange_pair_side_guaranteed_volume_ts_idx
     on prices (exchange, pair, side, guaranteed_volume, ts desc);

create view last_prices as
select distinct on (exchange, pair, side, guaranteed_volume)
      exchange
    , pair
    , side
    , guaranteed_volume
    , price
    , ts
 from prices
order by exchange
       , pair
       , side
       , guaranteed_volume
       , ts desc;

There are currently 34441 rows. Some useful debugging queries:

# explain (analyze,buffers) select * from last_prices;
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
  Buffers: shared hit=418
  ->  Sort  (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
        Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
        Sort Method: quicksort  Memory: 2984kB
        Buffers: shared hit=418
        ->  Seq Scan on prices  (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
              Buffers: shared hit=418
Planning time: 0.911 ms
Execution time: 460.190 ms

Explain analyze with seqscan disabled:

# explain (analyze,buffers) select * from last_prices;
                                                                                 QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique  (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
  Buffers: shared hit=15182
  ->  Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices  (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
        Buffers: shared hit=15182
Planning time: 0.291 ms
Execution time: 122.417 ms
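
For anyone reproducing this: sequential scans can be discouraged for the current session through the enable_seqscan planner setting. A minimal sketch, assuming that is how the plan above was produced (the exact commands are not shown in the post):

```sql
-- Session-local: makes the planner avoid sequential scans where it can.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM last_prices;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```

This is a diagnostic switch only; it is not meant to be left off in production.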

Adding a run of the view's underlying query executed directly:

# explain (analyze, buffers)
select distinct on (exchange, pair, side, guaranteed_volume)
      exchange
    , pair
    , side
    , guaranteed_volume
    , price
    , ts
 from prices
order by exchange
       , pair
       , side
       , guaranteed_volume
       , ts desc;
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Unique  (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
  Buffers: shared hit=418
  ->  Sort  (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
        Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
        Sort Method: quicksort  Memory: 2644kB
        Buffers: shared hit=418
        ->  Seq Scan on prices  (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
              Buffers: shared hit=418
Planning time: 0.258 ms
Execution time: 392.110 ms

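"I want to quickly query the last value for each currency/side duo"
DISTINCT ON is good for few rows per combination of interest. But your case obviously has many rows per distinct (currency, side). For that, DISTINCT ON is a poor choice in terms of performance. You will find a detailed assessment and an arsenal of solutions in these two related answers on SO:
Select first row in each GROUP BY group?
Optimize GROUP BY query to retrieve latest row per user
If you only need the latest timestamp ts, so that this one column is both the sort criterion and the desired return value, the case is very simple. See Evan's simple solution with max(ts).
(Well, ideally you would have an index on (currency, side, ts DESC NULLS LAST), since max(ts) ignores NULL values and matches that sort order better. But this hardly matters with a column defined NOT NULL.)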
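
Evan's answer is not reproduced here. As a minimal sketch of what a max(ts) query can look like on the simplified test table (not necessarily his exact query):

```sql
-- Latest timestamp per (currency, side).
-- Returns only the grouping columns and the timestamp, no price.
SELECT currency, side, max(ts) AS last_ts
FROM   prices
GROUP  BY currency, side;
```

Whether this is fast enough on your data is something to confirm with EXPLAIN (ANALYZE, BUFFERS).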
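Typically, you need other columns from each selected row (such as the current price!) and/or you need to sort by multiple columns, so you have to do more.
Ideally, you have another table listing all currencies, plus an FK constraint to enforce referential integrity and disallow non-existent currencies. Then use the query technique from chapter "2a. LATERAL join" of the linked answer, expanded to account for the added side.
Based on your initial simple test case: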
SELECT c.currency, s.side, p.*
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
CROSS  JOIN LATERAL (
  SELECT ts, price              -- more columns?
  FROM   prices
  WHERE  currency = c.currency
  AND    side = s.side
  ORDER  BY ts DESC             -- ts is NOT NULL
  LIMIT  1
  ) p
ORDER  BY 1, 2;  -- optional, whatever you prefer;
You should see index scans on (currency, side, ts DESC).
If index-only scans are possible and you only need ts and price, it may pay to append price as the last column of the index.
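
A sketch of such an index for the simplified test table. The index name is made up for illustration. PostgreSQL 10 has no INCLUDE clause (that arrived in version 11), so price is appended as a trailing key column:

```sql
-- Hypothetical index name; column order matches the LATERAL subquery:
-- equality on (currency, side), then ORDER BY ts DESC, then the payload.
CREATE INDEX prices_currency_side_ts_price_idx
    ON prices (currency, side, ts DESC, price);

-- Index-only scans depend on an up-to-date visibility map,
-- which VACUUM maintains.
VACUUM ANALYZE prices;
```

The wider index costs more space and write overhead, so it is worth checking the plan before and after.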
dbfiddle here
Whether or not you save this query in a VIEW does not affect performance.
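
The LATERAL query above reads from a table named currency, which the simplified test case in the question does not define. A minimal sketch of how it could be created and tied to prices, assuming no such table exists yet (the constraint name is illustrative):

```sql
-- Lookup table with one row per currency.
CREATE TABLE currency (
   currency text PRIMARY KEY
);

-- Seed it from the currencies already present in prices.
INSERT INTO currency (currency)
SELECT DISTINCT currency FROM prices;

-- Enforce referential integrity from now on.
ALTER TABLE prices
   ADD CONSTRAINT prices_currency_fkey
   FOREIGN KEY (currency) REFERENCES currency (currency);
```

For the real table, the same idea would need a source of distinct (exchange, pair, side, guaranteed_volume) combinations, which the question does not show.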

Quoted from: https://dba.stackexchange.com/questions/202248
