Postgresql

在某個時間點查找 n 個加油站的目前價格

  • May 3, 2020

我有一個表,其中儲存了價格資訊,大約 1300 萬行儲存在 PostgreSQL 9.5 數據庫中。

CREATE TABLE public.de_tt_priceinfo (
 id integer NOT NULL DEFAULT nextval('priceinfo_id_seq'::regclass),
 station_id character varying(60),
 received timestamp with time zone NOT NULL DEFAULT now(),
 e5 numeric(4,3),
 e10 numeric(4,3),
 diesel numeric(4,3),
 CONSTRAINT de_tt_priceinfo_id_pkey PRIMARY KEY (id)
);

CREATE INDEX de_tt_priceinfo_received_station_id_idx
 ON public.de_tt_priceinfo (received, station_id COLLATE pg_catalog."default");

CREATE INDEX index_station_id
 ON public.de_tt_priceinfo (station_id COLLATE pg_catalog."default");

從這個表中,我需要以最高性能提取某個時間點的最新價格,因為我必須模擬 3200 萬查詢該表的通勤者(不是一次,而是仍然)。

我有一個有效的查詢!

SELECT station_id, e5, e10, diesel, received FROM de_tt_priceinfo a 
WHERE a.received = (SELECT MAX(received) FROM de_tt_priceinfo b 
 WHERE a.station_id = b.station_id
 AND  received <= '2014-09-25 08:45:12'::TIMESTAMPTZ)
AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
               , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
               , 'F5137488-a-b-c-d')

此查詢的性能不可用。執行時間在 900 毫秒左右變化。結果看起來像這樣

0C91A93A-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:17:50.000000”
578C44BB-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:00:09.000000”
6F2F48A8-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 07:08:57.000000”
9982BE74-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:29:55.000000”
A24C612B-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:00:09.000000”
BEC3EF55-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 06:53:49.000000”
F5137488-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 07:44:55.000000”

所以我搜尋了一下,發現了一些流行詞,比如遞歸 CTE、鬆散的索引掃描和一些 DBA 上的答案似乎非常接近,但我無法根據我的需要修改它們。

如果我理解正確,遞歸 CTE 將是查詢我想要的數據的最快方法。

到目前為止,我得到的是:

WITH RECURSIVE cte AS (
  (
  SELECT station_id, e5, e10, diesel, received
  FROM   de_tt_priceinfo
  WHERE  received <= '2014-09-25 08:45:00'::TIMESTAMPTZ
    AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
                    , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
                    , 'F5137488-a-b-c-d')
  ORDER  BY station_id, received DESC NULLS LAST
  LIMIT 1  
  )
  UNION ALL
  (
  SELECT u.station_id, u.e5, u.e10, u.diesel, u.received
  FROM   cte c
  JOIN de_tt_priceinfo u ON u.received > c.received   
  WHERE  u.received <= '2014-09-25 08:45:00'::TIMESTAMPTZ  -- repeat condition!
    AND u.station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d'
                      , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d'
                      , 'F5137488-a-b-c-d')
  ORDER BY u.station_id, u.received DESC NULLS LAST LIMIT 1
  )
  )
SELECT * FROM cte;

但這只是返回以下兩行:

0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000"
9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"

更新:

  • SELECT Version();x86_64-pc-linux-gnu 上的 PostgreSQL 9.5devel,由 x86_64-pc-linux-gnu-gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3, 64-bit 編譯
  • 解釋分析:http ://explain.depesz.com/s/clrZ
  • XEON 1231v3、16 GB 記憶體、三星 840 PRO SSD
  • 對預設 postgresql.conf 的更改
# 聯繫
聽地址 = '*'
最大連接數 = 16

# 記錄
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgres-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1d
log_rotation_size = 1GB
log_min_duration_statement = 500 毫秒
#log_checkpoints = 開啟
#log_connections = 開啟
#log_disconnections = 開啟
log_lock_waits = on
#log_temp_files = 0

# 記憶
shared_buffers = 1GB
臨時緩衝區 = 32MB
工作記憶體 = 256MB
維護工作記憶體 = 1GB
有效記憶體大小 = 8GB

# 檢查點(何時寫入磁碟)
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 30 分鐘
checkpoint_segments = 32

random_page_cost = 1.1

# 只導入!
#autovacuum = 關閉
fsync = 關閉
同步送出 = 關閉
full_page_writes = 關閉

指數

首先,對於您的查詢類型,這是更好的索引:

CREATE INDEX de_tt_priceinfo_received_station_id_idx
 ON public.de_tt_priceinfo (**station_id, received**);  -- note the reversed order

由於組合應該是唯一的(我假設),我建議改為UNIQUE限制:(station_id, receved)

ALTER TABLE de_tt_priceinfo ADD CONSTRAINT de_tt_priceinfo_station_id_received
UNIQUE (station_id, received);

該索引index_station_id大部分已被取代,現在可能會被刪除。

該索引de_tt_priceinfo_received_station_id_idx可能仍然有用。

一定要理解這一切背後的邏輯:

詢問

我還會考慮基本DISTINCT ON查詢:

SELECT DISTINCT ON (station_id)
      station_id, e5, e10, diesel, received
FROM   de_tt_priceinfo
WHERE  received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
AND    station_id = ANY ('{0C91A93A-a-b-c-d, 578C44BB-a-b-c-d, 6F2F48A8-a-b-c-d
                        , 9982BE74-a-b-c-d, A24C612B-a-b-c-d, BEC3EF55-a-b-c-d
                        , F5137488-a-b-c-d}'::varchar[])
ORDER BY station_id, received DESC;

但是由於您似乎每個站有很多行,所以這不會發光。反而:

SELECT *
FROM  (
  VALUES
    ('0C91A93A-a-b-c-d'::varchar)
   , ('578C44BB-a-b-c-d')
   , ('6F2F48A8-a-b-c-d')
   , ('9982BE74-a-b-c-d')
   , ('A24C612B-a-b-c-d')
   , ('BEC3EF55-a-b-c-d')
   , ('F5137488-a-b-c-d')
  ) s(station_id)
LEFT JOIN LATERAL (
   SELECT e5, e10, diesel, received
   FROM   de_tt_priceinfo
   WHERE  station_id = s.station_id
   AND    received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
   ORDER  BY received DESC
   LIMIT  1
  )  p ON TRUE

這應該是與上述UNIQUE約束(或等效索引)相結合的炸藥。

詳細解釋:

表定義

對於具有數百萬行的表,優化儲存是值得的。讓一切變得更小更快。

我就是這樣設計的:

CREATE TABLE station (
  station_id serial PRIMARY KEY
, station    text
, CHECK (length(station) < 61) -- ?? optional, you decide 
);

CREATE TABLE priceinfo (
  priceinfo_id serial PRIMARY KEY
, station_id   integer NOT NULL REFERENCES station ON UPDATE CASCADE
, received     timestamptz NOT NULL DEFAULT now(),
, e5           integer  -- price in 0.1 Cent
, e10          integer  -- price in 0.1 Cent
, diesel       integer  -- price in 0.1 Cent
, CONSTRAINT priceinfo_station_id_received UNIQUE (station_id, received)
);

CREATE INDEX priceinfo_received_idx ON public.priceinfo (received);

與原始表中的94 字節(24 + 66 + 4)相比,其中的行大小priceinfo60 字節(24 堆元組標頭 + 空點陣圖;32 字節數據;4 字節項目標識符)。****假設您的範例中有 16 個字元的字元串。一切都將小 36 %(或更多?)並且速度更快。

每個索引元組的關鍵索引(station_id, received)減少到8 個字節的數據,而不是32 個字節甚至更多(!) - 每個都加上成本。此外,處理integer數字 forstation_id通常比處理帶有 a 的文本要快COLLATION

細節:

查詢將首先station_idstation表中獲取,這很便宜。

價格儲存為integer表示 0.1 美分的數字。(原始的4 字節而不是10 字節numeric(4,3)乘以 0.1 得到 Cent 或 0.001 得到 € 用於顯示。非常簡單和快速。

UUID

錯誤消息中的字元串看起來相當長,實際上就像一個正常UUID數字:

871828b4-37e5-419c-b7a5-cdbe1e1c0148

如果是這樣,請使用uuid數據類型。無論你採用我的保持你的舊的設計。至少切換到uuid數據類型,以便在各個方面都有很大的整體收益:

引用自:https://dba.stackexchange.com/questions/93166