在某個時間點查找 n 個加油站的目前價格
我有一個表,其中儲存了價格資訊,大約 1300 萬行儲存在 PostgreSQL 9.5 數據庫中。
CREATE TABLE public.de_tt_priceinfo ( id integer NOT NULL DEFAULT nextval('priceinfo_id_seq'::regclass), station_id character varying(60), received timestamp with time zone NOT NULL DEFAULT now(), e5 numeric(4,3), e10 numeric(4,3), diesel numeric(4,3), CONSTRAINT de_tt_priceinfo_id_pkey PRIMARY KEY (id) ); CREATE INDEX de_tt_priceinfo_received_station_id_idx ON public.de_tt_priceinfo (received, station_id COLLATE pg_catalog."default"); CREATE INDEX index_station_id ON public.de_tt_priceinfo (station_id COLLATE pg_catalog."default");
從這個表中,我需要以最高性能提取某個時間點的最新價格,因為我必須模擬 3200 萬查詢該表的通勤者(不是一次,而是仍然)。
我有一個有效的查詢!
SELECT station_id, e5, e10, diesel, received FROM de_tt_priceinfo a WHERE a.received = (SELECT MAX(received) FROM de_tt_priceinfo b WHERE a.station_id = b.station_id AND received <= '2014-09-25 08:45:12'::TIMESTAMPTZ) AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d' , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d' , 'F5137488-a-b-c-d')
此查詢的性能不可用。執行時間在 900 毫秒左右變化。結果看起來像這樣
0C91A93A-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:17:50.000000” 578C44BB-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:00:09.000000” 6F2F48A8-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 07:08:57.000000” 9982BE74-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:29:55.000000” A24C612B-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 08:00:09.000000” BEC3EF55-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 06:53:49.000000” F5137488-abcd,1.xxx,1.xxx,1.xxx,“2014-09-25 07:44:55.000000”
所以我搜尋了一下,發現了一些流行詞,比如遞歸 CTE、鬆散的索引掃描和一些 DBA 上的答案似乎非常接近,但我無法根據我的需要修改它們。
如果我理解正確,遞歸 CTE 將是查詢我想要的數據的最快方法。
到目前為止,我得到的是:
WITH RECURSIVE cte AS ( ( SELECT station_id, e5, e10, diesel, received FROM de_tt_priceinfo WHERE received <= '2014-09-25 08:45:00'::TIMESTAMPTZ AND station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d' , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d' , 'F5137488-a-b-c-d') ORDER BY station_id, received DESC NULLS LAST LIMIT 1 ) UNION ALL ( SELECT u.station_id, u.e5, u.e10, u.diesel, u.received FROM cte c JOIN de_tt_priceinfo u ON u.received > c.received WHERE u.received <= '2014-09-25 08:45:00'::TIMESTAMPTZ -- repeat condition! AND u.station_id IN('0C91A93A-a-b-c-d', '578C44BB-a-b-c-d', '6F2F48A8-a-b-c-d' , '9982BE74-a-b-c-d', 'A24C612B-a-b-c-d', 'BEC3EF55-a-b-c-d' , 'F5137488-a-b-c-d') ORDER BY u.station_id, u.received DESC NULLS LAST LIMIT 1 ) ) SELECT * FROM cte;
但這只是返回以下兩行:
0C91A93A-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:17:50.000000" 9982BE74-a-b-c-d, 1.xxx, 1.xxx, 1.xxx, "2014-09-25 08:29:55.000000"
更新:
SELECT Version();
x86_64-pc-linux-gnu 上的 PostgreSQL 9.5devel,由 x86_64-pc-linux-gnu-gcc (Gentoo 4.8.3 p1.1, pie-0.5.9) 4.8.3, 64-bit 編譯- 解釋分析:http ://explain.depesz.com/s/clrZ
- XEON 1231v3、16 GB 記憶體、三星 840 PRO SSD
- 對預設 postgresql.conf 的更改
# 聯繫 聽地址 = '*' 最大連接數 = 16 # 記錄 log_destination = 'csvlog' log_directory = 'pg_log' logging_collector = on log_filename = 'postgres-%Y-%m-%d_%H%M%S.log' log_rotation_age = 1d log_rotation_size = 1GB log_min_duration_statement = 500 毫秒 #log_checkpoints = 開啟 #log_connections = 開啟 #log_disconnections = 開啟 log_lock_waits = on #log_temp_files = 0 # 記憶 shared_buffers = 1GB 臨時緩衝區 = 32MB 工作記憶體 = 256MB 維護工作記憶體 = 1GB 有效記憶體大小 = 8GB # 檢查點(何時寫入磁碟) wal_buffers = 16MB checkpoint_completion_target = 0.9 checkpoint_timeout = 30 分鐘 checkpoint_segments = 32 random_page_cost = 1.1 # 只導入! #autovacuum = 關閉 fsync = 關閉 同步送出 = 關閉 full_page_writes = 關閉
指數
首先,對於您的查詢類型,這是更好的索引:
CREATE INDEX de_tt_priceinfo_received_station_id_idx ON public.de_tt_priceinfo (**station_id, received**); -- note the reversed order
由於組合應該是唯一的(我假設),我建議改為
UNIQUE
限制:(station_id, receved)
ALTER TABLE de_tt_priceinfo ADD CONSTRAINT de_tt_priceinfo_station_id_received UNIQUE (station_id, received);
該索引
index_station_id
大部分已被取代,現在可能會被刪除。該索引
de_tt_priceinfo_received_station_id_idx
可能仍然有用。一定要理解這一切背後的邏輯:
詢問
我還會考慮基本
DISTINCT ON
查詢:SELECT DISTINCT ON (station_id) station_id, e5, e10, diesel, received FROM de_tt_priceinfo WHERE received <= '2014-09-25 08:45:12'::TIMESTAMPTZ AND station_id = ANY ('{0C91A93A-a-b-c-d, 578C44BB-a-b-c-d, 6F2F48A8-a-b-c-d , 9982BE74-a-b-c-d, A24C612B-a-b-c-d, BEC3EF55-a-b-c-d , F5137488-a-b-c-d}'::varchar[]) ORDER BY station_id, received DESC;
但是由於您似乎每個站有很多行,所以這不會發光。反而:
SELECT * FROM ( VALUES ('0C91A93A-a-b-c-d'::varchar) , ('578C44BB-a-b-c-d') , ('6F2F48A8-a-b-c-d') , ('9982BE74-a-b-c-d') , ('A24C612B-a-b-c-d') , ('BEC3EF55-a-b-c-d') , ('F5137488-a-b-c-d') ) s(station_id) LEFT JOIN LATERAL ( SELECT e5, e10, diesel, received FROM de_tt_priceinfo WHERE station_id = s.station_id AND received <= '2014-09-25 08:45:12'::TIMESTAMPTZ ORDER BY received DESC LIMIT 1 ) p ON TRUE
這應該是與上述
UNIQUE
約束(或等效索引)相結合的炸藥。詳細解釋:
表定義
對於具有數百萬行的表,優化儲存是值得的。讓一切變得更小更快。
我就是這樣設計的:
CREATE TABLE station ( station_id serial PRIMARY KEY , station text , CHECK (length(station) < 61) -- ?? optional, you decide ); CREATE TABLE priceinfo ( priceinfo_id serial PRIMARY KEY , station_id integer NOT NULL REFERENCES station ON UPDATE CASCADE , received timestamptz NOT NULL DEFAULT now(), , e5 integer -- price in 0.1 Cent , e10 integer -- price in 0.1 Cent , diesel integer -- price in 0.1 Cent , CONSTRAINT priceinfo_station_id_received UNIQUE (station_id, received) ); CREATE INDEX priceinfo_received_idx ON public.priceinfo (received);
與原始表中的94 字節(24 + 66 + 4)相比,其中的行大小
priceinfo
為60 字節(24 堆元組標頭 + 空點陣圖;32 字節數據;4 字節項目標識符)。****假設您的範例中有 16 個字元的字元串。一切都將小 36 %(或更多?)並且速度更快。每個索引元組的關鍵索引
(station_id, received)
減少到8 個字節的數據,而不是32 個字節甚至更多(!) - 每個都加上成本。此外,處理integer
數字 forstation_id
通常比處理帶有 a 的文本要快COLLATION
。細節:
查詢將首先
station_id
從station
表中獲取,這很便宜。價格儲存為
integer
表示 0.1 美分的數字。(原始的4 字節而不是10 字節numeric(4,3)
乘以 0.1 得到 Cent 或 0.001 得到 € 用於顯示。非常簡單和快速。
UUID
錯誤消息中的字元串看起來相當長,實際上就像一個正常
UUID
數字:871828b4-37e5-419c-b7a5-cdbe1e1c0148
如果是這樣,請使用
uuid
數據類型。無論你採用我的保持你的舊的設計。至少切換到uuid
數據類型,以便在各個方面都有很大的整體收益: