PostgreSQL
Optimizing a view (and its underlying table) that averages timestamps to the hour
I have this table:
CREATE TABLE spp.rtprices (
  "interval" timestamp without time zone NOT NULL,
  rtlmp numeric(12,6),
  rtmcc numeric(12,6),
  rtmcl numeric(12,6),
  node_id integer NOT NULL,
  CONSTRAINT rtprices_pkey PRIMARY KEY ("interval", node_id),
  CONSTRAINT rtprices_node_id_fkey FOREIGN KEY (node_id)
    REFERENCES spp.nodes (node_id) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT
);
And an associated index:
CREATE INDEX rtprices_node_id_interval_idx ON spp.rtprices (node_id, "interval");
On top of it, I created this view:
CREATE OR REPLACE VIEW spp.rtprices_hourly AS
SELECT (rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone AS pricedate,
       date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1 AS hour,
       rtprices.node_id,
       round(avg(rtprices.rtlmp), 2) AS rtlmp,
       round(avg(rtprices.rtmcc), 2) AS rtmcc,
       round(avg(rtprices.rtmcl), 2) AS rtmcl
FROM spp.rtprices
GROUP BY date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1,
         rtprices.node_id,
         (rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone;
The point is to give the average of the numeric columns for each hour (the timestamps carry data every 5 minutes). The problem is that a query for a single day and a single node_id takes over 30 seconds to return 24 records.
explain analyze select * from spp.rtprices_hourly where node_id=20 and pricedate='2015-02-02'
returns this:
HashAggregate  (cost=1128767.71..1128773.79 rows=135 width=28) (actual time=31155.023..31155.065 rows=24 loops=1)
  Group Key: ((date_part('hour'::text, date_trunc('hour'::text, (rtprices."interval" - '00:05:00'::interval))))::integer + 1), rtprices.node_id, (((rtprices."interval" - '00:05:00'::interval))::date)::timestamp without time zone
  ->  Bitmap Heap Scan on rtprices  (cost=10629.42..1128732.91 rows=2320 width=28) (actual time=25071.410..31153.715 rows=288 loops=1)
        Recheck Cond: (node_id = 20)
        Rows Removed by Index Recheck: 7142233
        Filter: (((("interval" - '00:05:00'::interval))::date)::timestamp without time zone = '2015-02-02 00:00:00'::timestamp without time zone)
        Rows Removed by Filter: 124909
        Heap Blocks: exact=43076 lossy=82085
        ->  Bitmap Index Scan on rtprices_node_id_interval_idx  (cost=0.00..10628.84 rows=464036 width=0) (actual time=68.999..68.999 rows=125197 loops=1)
              Index Cond: (node_id = 20)
Planning time: 5.243 ms
Execution time: 31155.392 ms
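Simpler view
For this goal:
"The point is to give the average of the numeric columns for each hour"
... truncating to the full hour seems just as good, and it is simpler and cheaper: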
CREATE OR REPLACE VIEW spp.rtprices_hourly AS
SELECT date_trunc('hour', "interval") AS hour
     , node_id
     , round(avg(rtlmp), 2) AS rtlmp
     , round(avg(rtmcc), 2) AS rtmcc
     , round(avg(rtmcl), 2) AS rtmcl
FROM   spp.rtprices
GROUP  BY 1, 2;
Faster query
Either way, the equivalent query on the view with sargable predicates would be:
SELECT * FROM spp.rtprices_hourly WHERE node_id = 20 AND hour >= '2015-02-02 0:0'::timestamp AND hour < '2015-02-03 0:0'::timestamp;
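This is already faster, but still not as fast as it could be. The major performance loss comes from the fact that the index can only be used with an index condition on node_id, the one column the view passes through in its original state. That is why your index rtprices_node_id_interval_idx, with node_id first, is important.
The second predicate, on hour, has to be filtered after the tuples have been fetched from the heap (after the rows have been read from the table). The bulk of the rows is discarded late in the process, so a lot of work is done in vain.
Direct query is faster
It would be much faster to run the raw query and apply the predicates before aggregating: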
SELECT date_trunc('hour', "interval") AS hour
     , node_id
     , round(avg(rtlmp), 2) AS rtlmp
     , round(avg(rtmcc), 2) AS rtmcc
     , round(avg(rtmcl), 2) AS rtmcl
FROM   spp.rtprices
WHERE  node_id = 20
AND    "interval" >= '2015-02-02 0:0'::timestamp
AND    "interval" <  '2015-02-03 0:0'::timestamp
GROUP  BY 1, 2;
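One way to check the plan is to wrap the direct query in EXPLAIN. This is only a sketch that reuses the example values from the question (node_id = 20, the day 2015-02-02); the BUFFERS option is an addition of this example and merely adds block counts to the output. Keep in mind that ANALYZE actually executes the query.

```sql
-- Sketch: inspect the plan of the direct query.
-- The literal values are the example values from the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT date_trunc('hour', "interval") AS hour
     , node_id
     , round(avg(rtlmp), 2) AS rtlmp
     , round(avg(rtmcc), 2) AS rtmcc
     , round(avg(rtmcl), 2) AS rtmcl
FROM   spp.rtprices
WHERE  node_id = 20
AND    "interval" >= timestamp '2015-02-02 00:00'
AND    "interval" <  timestamp '2015-02-03 00:00'
GROUP  BY 1, 2;
```

In the output, compare which predicates are listed under "Index Cond" and which under "Filter".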
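You will now see index conditions for all predicates. The more efficient index is still the one with node_id first: with the equality column first and the range column second, all matching index entries sit in one contiguous range of the index.
Fast and short: create a function
So this does not work with a view. Use a function instead: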
CREATE OR REPLACE FUNCTION rtprices_hourly(_node_id int
                                         , _from timestamp
                                         , _to timestamp = NULL)
  RETURNS TABLE (hour timestamp
               , node_id int
               , rtlmp numeric
               , rtmcc numeric
               , rtmcl numeric) AS
$func$
SELECT date_trunc('hour', r."interval")  -- AS hour
     , r.node_id
     , round(avg(r.rtlmp), 2)            -- AS rtlmp
     , round(avg(r.rtmcc), 2)            -- AS rtmcc
     , round(avg(r.rtmcl), 2)            -- AS rtmcl
FROM   spp.rtprices r
WHERE  r.node_id = _node_id
AND    r."interval" >= _from
AND    r."interval" <  COALESCE(_to, _from + interval '1 day')
GROUP  BY 1, 2
$func$  LANGUAGE sql STABLE;
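- Beware of naming conflicts between OUT parameters and column names. That is why I table-qualified all columns here.
Now you get the best performance with a simple query: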
SELECT * FROM rtprices_hourly(1, '2015-2-2 0:0'::timestamp, '2015-2-3 0:0'::timestamp);
I added a convenience feature: if you omit the last parameter (_to), the upper bound defaults to "one day later":
SELECT * FROM rtprices_hourly(1, '2015-2-2 0:0'::timestamp);
You can query any range:
SELECT * FROM rtprices_hourly(1, '2015-2-2 10:0'::timestamp, '2015-2-2 20:0'::timestamp);
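Note that truncating with date_trunc labels each row with the start of the hour and groups the readings from :00 to :55, whereas the view in the question shifts every timestamp back by five minutes and numbers the hours 1 to 24. If that original convention has to be kept, the same idea of filtering on the raw "interval" column before aggregating still applies. The following is a sketch under that assumption, not part of the answer above; the range bounds are simply the day boundaries shifted forward by five minutes, and the values are the example values from the question.

```sql
-- Sketch: keep the question's convention (shift by 5 minutes, hours 1..24)
-- while still filtering on the raw column so the index can be used.
SELECT ("interval" - interval '5 min')::date                     AS pricedate
     , extract(hour FROM "interval" - interval '5 min')::int + 1 AS hour
     , node_id
     , round(avg(rtlmp), 2) AS rtlmp
     , round(avg(rtmcc), 2) AS rtmcc
     , round(avg(rtmcl), 2) AS rtmcl
FROM   spp.rtprices
WHERE  node_id = 20
AND    "interval" >= timestamp '2015-02-02 00:05'  -- first reading of hour 1
AND    "interval" <  timestamp '2015-02-03 00:05'  -- includes 00:00 of the next day
GROUP  BY 1, 2, 3;
```

The same WHERE clause could be used inside a function like rtprices_hourly if the shifted convention is needed there as well.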