PostgreSQL
Index for SQL query with WHERE condition and GROUP BY
I'm trying to determine which indexes to use for an SQL query with a WHERE condition and a GROUP BY that is currently running very slowly. My query:

    SELECT group_id
    FROM counter
    WHERE ts BETWEEN timestamp '2014-03-02 00:00:00.0'
                 AND timestamp '2014-03-05 12:00:00.0'
    GROUP BY group_id
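The table currently has 32,000,000 rows. The execution time of the query increases a lot when I widen the time range.
The table in question looks like this:

    CREATE TABLE counter (
        id bigserial PRIMARY KEY
      , ts timestamp NOT NULL
      , group_id bigint NOT NULL
    );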
I currently have the following indexes, but performance is still slow:

    CREATE INDEX ts_index ON counter USING btree (ts);
    CREATE INDEX group_id_index ON counter USING btree (group_id);
    CREATE INDEX comp_1_index ON counter USING btree (ts, group_id);
    CREATE INDEX comp_2_index ON counter USING btree (group_id, ts);
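With four overlapping indexes on the same table, it may be worth checking which ones the planner actually uses and how large each one is. The following is only a sketch, assuming the table lives in a schema visible to the `pg_stat_user_indexes` statistics view and that the counters have not been reset recently:

```sql
-- How often each index on "counter" has been scanned, and its on-disk size.
-- idx_scan counts index scans since the statistics were last reset.
SELECT indexrelname                                 AS index_name,
       idx_scan                                     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'counter'
ORDER  BY idx_scan DESC;
```

An index that shows no scans over a representative period still has to be maintained on every insert, so it is a candidate for review.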
Running EXPLAIN on the query gives the following result:

    QUERY PLAN
    HashAggregate  (cost=467958.16..467958.17 rows=1 width=4)
      ->  Index Scan using ts_index on counter  (cost=0.56..467470.93 rows=194892 width=4)
            Index Cond: ((ts >= '2014-02-26 00:00:00'::timestamp without time zone) AND (ts <= '2014-02-27 23:59:00'::timestamp without time zone))
SQL Fiddle with sample data: http://sqlfiddle.com/#!15/7492b/1
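The plan above contains only the planner's cost estimates. To compare estimated and actual row counts and see where the time really goes, the same query can be run under EXPLAIN with the ANALYZE and BUFFERS options. This is a sketch using the time range from the query at the top; note that ANALYZE actually executes the statement:

```sql
-- Executes the query and reports actual timings, row counts and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT group_id
FROM   counter
WHERE  ts BETWEEN timestamp '2014-03-02 00:00:00'
              AND timestamp '2014-03-05 12:00:00'
GROUP  BY group_id;
```

A large gap between the estimated and actual row counts, or a high number of buffers read from disk, would point to where the slowness comes from.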
The question
Can the performance of this query be improved by adding better indexes, or do I have to increase the processing power?
Edit 1
PostgreSQL version 9.3.2 is used.
Edit 2
I tried @Erwin's proposal with EXISTS:

    SELECT group_id
    FROM groups g
    WHERE EXISTS (
        SELECT 1
        FROM counter c
        WHERE c.group_id = g.group_id
        AND ts BETWEEN timestamp '2014-03-02 00:00:00'
                   AND timestamp '2014-03-05 12:00:00'
    );

Unfortunately, this did not seem to improve performance. The query plan:

    QUERY PLAN
    Nested Loop Semi Join  (cost=1607.18..371680.60 rows=113 width=4)
      ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)
      ->  Bitmap Heap Scan on counter c  (cost=1607.18..158895.53 rows=60641 width=4)
            Recheck Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))
            ->  Bitmap Index Scan on comp_2_index  (cost=0.00..1592.02 rows=60641 width=0)
                  Index Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))
Edit 3
The query plan for ypercube's LATERAL query:

    QUERY PLAN
    Nested Loop  (cost=8.98..1200.42 rows=133 width=20)
      ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)
      ->  Result  (cost=8.98..8.99 rows=1 width=0)
            One-Time Filter: ($1 IS NOT NULL)
            InitPlan 1 (returns $1)
              ->  Limit  (cost=0.56..4.49 rows=1 width=8)
                    ->  Index Only Scan using comp_2_index on counter c  (cost=0.56..1098691.21 rows=279808 width=8)
                          Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))
            InitPlan 2 (returns $2)
              ->  Limit  (cost=0.56..4.49 rows=1 width=8)
                    ->  Index Only Scan Backward using comp_2_index on counter c_1  (cost=0.56..1098691.21 rows=279808 width=8)
                          Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))
Another idea, which also uses the groups table and a construct called a LATERAL join (for SQL Server fans, this is almost identical to OUTER APPLY). It has the advantage that aggregates can be calculated in the subquery:

    SELECT group_id, min_ts, max_ts
    FROM groups g,    -- notice the comma here, it is required
      LATERAL
        ( SELECT MIN(ts) AS min_ts,
                 MAX(ts) AS max_ts
          FROM counter c
          WHERE c.group_id = g.group_id
            AND c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                         AND timestamp '2013-03-05 12:00:00'
        ) x
    WHERE min_ts IS NOT NULL ;
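A test at **SQL-Fiddle** shows that the query does index scans on the (group_id, ts) index. Similar plans are produced using 2 lateral joins, one for the min and one for the max, and also with 2 inline correlated subqueries.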
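For illustration, the variant with 2 inline correlated subqueries could look roughly like the sketch below. It assumes the same groups table with a group_id column and reuses the time range from the LATERAL query above; the derived-table alias x is only there so the outer WHERE can filter out groups with no rows in the range:

```sql
-- One correlated subquery for the minimum and one for the maximum ts per group.
SELECT group_id, min_ts, max_ts
FROM (
    SELECT g.group_id,
           ( SELECT MIN(c.ts)
             FROM   counter c
             WHERE  c.group_id = g.group_id
             AND    c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                             AND timestamp '2013-03-05 12:00:00'
           ) AS min_ts,
           ( SELECT MAX(c.ts)
             FROM   counter c
             WHERE  c.group_id = g.group_id
             AND    c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                             AND timestamp '2013-03-05 12:00:00'
           ) AS max_ts
    FROM   groups g
) x
WHERE min_ts IS NOT NULL ;
```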
The two-LATERAL form can also be used if you need to show the whole counter rows besides the min and max dates:

    SELECT group_id,
           min_ts, min_ts_id,
           max_ts, max_ts_id
    FROM groups g
      , LATERAL
        ( SELECT ts AS min_ts, c.id AS min_ts_id
          FROM counter c
          WHERE c.group_id = g.group_id
            AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                         AND timestamp '2014-03-05 12:00:00'
          ORDER BY ts ASC
          LIMIT 1
        ) xmin
      , LATERAL
        ( SELECT ts AS max_ts, c.id AS max_ts_id
          FROM counter c
          WHERE c.group_id = g.group_id
            AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                         AND timestamp '2014-03-05 12:00:00'
          ORDER BY ts DESC
          LIMIT 1
        ) xmax
    WHERE min_ts IS NOT NULL ;