PostgreSQL

Index for SQL query with WHERE condition and GROUP BY

  • March 17, 2014

I am trying to determine which indexes to use for an SQL query with a WHERE condition and a GROUP BY that currently runs very slowly.

My query:

SELECT group_id
FROM counter
WHERE ts between timestamp '2014-03-02 00:00:00.0' and timestamp '2014-03-05 12:00:00.0'
GROUP BY group_id

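The table currently has 32,000,000 rows. The execution time of the query increases a lot when I widen the time range.

The table in question looks like this: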

CREATE TABLE counter (
   id bigserial PRIMARY KEY
 , ts timestamp NOT NULL
 , group_id bigint NOT NULL
);

I currently have the following indexes, but performance is still slow:

CREATE INDEX ts_index
 ON counter
 USING btree
 (ts);

CREATE INDEX group_id_index
 ON counter
 USING btree
 (group_id);

CREATE INDEX comp_1_index
 ON counter
 USING btree
 (ts, group_id);

CREATE INDEX comp_2_index
 ON counter
 USING btree
 (group_id, ts);

Running EXPLAIN on the query gives the following result:

"QUERY PLAN"
"HashAggregate  (cost=467958.16..467958.17 rows=1 width=4)"
"  ->  Index Scan using ts_index on counter  (cost=0.56..467470.93 rows=194892 width=4)"
"        Index Cond: ((ts >= '2014-02-26 00:00:00'::timestamp without time zone) AND (ts <= '2014-02-27 23:59:00'::timestamp without time zone))"
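
The plan above contains planner estimates only. To see actual run times and how many pages are touched, the same query could be run under EXPLAIN (ANALYZE, BUFFERS). A minimal sketch, assuming the table and time range from the query above (no output is shown here because it depends on the data):

```sql
-- ANALYZE actually executes the query and reports real timings and row counts;
-- BUFFERS adds the number of shared buffers hit and read per plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT group_id
FROM   counter
WHERE  ts BETWEEN timestamp '2014-03-02 00:00:00'
              AND timestamp '2014-03-05 12:00:00'
GROUP  BY group_id;
```

Comparing the estimated and actual row counts in that output helps tell a bad estimate apart from a plan that is simply reading a lot of rows.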

SQL Fiddle with example data: http://sqlfiddle.com/#!15/7492b/1

The question

Can the performance of this query be improved by adding better indexes, or do I have to increase the processing power?

Edit 1

PostgreSQL version 9.3.2 is used.

Edit 2

I tried @Erwin's suggestion of using EXISTS:

SELECT group_id
FROM   groups g
WHERE  EXISTS (
  SELECT 1
  FROM   counter c
  WHERE  c.group_id = g.group_id
  AND    ts BETWEEN timestamp '2014-03-02 00:00:00'
                AND timestamp '2014-03-05 12:00:00'
  );

Unfortunately, this did not seem to improve performance. The query plan:

"QUERY PLAN"
"Nested Loop Semi Join  (cost=1607.18..371680.60 rows=113 width=4)"
"  ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)"
"  ->  Bitmap Heap Scan on counter c  (cost=1607.18..158895.53 rows=60641 width=4)"
"        Recheck Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
"        ->  Bitmap Index Scan on comp_2_index  (cost=0.00..1592.02 rows=60641 width=0)"
"              Index Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
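
The EXISTS query relies on a groups table that is not part of the schema shown earlier. A minimal sketch of how such a lookup table could be derived from counter, assuming it has a group_id column as in the query text (the plan output refers to g.id, so the real table may be defined differently):

```sql
-- Hypothetical lookup table: one row per distinct group_id found in counter.
CREATE TABLE groups AS
SELECT DISTINCT group_id
FROM   counter;

-- A primary key gives the planner uniqueness information for the join.
ALTER TABLE groups ADD PRIMARY KEY (group_id);

-- Collect statistics so row estimates for the new table are realistic.
ANALYZE groups;
```

With only a few hundred groups, this table stays tiny, so each group can be probed against the (group_id, ts) index instead of scanning every counter row in the time range.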

Edit 3

The query plan for ypercube's LATERAL query:

"QUERY PLAN"
"Nested Loop  (cost=8.98..1200.42 rows=133 width=20)"
"  ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)"
"  ->  Result  (cost=8.98..8.99 rows=1 width=0)"
"        One-Time Filter: ($1 IS NOT NULL)"
"        InitPlan 1 (returns $1)"
"          ->  Limit  (cost=0.56..4.49 rows=1 width=8)"
"                ->  Index Only Scan using comp_2_index on counter c  (cost=0.56..1098691.21 rows=279808 width=8)"
"                      Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
"        InitPlan 2 (returns $2)"
"          ->  Limit  (cost=0.56..4.49 rows=1 width=8)"
"                ->  Index Only Scan Backward using comp_2_index on counter c_1  (cost=0.56..1098691.21 rows=279808 width=8)"
"                      Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"

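Another idea, which also uses the groups table and a construct called a LATERAL join (for SQL Server fans, this is almost identical to OUTER APPLY). It has the advantage that aggregates can be computed in the subquery: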

SELECT group_id, min_ts, max_ts
FROM   groups g,                    -- note the comma here; it is required
 LATERAL 
      ( SELECT MIN(ts) AS min_ts,
               MAX(ts) AS max_ts
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                       AND timestamp '2013-03-05 12:00:00'
      ) x 
WHERE min_ts IS NOT NULL ;

A test on SQL Fiddle shows that the query does index scans on the (group_id, ts) index.
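
The index scans are possible because the PostgreSQL planner can turn MIN and MAX into an ordered index lookup with LIMIT 1, which is what the InitPlan / Limit nodes in the plan from Edit 3 reflect. A rough sketch of the equivalent hand-written lookup for a single group (the value 1 is a made-up group_id used only for illustration):

```sql
-- Approximately what MIN(ts) becomes for one group: walk the (group_id, ts)
-- index in ascending ts order and stop at the first row inside the range.
SELECT c.ts AS min_ts
FROM   counter c
WHERE  c.group_id = 1              -- hypothetical group_id
AND    c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                AND timestamp '2013-03-05 12:00:00'
ORDER  BY c.ts ASC
LIMIT  1;
```

Replacing ASC with DESC gives the corresponding lookup for MAX(ts), which matches the backward index scan in the plan.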

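Similar plans are produced using 2 lateral joins, one for the min and one for the max, and also with 2 inline correlated subqueries. These could also be used if you need to show the whole counter rows in addition to the min and max dates: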

SELECT group_id, 
      min_ts, min_ts_id, 
      max_ts, max_ts_id 
FROM   groups g
 , LATERAL 
      ( SELECT ts AS min_ts, c.id AS min_ts_id
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                       AND timestamp '2014-03-05 12:00:00'
        ORDER BY ts ASC
        LIMIT 1
      ) xmin
 , LATERAL 
      ( SELECT ts AS max_ts, c.id AS max_ts_id
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                       AND timestamp '2014-03-05 12:00:00'
        ORDER BY ts DESC 
        LIMIT 1
      ) xmax
WHERE min_ts IS NOT NULL ;
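
The variant with 2 inline correlated subqueries is only mentioned above, not shown. One possible way to write it, as a sketch that assumes the same groups and counter tables and reuses the time range of the previous query:

```sql
-- Sketch: min and max ts per group via correlated subqueries in the SELECT list.
-- The outer query only filters out groups that have no rows in the time range.
SELECT group_id, min_ts, max_ts
FROM  (
   SELECT g.group_id
        , (SELECT c.ts
           FROM   counter c
           WHERE  c.group_id = g.group_id
           AND    c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                           AND timestamp '2014-03-05 12:00:00'
           ORDER  BY c.ts ASC
           LIMIT  1) AS min_ts
        , (SELECT c.ts
           FROM   counter c
           WHERE  c.group_id = g.group_id
           AND    c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                           AND timestamp '2014-03-05 12:00:00'
           ORDER  BY c.ts DESC
           LIMIT  1) AS max_ts
   FROM   groups g
   ) sub
WHERE  min_ts IS NOT NULL;
```

Whether this yields the same plan as the LATERAL versions is not guaranteed, so it is worth checking with EXPLAIN on the real data.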

Source: https://dba.stackexchange.com/questions/60777