Indexes for a SQL query with a WHERE condition and GROUP BY

March 17, 2014

I am trying to determine which indexes to use for a SQL query with a WHERE condition and a GROUP BY, which currently runs very slowly.

My query:

SELECT group_id
FROM counter
WHERE ts between timestamp '2014-03-02 00:00:00.0' and timestamp '2014-03-05 12:00:00.0'
GROUP BY group_id

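The table currently has 32,000,000 rows. When I widen the time range, the query's execution time increases considerably.

The table in question looks like this: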

CREATE TABLE counter (
   id bigserial PRIMARY KEY
 , ts timestamp NOT NULL
 , group_id bigint NOT NULL
);

I currently have the following indexes, but performance is still slow:

CREATE INDEX ts_index
 ON counter
 USING btree
 (ts);

CREATE INDEX group_id_index
 ON counter
 USING btree
 (group_id);

CREATE INDEX comp_1_index
 ON counter
 USING btree
 (ts, group_id);

CREATE INDEX comp_2_index
 ON counter
 USING btree
 (group_id, ts);

Running EXPLAIN on the query gives the following result:

"QUERY PLAN"
"HashAggregate  (cost=467958.16..467958.17 rows=1 width=4)"
"  ->  Index Scan using ts_index on counter  (cost=0.56..467470.93 rows=194892 width=4)"
"        Index Cond: ((ts >= '2014-02-26 00:00:00'::timestamp without time zone) AND (ts <= '2014-02-27 23:59:00'::timestamp without time zone))"

SQL Fiddle with sample data: http://sqlfiddle.com/#!15/7492b/1
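
The plan above comes from plain EXPLAIN, which reports planner estimates only. A minimal sketch of how actual row counts, timings, and buffer usage could be collected for the same query, assuming it is acceptable to execute the query once on the real table (ANALYZE runs the statement):

```sql
-- ANALYZE executes the query and reports actual times and row counts;
-- BUFFERS adds shared-buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT group_id
FROM   counter
WHERE  ts BETWEEN timestamp '2014-03-02 00:00:00'
              AND timestamp '2014-03-05 12:00:00'
GROUP  BY group_id;
```

Comparing the estimated and actual row counts per node would show whether the slowdown tracks the number of rows in the time range.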

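Question

Can the performance of this query be improved by adding better indexes, or do I have to add processing power?

Edit 1

Using PostgreSQL version 9.3.2.

Edit 2

I tried @Erwin's suggestion to use EXISTS: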

SELECT group_id
FROM   groups g
WHERE  EXISTS (
  SELECT 1
  FROM   counter c
  WHERE  c.group_id = g.group_id
  AND    ts BETWEEN timestamp '2014-03-02 00:00:00'
                AND timestamp '2014-03-05 12:00:00'
  );

Unfortunately, this did not seem to improve performance. The query plan:

"QUERY PLAN"
"Nested Loop Semi Join  (cost=1607.18..371680.60 rows=113 width=4)"
"  ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)"
"  ->  Bitmap Heap Scan on counter c  (cost=1607.18..158895.53 rows=60641 width=4)"
"        Recheck Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
"        ->  Bitmap Index Scan on comp_2_index  (cost=0.00..1592.02 rows=60641 width=0)"
"              Index Cond: ((group_id = g.id) AND (ts >= '2014-01-01 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"

Edit 3

The query plan for ypercube's LATERAL query:

"QUERY PLAN"
"Nested Loop  (cost=8.98..1200.42 rows=133 width=20)"
"  ->  Seq Scan on groups g  (cost=0.00..2.33 rows=133 width=4)"
"  ->  Result  (cost=8.98..8.99 rows=1 width=0)"
"        One-Time Filter: ($1 IS NOT NULL)"
"        InitPlan 1 (returns $1)"
"          ->  Limit  (cost=0.56..4.49 rows=1 width=8)"
"                ->  Index Only Scan using comp_2_index on counter c  (cost=0.56..1098691.21 rows=279808 width=8)"
"                      Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"
"        InitPlan 2 (returns $2)"
"          ->  Limit  (cost=0.56..4.49 rows=1 width=8)"
"                ->  Index Only Scan Backward using comp_2_index on counter c_1  (cost=0.56..1098691.21 rows=279808 width=8)"
"                      Index Cond: ((group_id = $0) AND (ts IS NOT NULL) AND (ts >= '2010-03-02 00:00:00'::timestamp without time zone) AND (ts <= '2014-03-05 12:00:00'::timestamp without time zone))"

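Another idea, which also uses the groups table and a construct called a LATERAL join (for SQL Server fans, this is almost identical to OUTER APPLY). Its advantage is that aggregates can be computed inside the subquery: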
SELECT group_id, min_ts, max_ts
FROM   groups g,                    -- note the comma here; it is required
 LATERAL 
      ( SELECT MIN(ts) AS min_ts,
               MAX(ts) AS max_ts
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2011-03-02 00:00:00'
                       AND timestamp '2013-03-05 12:00:00'
      ) x 
WHERE min_ts IS NOT NULL ;
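Tests at **SQL-Fiddle** show that the query performs index scans on the (group_id, ts) index.
Similar plans are produced with 2 lateral joins, one for the min and one for the max, and also with 2 inline correlated subqueries. These variants can also be used if you need to show whole rows from counter rather than just the min and max dates: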
SELECT group_id, 
      min_ts, min_ts_id, 
      max_ts, max_ts_id 
FROM   groups g
 , LATERAL 
      ( SELECT ts AS min_ts, c.id AS min_ts_id
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                       AND timestamp '2014-03-05 12:00:00'
        ORDER BY ts ASC
        LIMIT 1
      ) xmin
 , LATERAL 
      ( SELECT ts AS max_ts, c.id AS max_ts_id
        FROM counter c
        WHERE c.group_id = g.group_id
          AND c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                       AND timestamp '2014-03-05 12:00:00'
        ORDER BY ts DESC 
        LIMIT 1
      ) xmax
WHERE min_ts IS NOT NULL ;
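
The variant with 2 inline correlated subqueries mentioned above is not spelled out. A minimal sketch of what it might look like, assuming the same groups table with a group_id column and the same date range as in the preceding query (the derived-table alias x is only illustrative):

```sql
-- Each scalar subquery returns at most one ts value per group, so the
-- planner can answer it from one end of the (group_id, ts) index.
SELECT group_id, min_ts, max_ts
FROM  (
        SELECT g.group_id,
               ( SELECT c.ts
                 FROM   counter c
                 WHERE  c.group_id = g.group_id
                   AND  c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                                 AND timestamp '2014-03-05 12:00:00'
                 ORDER  BY c.ts ASC
                 LIMIT  1
               ) AS min_ts,
               ( SELECT c.ts
                 FROM   counter c
                 WHERE  c.group_id = g.group_id
                   AND  c.ts BETWEEN timestamp '2012-03-02 00:00:00'
                                 AND timestamp '2014-03-05 12:00:00'
                 ORDER  BY c.ts DESC
                 LIMIT  1
               ) AS max_ts
        FROM   groups g
      ) x
WHERE  min_ts IS NOT NULL ;   -- drop groups with no rows in the range
```

Unlike the LATERAL form, each scalar subquery can return only one column, so fetching both ts and id for the same row would need additional subqueries.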

Quoted from: https://dba.stackexchange.com/questions/60777
