索引可以通過 GROUP BY/聚合對整個表(無選擇性)加速查詢嗎?
假設我有一個包含 3 列 a、b 和 c 的表。
我可以通過使用索引來加快看起來像這樣的查詢嗎?
SELECT a,b,SUM(c) # or AVG(c) FROM table GROUP BY a,b ORDER BY a,b ;
如果上述問題是肯定的,您推薦哪種類型的索引以及****它如何工作?
不見得。
GROUP BY
並且ORDER BY
通常需要排序。但是,在這種情況下,HashAggregate
使用了 a(可能是因為我們正在處理整個表)。CREATE TABLE foo AS SELECT x % 5 AS a, x % 10 AS b, x AS c FROM generate_series(1,1e6) AS x;
使用 HashAggregate 計劃,
# EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Sort (cost=23668.04..23668.16 rows=50 width=14) (actual time=611.607..611.608 rows=10 loops=1) Sort Key: a, b Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=23666.00..23666.62 rows=50 width=14) (actual time=611.589..611.593 rows=10 loops=1) Group Key: a, b -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=14) (actual time=0.012..71.157 rows=1000000 loops=1) Planning time: 0.168 ms Execution time: 611.665 ms
所以我們添加一個索引…
CREATE INDEX idx ON foo (a,b); VACUUM FULL ANALYZE foo;
…仍然顯示相同的查詢計劃。所以我們禁用 HashAggregate
SET enable_hashagg = false;
然後再試一次..
# EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=0.42..61292.04 rows=50 width=14) (actual time=108.149..655.536 rows=10 loops=1) Group Key: a, b -> Index Scan using idx on foo (cost=0.42..53791.41 rows=1000000 width=14) (actual time=0.066..272.299 rows=1000000 loops=1) Planning time: 0.121 ms Execution time: 655.594 ms (5 rows)
而且,與之前的 611ms 相比,它需要更多的時間 655ms。
需要更快?
如果這還不夠快(並且 611 毫秒來分組和匯總一百萬行也不錯)。然後,
MATERIALIZED VIEW
如果您的工作量允許(如果查詢很熱和/或不經常更新),您可以使用 a,CREATE MATERIALIZED VIEW foo2 AS SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b;
現在,當
TABLE foo2
. 然後只需REFRESH MATERIALIZED VIEW foo2;
刷新視圖即可。或者,您可以創建一個觸發器更新另一個表並使用觸發器對其進行更新。在實際聚合的列上。
有一些例外,但
sum()
不是其中之一。大多數聚合不使用索引,因為它們通常不需要索引。例外是特定於訂單的聚合(如min()
和max()
)。因此,例如,如果在我們為(a,b)
您創建索引之後執行 asum(a)
,# EXPLAIN ANALYZE SELECT sum(a) FROM foo; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Aggregate (cost=18666.00..18666.01 rows=1 width=4) (actual time=287.063..287.063 rows=1 loops=1) -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=4) (actual time=0.015..85.435 rows=1000000 loops=1) Planning time: 0.098 ms Execution time: 287.104 ms (4 rows)
您可以看到它仍然使用 seq 掃描。您將看到
sum(c)
完全沒有索引的相同計劃。現在這是踢球者,# EXPLAIN ANALYZE SELECT min(a) FROM foo; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.48..0.49 rows=1 width=0) (actual time=0.041..0.041 rows=1 loops=1) InitPlan 1 (returns $0) -> Limit (cost=0.42..0.48 rows=1 width=4) (actual time=0.036..0.037 rows=1 loops=1) -> Index Only Scan using idx on foo (cost=0.42..56291.41 rows=1000000 width=4) (actual time=0.035..0.035 rows=1 loops=1) Index Cond: (a IS NOT NULL) Heap Fetches: 1 Planning time: 0.171 ms Execution time: 0.080 ms (8 rows)
min(a)
不像sum(a)
可以使用排序,因此查詢計劃器意識到索引掃描,它不是免費的,有一個好處。在 (a,b,c) 上使用索引證明
無論出於何種原因,如果您想看到進一步索引對
c
求和無關緊要的證據(如果在閱讀上述內容後您仍然不明白為什麼,請提出問題),-- turn this back on we turned it off earlier SET enable_hashagg = true; DROP INDEX idx; CREATE INDEX idx ON foo (a,b,c); VACUUM FULL ANALYZE foo; EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Sort (cost=23668.04..23668.16 rows=50 width=14) (actual time=608.888..608.889 rows=10 loops=1) Sort Key: a, b Sort Method: quicksort Memory: 25kB -> HashAggregate (cost=23666.00..23666.62 rows=50 width=14) (actual time=608.869..608.871 rows=10 loops=1) Group Key: a, b -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=14) (actual time=0.015..72.613 rows=1000000 loops=1) Planning time: 0.130 ms Execution time: 608.947 ms (8 rows)
一點改善都沒有。禁用
hashagg
仍然沒有任何改善。TLDR;
在這個特定和簡單的案例中,索引無關緊要。計劃者選擇最佳方法。