優化對 690,000 行表的昂貴的 GROUP BY / ORDER BY 查詢
我正在編寫一個查詢來計算他的遊戲得分。它執行 GROUP BY 以匯總儲存在 23 個不同行(由不同“技能”分隔)中的單個玩家統計數據的 hiscore 統計數據。
我提出的以下查詢有效:
SELECT * FROM ( SELECT username, SUM(level) total_level, SUM(experience) total_experience, (SELECT rank FROM overall_hiscore_rankings WHERE username = skill_hiscore.username) rank FROM skill_hiscore GROUP BY username ORDER BY rank ) a WHERE rank >= 1 AND rank <= 25;
注意:overall_hiscore_rankings 是一個物化視圖,最後的過濾器是我對分頁的 seek 方法的實現,以防止昂貴的 LIMIT OFFSET 分頁。
在將大約 30,000 名玩家價值的數據播種到我的 hiscores 架構中之前,我沒有註意到這有任何問題。因為如上所述,每個玩家都有 23 行不同的行,所以這給我們留下了一個有690,000 行的表。
現在這給了我們非常糟糕的性能,實際上我的數據庫需要100 秒來執行這個查詢:
Sort (cost=35926899.17..35926907.50 rows=3330 width=35) (actual time=98599.592..98599.595 rows=25 loops=1) Sort Key: ((SubPlan 1)) Sort Method: quicksort Memory: 26kB -> GroupAggregate (cost=0.42..35926704.35 rows=3330 width=35) (actual time=149.023..98599.514 rows=25 loops=1) Group Key: skill_hiscore.username Filter: (((SubPlan 2) >= 1) AND ((SubPlan 3) <= 25)) Rows Removed by Filter: 29978 -> Index Scan using skill_hiscore_username on skill_hiscore (cost=0.42..46884.88 rows=690063 width=19) (actual time=144.644..293.859 rows=690063 loops=1) SubPlan 1 -> Seq Scan on overall_hiscore_rankings (cost=0.00..567.04 rows=1 width=8) (actual time=0.002..1.592 rows=1 loops=25) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 SubPlan 2 -> Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_1 (cost=0.00..567.04 rows=1 width=8) (actual time=0.817..1.633 rows=1 loops=30003) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 SubPlan 3 -> Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_2 (cost=0.00..567.04 rows=1 width=8) (actual time=0.816..1.633 rows=1 loops=30003) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 Planning Time: 2.820 ms JIT: Functions: 21 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 1.248 ms, Inlining 44.936 ms, Optimization 60.995 ms, Emission 38.520 ms, Total 145.700 ms Execution Time: 98623.550 ms
顯然,
ORDER BY
andGROUP BY
導致了最大的性能消耗,但我不確定如何重寫它。我沒有彈出任何資訊表明此查詢需要 100 秒 - 我從未見過查詢執行得如此糟糕,坦率地說,我不知道如何優化它。
此查詢不受任何索引的支持。
我的主要問題是:這個查詢中缺少數據和數據****的索引是否足以使這個查詢的執行時間為 100 秒
GROUP BY``ORDER BY
?
相關子查詢確實會拖累性能,而且沒有必要。
也缺乏索引。
請嘗試:
SELECT username , sum(s.level) AS total_level , sum(s.experience) AS total_experience , o.rank FROM overall_hiscore_rankings o JOIN skill_hiscore s USING (username) WHERE o.rank >= 1 AND o.rank <= 25 GROUP BY username, o.rank -- ? ORDER BY o.rank;
如果
overall_hiscore_rankings.username
是唯一的,您可以添加o.rank
到GROUP BY
. 我需要表定義來確定……理想情況下,您有如下索引:
CREATE INDEX ON overall_hiscore_rankings (rank, username); CREATE INDEX ON skill_hiscore (username) INCLUDE (level, experience);
覆蓋索引只有在您的表足夠真空時才有意義。否則只做
(username)
- 您的查詢計劃中的索引skill_hiscore_username
似乎涵蓋了這一點。看:確保已
autovacuum
執行,或VACUUM ANALYZE
在創建測試案例後手動執行一次以獲取目前列統計資訊並更新可見性圖。順便說一句,您提到*“ 23個不同的行(由不同的技能分隔)”*,但
WHERE rank >= 1 AND rank <= 25
似乎選擇了其中的25個。你的意思是WHERE rank > 1 AND rank < 25
?