Postgresql
如何優化 postgres 中的視窗查詢
我有大約 175k 條記錄的下表:
Column | Type | Modifiers ----------------+-----------------------------+------------------------------------- id | uuid | not null default uuid_generate_v4() competition_id | uuid | not null user_id | uuid | not null first_name | character varying(255) | not null last_name | character varying(255) | not null image | character varying(255) | country | character varying(255) | slug | character varying(255) | not null total_votes | integer | not null default 0 created_at | timestamp without time zone | updated_at | timestamp without time zone | featured_until | timestamp without time zone | image_src | character varying(255) | hidden | boolean | not null default false photos_count | integer | not null default 0 photo_id | uuid | Indexes: "entries_pkey" PRIMARY KEY, btree (id) "index_entries_on_competition_id" btree (competition_id) "index_entries_on_featured_until" btree (featured_until) "index_entries_on_hidden" btree (hidden) "index_entries_on_photo_id" btree (photo_id) "index_entries_on_slug" btree (slug) "index_entries_on_total_votes" btree (total_votes) "index_entries_on_user_id" btree (user_id)
我正在執行以下查詢以獲取條目的排名以及下一個和上一個條目的 slug:
WITH entry_with_global_rank AS ( SELECT id , rank() OVER w AS global_rank , LAG(slug) OVER w AS previous_slug , LEAD(slug) OVER w AS next_slug FROM entries WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) ) SELECT * FROM entry_with_global_rank WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' LIMIT 1;
以下是結果
EXPLAIN
:QUERY PLAN ----------------------------------------------------------------------------------------------- Limit (cost=516228.88..516233.37 rows=1 width=88) CTE entry_with_global_rank -> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52) -> Sort (cost=510596.59..511222.40 rows=250324 width=52) Sort Key: entries.total_votes -> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52) Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88) Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) (9 rows)
此查詢耗時約 1400 毫秒;有什麼辦法可以加快這個速度嗎?
編輯:
以下是結果
EXPLAIN ANALYZE
:QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1) CTE entry_with_global_rank -> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1) -> Sort (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1) Sort Key: entries.total_votes Sort Method: quicksort Memory: 8128kB -> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1) Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) Rows Removed by Filter: 125477 -> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1) Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) Rows Removed by Filter: 8726 Total runtime: 1234.424 ms (13 rows)
編輯2:
我
VACUUM ANALYZE
在數據庫上執行,現在查詢時間有所改善,儘管我確信必須有一些方法來提高性能:QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1) CTE entry_with_global_rank -> WindowAgg (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1) -> Sort (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1) Sort Key: entries.total_votes Sort Method: quicksort Memory: 5510kB -> Bitmap Heap Scan on entries (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1) Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> Bitmap Index Scan on index_entries_on_competition_id (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1) Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> CTE Scan on entry_with_global_rank (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1) Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid) Rows Removed by Filter: 4177 Total runtime: 138.588 ms (14 rows)
編輯3:
根據要求,帶有覆蓋索引的最終查詢計劃,就在 a 之後
VACUUM ANALYZE
:QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1) -> Subquery Scan on entry_with_global_rank (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1) Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) Rows Removed by Filter: 9128 -> WindowAgg (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1) -> Index Only Scan using entries_extra_special_idx on entries (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1) Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) Heap Fetches: 166 Total runtime: 46.867 ms (9 rows)
這裡不需要CTE ,它是優化障礙。普通子查詢通常表現更好:
SELECT * FROM ( SELECT id ,rank() OVER w AS global_rank ,lag(slug) OVER w AS previous_slug ,lead(slug) OVER w AS next_slug FROM entries WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' WINDOW w AS (ORDER BY total_votes DESC) ) entry_with_global_rank WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' LIMIT 1;
正如@Daniel 評論的那樣,我從視窗定義中刪除了該子句,因為無論如何
PARTITION BY
您都限制為單個。competition_id
表格佈局
您可以優化表佈局以稍微減少磁碟儲存大小,這會使一切都更快,但是:
Column | Type | Modifiers ----------------+-----------------------------+------------------------------------- id | uuid | not null default uuid_generate_v4() competition_id | uuid | not null user_id | uuid | not null total_votes | integer | not null default 0 photos_count | integer | not null default 0 hidden | boolean | not null default false slug | character varying(255) | not null first_name | character varying(255) | not null last_name | character varying(255) | not null image | character varying(255) | country | character varying(255) | image_src | character varying(255) | photo_id | uuid | created_at | timestamp without time zone | updated_at | timestamp without time zone | featured_until | timestamp without time zone |
更多關於:
另外,你真的需要所有這些
uuid
列嗎?int
還是bigint
不適合你?會使表和索引更小一點,一切都更快。而且我只會
text
用於字元數據,但這無助於查詢的性能。旁白:
character varying(255)
在 Postgres 中幾乎總是毫無意義的。其他一些 RDBMS 從長度限制中受益,對於 Postgres 來說都是一樣的(除非您實際上需要強制執行不太可能的最大長度 255 個字元)。特殊指標
最後,您可以建構一個高度專業化的索引(僅當索引維護值得特殊外殼時):
CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);
僅當您可以從中獲得僅索引掃描時,添加
(id, slug)
到索引才有意義。(禁用 autovacuum 或大量並發寫入會抵消這種努力。)否則刪除最後兩列。在此期間,審核您的索引。它們都在使用嗎?這裡可能有一些死貨。