如何優化 postgres 中的視窗查詢

August 22, 2014

我有大約 175k 條記錄的下表：

   Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
id             | uuid                        | not null default uuid_generate_v4()
competition_id | uuid                        | not null
user_id        | uuid                        | not null
first_name     | character varying(255)      | not null
last_name      | character varying(255)      | not null
image          | character varying(255)      |
country        | character varying(255)      |
slug           | character varying(255)      | not null
total_votes    | integer                     | not null default 0
created_at     | timestamp without time zone |
updated_at     | timestamp without time zone |
featured_until | timestamp without time zone |
image_src      | character varying(255)      |
hidden         | boolean                     | not null default false
photos_count   | integer                     | not null default 0
photo_id       | uuid                        |
Indexes:
   "entries_pkey" PRIMARY KEY, btree (id)
   "index_entries_on_competition_id" btree (competition_id)
   "index_entries_on_featured_until" btree (featured_until)
   "index_entries_on_hidden" btree (hidden)
   "index_entries_on_photo_id" btree (photo_id)
   "index_entries_on_slug" btree (slug)
   "index_entries_on_total_votes" btree (total_votes)
   "index_entries_on_user_id" btree (user_id)

我正在執行以下查詢以獲取條目的排名以及下一個和上一個條目的 slug：

WITH entry_with_global_rank AS ( 
 SELECT id
      , rank() OVER w AS global_rank
      , LAG(slug) OVER w AS previous_slug
      , LEAD(slug) OVER w AS next_slug
 FROM entries 
 WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
 WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) 
) 
SELECT * 
FROM entry_with_global_rank 
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT 1;

以下是結果EXPLAIN：

                                         QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit  (cost=516228.88..516233.37 rows=1 width=88)
  CTE entry_with_global_rank
    -&gt;  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52)
          -&gt;  Sort  (cost=510596.59..511222.40 rows=250324 width=52)
                Sort Key: entries.total_votes
                -&gt;  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52)
                      Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
  -&gt;  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88)
        Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
(9 rows)

此查詢耗時約 1400 毫秒；有什麼辦法可以加快這個速度嗎？

編輯：

以下是結果EXPLAIN ANALYZE：

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1)
  CTE entry_with_global_rank
    -&gt;  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1)
          -&gt;  Sort  (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1)
                Sort Key: entries.total_votes
                Sort Method: quicksort  Memory: 8128kB
                -&gt;  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1)
                      Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                      Rows Removed by Filter: 125477
  -&gt;  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1)
        Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
        Rows Removed by Filter: 8726
Total runtime: 1234.424 ms
(13 rows)

編輯2：

我VACUUM ANALYZE在數據庫上執行，現在查詢時間有所改善，儘管我確信必須有一些方法來提高性能：

                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1)
  CTE entry_with_global_rank
    -&gt;  WindowAgg  (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1)
          -&gt;  Sort  (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1)
                Sort Key: entries.total_votes
                Sort Method: quicksort  Memory: 5510kB
                -&gt;  Bitmap Heap Scan on entries  (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1)
                      Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                      -&gt;  Bitmap Index Scan on index_entries_on_competition_id  (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1)
                            Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
  -&gt;  CTE Scan on entry_with_global_rank  (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1)
        Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid)
        Rows Removed by Filter: 4177
Total runtime: 138.588 ms
(14 rows)

編輯3：

根據要求，帶有覆蓋索引的最終查詢計劃，就在 a 之後VACUUM ANALYZE：

                                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1)
  -&gt;  Subquery Scan on entry_with_global_rank  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1)
        Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
        Rows Removed by Filter: 9128
        -&gt;  WindowAgg  (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1)
              -&gt;  Index Only Scan using entries_extra_special_idx on entries  (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1)
                    Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                    Heap Fetches: 166
Total runtime: 46.867 ms
(9 rows)

這裡不需要CTE ，它是優化障礙。普通子查詢通常表現更好：
SELECT * 
FROM  (
  SELECT id
        ,rank()     OVER w AS global_rank
        ,lag(slug)  OVER w AS previous_slug
        ,lead(slug) OVER w AS next_slug 
  FROM   entries 
  WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
  WINDOW w AS (ORDER BY total_votes DESC) 
  ) entry_with_global_rank 
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT  1;
正如@Daniel 評論的那樣，我從視窗定義中刪除了該子句，因為無論如何PARTITION BY您都限制為單個。competition_id
表格佈局
您可以優化表佈局以稍微減少磁碟儲存大小，這會使一切都更快，但是：
    Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
id             | uuid                        | not null default uuid_generate_v4()
competition_id | uuid                        | not null
user_id        | uuid                        | not null
total_votes    | integer                     | not null default 0
photos_count   | integer                     | not null default 0
hidden         | boolean                     | not null default false
slug           | character varying(255)      | not null
first_name     | character varying(255)      | not null
last_name      | character varying(255)      | not null
image          | character varying(255)      |
country        | character varying(255)      |
image_src      | character varying(255)      |
photo_id       | uuid                        |
created_at     | timestamp without time zone |
updated_at     | timestamp without time zone |
featured_until | timestamp without time zone |
更多關於：
為讀取性能配置 PostgreSQL
另外，你真的需要所有這些uuid列嗎？int還是bigint不適合你？會使表和索引更小一點，一切都更快。
而且我只會text用於字元數據，但這無助於查詢的性能。
旁白：character varying(255)在 Postgres 中幾乎總是毫無意義的。其他一些 RDBMS 從長度限制中受益，對於 Postgres 來說都是一樣的（除非您實際上需要強制執行不太可能的最大長度 255 個字元）。
特殊指標
最後，您可以建構一個高度專業化的索引（僅當索引維護值得特殊外殼時）：
CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);
僅當您可以從中獲得僅索引掃描時，添加(id, slug)到索引才有意義。（禁用 autovacuum 或大量並發寫入會抵消這種努力。）否則刪除最後兩列。
在此期間，審核您的索引。它們都在使用嗎？這裡可能有一些死貨。

引用自：https://dba.stackexchange.com/questions/74530

如何優化 postgres 中的視窗查詢

表格佈局

特殊指標

相關問答

使用 UPDATE 對錶行進行排序並保存行號

具有最新時間戳的行

索引：如果節點數相同，則整數與字元串的性能

Postgres 有時使用劣質索引來表示 WHERE a IN (…) ORDER BY b LIMIT N

如何索引/優化 PostgreSQL 中兩列的總和？

有沒有辦法加快 JOIN ON LIKE 的速度？