Postgresql

如何優化 postgres 中的視窗查詢

  • August 22, 2014

我有大約 175k 條記錄的下表:

   Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
id             | uuid                        | not null default uuid_generate_v4()
competition_id | uuid                        | not null
user_id        | uuid                        | not null
first_name     | character varying(255)      | not null
last_name      | character varying(255)      | not null
image          | character varying(255)      |
country        | character varying(255)      |
slug           | character varying(255)      | not null
total_votes    | integer                     | not null default 0
created_at     | timestamp without time zone |
updated_at     | timestamp without time zone |
featured_until | timestamp without time zone |
image_src      | character varying(255)      |
hidden         | boolean                     | not null default false
photos_count   | integer                     | not null default 0
photo_id       | uuid                        |
Indexes:
   "entries_pkey" PRIMARY KEY, btree (id)
   "index_entries_on_competition_id" btree (competition_id)
   "index_entries_on_featured_until" btree (featured_until)
   "index_entries_on_hidden" btree (hidden)
   "index_entries_on_photo_id" btree (photo_id)
   "index_entries_on_slug" btree (slug)
   "index_entries_on_total_votes" btree (total_votes)
   "index_entries_on_user_id" btree (user_id)

我正在執行以下查詢以獲取條目的排名以及下一個和上一個條目的 slug:

WITH entry_with_global_rank AS ( 
 SELECT id
      , rank() OVER w AS global_rank
      , LAG(slug) OVER w AS previous_slug
      , LEAD(slug) OVER w AS next_slug
 FROM entries 
 WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
 WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) 
) 
SELECT * 
FROM entry_with_global_rank 
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT 1;

以下是結果EXPLAIN

                                         QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit  (cost=516228.88..516233.37 rows=1 width=88)
  CTE entry_with_global_rank
    ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52)
          ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52)
                Sort Key: entries.total_votes
                ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52)
                      Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
  ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88)
        Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
(9 rows)

此查詢耗時約 1400 毫秒;有什麼辦法可以加快這個速度嗎?

編輯:

以下是結果EXPLAIN ANALYZE

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1)
  CTE entry_with_global_rank
    ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1)
          ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1)
                Sort Key: entries.total_votes
                Sort Method: quicksort  Memory: 8128kB
                ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1)
                      Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                      Rows Removed by Filter: 125477
  ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1)
        Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
        Rows Removed by Filter: 8726
Total runtime: 1234.424 ms
(13 rows)

編輯2:

VACUUM ANALYZE在數據庫上執行,現在查詢時間有所改善,儘管我確信必須有一些方法來提高性能:

                                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1)
  CTE entry_with_global_rank
    ->  WindowAgg  (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1)
          ->  Sort  (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1)
                Sort Key: entries.total_votes
                Sort Method: quicksort  Memory: 5510kB
                ->  Bitmap Heap Scan on entries  (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1)
                      Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                      ->  Bitmap Index Scan on index_entries_on_competition_id  (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1)
                            Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
  ->  CTE Scan on entry_with_global_rank  (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1)
        Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid)
        Rows Removed by Filter: 4177
Total runtime: 138.588 ms
(14 rows)

編輯3:

根據要求,帶有覆蓋索引的最終查詢計劃,就在 a 之後VACUUM ANALYZE

                                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1)
  ->  Subquery Scan on entry_with_global_rank  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1)
        Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
        Rows Removed by Filter: 9128
        ->  WindowAgg  (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1)
              ->  Index Only Scan using entries_extra_special_idx on entries  (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1)
                    Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                    Heap Fetches: 166
Total runtime: 46.867 ms
(9 rows)

這裡不需要CTE ,它是優化障礙。普通子查詢通常表現更好:

SELECT * 
FROM  (
  SELECT id
        ,rank()     OVER w AS global_rank
        ,lag(slug)  OVER w AS previous_slug
        ,lead(slug) OVER w AS next_slug 
  FROM   entries 
  WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
  WINDOW w AS (ORDER BY total_votes DESC) 
  ) entry_with_global_rank 
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT  1;

正如@Daniel 評論的那樣,我從視窗定義中刪除了該子句,因為無論如何PARTITION BY您都限制為單個。competition_id

表格佈局

您可以優化表佈局以稍微減少磁碟儲存大小,這會使一切都更快,但是:

    Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
id             | uuid                        | not null default uuid_generate_v4()
competition_id | uuid                        | not null
user_id        | uuid                        | not null
total_votes    | integer                     | not null default 0
photos_count   | integer                     | not null default 0
hidden         | boolean                     | not null default false
slug           | character varying(255)      | not null
first_name     | character varying(255)      | not null
last_name      | character varying(255)      | not null
image          | character varying(255)      |
country        | character varying(255)      |
image_src      | character varying(255)      |
photo_id       | uuid                        |
created_at     | timestamp without time zone |
updated_at     | timestamp without time zone |
featured_until | timestamp without time zone |

更多關於:

另外,你真的需要所有這些uuid列嗎?int還是bigint不適合你?會使表和索引更小一點,一切都更快。

而且我只會text用於字元數據,但這無助於查詢的性能。

旁白:character varying(255)在 Postgres 中幾乎總是毫無意義的。其他一些 RDBMS 從長度限制中受益,對於 Postgres 來說都是一樣的(除非您實際上需要強制執行不太可能的最大長度 255 個字元)。

特殊指標

最後,您可以建構一個高度專業化的索引(僅當索引維護值得特殊外殼時):

CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);

僅當您可以從中獲得僅索引掃描時,添加(id, slug)到索引才有意義。(禁用 autovacuum 或大量並發寫入會抵消這種努力。)否則刪除最後兩列。

在此期間,審核您的索引。它們都在使用嗎?這裡可能有一些死貨。

引用自:https://dba.stackexchange.com/questions/74530