Postgresql

使用 WINDOW 加速 SELECT,使用複合索引加速 (PARTITION … ORDER)

  • May 12, 2015

我有下表:

CREATE TABLE ranking_points  (
 id serial NOT NULL,
 store_id integer,
 book_id integer,
 list_name character varying(255),
 rank integer,
 created_at timestamp without time zone,
 updated_at timestamp without time zone,
 storewide boolean,
 yesterday_delta integer,
 best_ever_rank integer,
 best_ever_date date,
 CONSTRAINT ranking_points_pkey PRIMARY KEY (id)
);

CREATE INDEX index_ranking_points_on_book_id_and_created_at
 ON ranking_points (book_id, created_at);

CREATE INDEX index_ranking_points_on_book_id_list_name_and_created_at_desc
 ON ranking_points (book_id, list_name, created_at DESC);

CREATE INDEX index_ranking_points_on_store_id_and_created_at
 ON ranking_points (store_id, created_at);`

我正在執行以下SELECT語句:

select distinct book_id, list_name, 
               first_value(created_at) over w as created_at, 
               first_value(rank) over w as rank, 
               first_value(id) over w as id
from ranking_points where book_id in (61,62,63)
window w as (partition by book_id, list_name order by 3 desc)

我希望 Postgres 使用index_ranking_points_on_book_id_list_name_and_created_at_desc索引,因為它與 and 完美匹配partition byorder by但它使用index_ranking_points_on_book_id_and_created_at索引。這是explain analyze

HashAggregate  (cost=895.70..901.41 rows=571 width=42) (actual time=0.740..0.740 rows=0 loops=1)
 ->  WindowAgg  (cost=874.28..888.56 rows=571 width=42) (actual time=0.737..0.737 rows=0 loops=1)
       ->  Sort  (cost=874.28..875.71 rows=571 width=42) (actual time=0.737..0.737 rows=0 loops=1)
             Sort Key: book_id, list_name
             Sort Method: quicksort  Memory: 25kB
             ->  Bitmap Heap Scan on ranking_points  (cost=17.25..848.14 rows=571 width=42) (actual time=0.728..0.728 rows=0 loops=1)
                   Recheck Cond: (book_id = ANY ('{61,62,63}'::integer[]))
                   ->  Bitmap Index Scan on index_ranking_points_on_book_id_and_created_at  (cost=0.00..17.11 rows=571 width=0) (actual time=0.726..0.726 rows=0 loops=1)
                         Index Cond: (book_id = ANY ('{61,62,63}'::integer[]))
Total runtime: 0.811 ms

您可以看到我正在嘗試建構一個由分組值和first_value多個欄位組成的結果集。我希望 Postgres 可以只使用索引來完成所有選擇,然後從每個組中讀取一條記錄以獲取first_value值。這是可以實現的嗎?我有其他解決方案,但它們會生成表的順序掃描,所以這不是很好。

答案是 Postgres 確實使用索引,如果你有足夠的記錄。這違反了我的開發數據庫,該數據庫相對空。在沒有索引的生產中,我從 EXPLAIN ANALYZE 得到這個:

HashAggregate  (cost=186849.60..187449.78 rows=200061 width=48) (actual time=129567.598..129568.987 rows=1903 loops=1)
 ->  WindowAgg  (cost=185149.08..186349.45 rows=200061 width=48) (actual time=129182.770..129460.868 rows=186023 loops=1)
       ->  Sort  (cost=185149.08..185249.11 rows=200061 width=48) (actual time=129181.497..129228.725 rows=186023 loops=1)
             Sort Key: book_id, list_name
             Sort Method: quicksort  Memory: 25441kB
             ->  Bitmap Heap Scan on ranking_points  (cost=3813.71..181625.99 rows=200061 width=48) (actual time=67.821..128210.717 rows=186023 loops=1)
                   Recheck Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[]))
                   ->  Bitmap Index Scan on index_ranking_points_on_book_id_and_created_at  (cost=0.00..3803.70 rows=200061 width=0) (actual time=55.568..55.568 rows=186023 loops=1)
                         Index Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[]))
Total runtime: 129573.021 ms

有了索引,我得到了這個:

HashAggregate  (cost=193464.14..194108.37 rows=214744 width=48) (actual time=1446.685..1447.571 rows=1903 loops=1)
 ->  WindowAgg  (cost=191638.82..192927.28 rows=214744 width=48) (actual time=1091.394..1349.506 rows=186027 loops=1)
       ->  Sort  (cost=191638.82..191746.19 rows=214744 width=48) (actual time=1091.381..1136.297 rows=186027 loops=1)
             Sort Key: book_id, list_name
             Sort Method: quicksort  Memory: 25441kB
             ->  Bitmap Heap Scan on ranking_points  (cost=3772.47..187835.22 rows=214744 width=48) (actual time=58.064..221.151 rows=186027 loops=1)
                   Recheck Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[]))
                   ->  Bitmap Index Scan on index_ranking_points_on_book_id_list_name_and_created_at_desc  (cost=0.00..3761.74 rows=214744 width=0) (actual time=44.349..44.349 rows=186027 loops=1)
                         Index Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[]))
Total runtime: 1450.444 ms

顯然,您的表中有許多行,每個組合的每個組合都具有相同的list_namebook_idranking_points,這是非常低效的。

book_id              ... 32     
(book_id, list_name) ... 1903
total rows           ... 186023 

您定義了 column list_name character varying(255),這表明它可能是一個長字元串。(順便說一句,255 的限制在 Postgres 中沒有性能優勢。)如果您以中途標準化形式儲存它,您將獲得更小的儲存空間和更快的結果:

架構

CREATE TABLE book (
 book_id serial PRIMARY KEY
, book    text NOT NULL
-- you probably have that ...
);

CREATE TABLE listed (
 listed_id serial PRIMARY KEY
, book_id   int REFERENCES book
, list_name text NOT NULL
, **UNIQUE (book_id, list_name)**  -- provides needed index
--  potentially more?  
);

CREATE TABLE ranking_points (
 ranking_points_id serial PRIMARY KEY
, store_id   integer
, **listed_id integer NOT NULL REFERENCES listed**
, rank       integer
, created_at timestamp
, updated_at timestamp
, storewide  boolean
, yesterday_delta integer
, best_ever_rank  integer
, best_ever_date  date
);

CREATE INDEX index_ranking_points_listed_created_at
ON ranking_points (listed_id, created_at DESC NULLS LAST);

如果created_at可以為 NULL 就像沒有NOT NULL約束所暗示的那樣,請DESC NULLS LAST用作排序順序(這不會傷害任何一種方式)

如果您可以從中獲得僅索引掃描,那麼將其他無用的列添加到索引rank中是值得id的。

詢問

SELECT l.book_id, l.list_name 
    , r.created_at, r.rank, r.id
FROM  unnest('{61,62,63}'::int[]) b(book_id)
JOIN  listed l USING (book_id)
   , LATERAL (
  SELECT created_at, rank, id
  FROM   ranking_points
  WHERE  listed_id = l.listed_id
  ORDER  BY created_at DESC NULLS LAST  -- match index
  LIMIT  1
  ) r;

沒有測試數據的SQL Fiddle 。

應該快得多。詳細解釋:

引用自:https://dba.stackexchange.com/questions/101079