使用 WINDOW 加速 SELECT,使用複合索引加速 (PARTITION … ORDER)
我有下表:
CREATE TABLE ranking_points ( id serial NOT NULL, store_id integer, book_id integer, list_name character varying(255), rank integer, created_at timestamp without time zone, updated_at timestamp without time zone, storewide boolean, yesterday_delta integer, best_ever_rank integer, best_ever_date date, CONSTRAINT ranking_points_pkey PRIMARY KEY (id) ); CREATE INDEX index_ranking_points_on_book_id_and_created_at ON ranking_points (book_id, created_at); CREATE INDEX index_ranking_points_on_book_id_list_name_and_created_at_desc ON ranking_points (book_id, list_name, created_at DESC); CREATE INDEX index_ranking_points_on_store_id_and_created_at ON ranking_points (store_id, created_at);`
我正在執行以下
SELECT
語句:select distinct book_id, list_name, first_value(created_at) over w as created_at, first_value(rank) over w as rank, first_value(id) over w as id from ranking_points where book_id in (61,62,63) window w as (partition by book_id, list_name order by 3 desc)
我希望 Postgres 使用
index_ranking_points_on_book_id_list_name_and_created_at_desc
索引,因為它與 and 完美匹配partition by
,order by
但它使用index_ranking_points_on_book_id_and_created_at
索引。這是explain analyze
:HashAggregate (cost=895.70..901.41 rows=571 width=42) (actual time=0.740..0.740 rows=0 loops=1) -> WindowAgg (cost=874.28..888.56 rows=571 width=42) (actual time=0.737..0.737 rows=0 loops=1) -> Sort (cost=874.28..875.71 rows=571 width=42) (actual time=0.737..0.737 rows=0 loops=1) Sort Key: book_id, list_name Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on ranking_points (cost=17.25..848.14 rows=571 width=42) (actual time=0.728..0.728 rows=0 loops=1) Recheck Cond: (book_id = ANY ('{61,62,63}'::integer[])) -> Bitmap Index Scan on index_ranking_points_on_book_id_and_created_at (cost=0.00..17.11 rows=571 width=0) (actual time=0.726..0.726 rows=0 loops=1) Index Cond: (book_id = ANY ('{61,62,63}'::integer[])) Total runtime: 0.811 ms
您可以看到我正在嘗試建構一個由分組值和
first_value
多個欄位組成的結果集。我希望 Postgres 可以只使用索引來完成所有選擇,然後從每個組中讀取一條記錄以獲取first_value
值。這是可以實現的嗎?我有其他解決方案,但它們會生成表的順序掃描,所以這不是很好。
答案是 Postgres 確實使用索引,如果你有足夠的記錄。這違反了我的開發數據庫,該數據庫相對空。在沒有索引的生產中,我從 EXPLAIN ANALYZE 得到這個:
HashAggregate (cost=186849.60..187449.78 rows=200061 width=48) (actual time=129567.598..129568.987 rows=1903 loops=1) -> WindowAgg (cost=185149.08..186349.45 rows=200061 width=48) (actual time=129182.770..129460.868 rows=186023 loops=1) -> Sort (cost=185149.08..185249.11 rows=200061 width=48) (actual time=129181.497..129228.725 rows=186023 loops=1) Sort Key: book_id, list_name Sort Method: quicksort Memory: 25441kB -> Bitmap Heap Scan on ranking_points (cost=3813.71..181625.99 rows=200061 width=48) (actual time=67.821..128210.717 rows=186023 loops=1) Recheck Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[])) -> Bitmap Index Scan on index_ranking_points_on_book_id_and_created_at (cost=0.00..3803.70 rows=200061 width=0) (actual time=55.568..55.568 rows=186023 loops=1) Index Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[])) Total runtime: 129573.021 ms
有了索引,我得到了這個:
HashAggregate (cost=193464.14..194108.37 rows=214744 width=48) (actual time=1446.685..1447.571 rows=1903 loops=1) -> WindowAgg (cost=191638.82..192927.28 rows=214744 width=48) (actual time=1091.394..1349.506 rows=186027 loops=1) -> Sort (cost=191638.82..191746.19 rows=214744 width=48) (actual time=1091.381..1136.297 rows=186027 loops=1) Sort Key: book_id, list_name Sort Method: quicksort Memory: 25441kB -> Bitmap Heap Scan on ranking_points (cost=3772.47..187835.22 rows=214744 width=48) (actual time=58.064..221.151 rows=186027 loops=1) Recheck Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[])) -> Bitmap Index Scan on index_ranking_points_on_book_id_list_name_and_created_at_desc (cost=0.00..3761.74 rows=214744 width=0) (actual time=44.349..44.349 rows=186027 loops=1) Index Cond: (book_id = ANY ('{61,62,63,64,66,70,78,270,301,298,398,402,414,15485,15416,2767,6922,6920,6974,18002,9122,15444,65,1774,4939,18331,22828,22209,19841,22402,22841,18232}'::integer[])) Total runtime: 1450.444 ms
顯然,您的表中有許多行,每個組合的每個組合都具有相同的
list_name
行book_id
數ranking_points
,這是非常低效的。book_id ... 32 (book_id, list_name) ... 1903 total rows ... 186023
您定義了 column
list_name character varying(255)
,這表明它可能是一個長字元串。(順便說一句,255 的限制在 Postgres 中沒有性能優勢。)如果您以中途標準化形式儲存它,您將獲得更小的儲存空間和更快的結果:架構
CREATE TABLE book ( book_id serial PRIMARY KEY , book text NOT NULL -- you probably have that ... ); CREATE TABLE listed ( listed_id serial PRIMARY KEY , book_id int REFERENCES book , list_name text NOT NULL , **UNIQUE (book_id, list_name)** -- provides needed index -- potentially more? ); CREATE TABLE ranking_points ( ranking_points_id serial PRIMARY KEY , store_id integer , **listed_id integer NOT NULL REFERENCES listed** , rank integer , created_at timestamp , updated_at timestamp , storewide boolean , yesterday_delta integer , best_ever_rank integer , best_ever_date date ); CREATE INDEX index_ranking_points_listed_created_at ON ranking_points (listed_id, created_at DESC NULLS LAST);
如果
created_at
可以為 NULL 就像沒有NOT NULL
約束所暗示的那樣,請DESC NULLS LAST
用作排序順序(這不會傷害任何一種方式)如果您可以從中獲得僅索引掃描,那麼將其他無用的列添加到索引
rank
中是值得id
的。詢問
SELECT l.book_id, l.list_name , r.created_at, r.rank, r.id FROM unnest('{61,62,63}'::int[]) b(book_id) JOIN listed l USING (book_id) , LATERAL ( SELECT created_at, rank, id FROM ranking_points WHERE listed_id = l.listed_id ORDER BY created_at DESC NULLS LAST -- match index LIMIT 1 ) r;
沒有測試數據的SQL Fiddle 。
應該快得多。詳細解釋: