Optimizing a PostgreSQL query with many small lookups
I have a database of books and book sales. The tables look like this:
CREATE TABLE purchase (
    person_id integer,
    book_id integer,
    CONSTRAINT purchase_pk PRIMARY KEY (person_id, book_id)
);

CREATE TABLE person (
    person_id integer PRIMARY KEY
);

CREATE TABLE book (
    book_id integer PRIMARY KEY
);

CREATE INDEX purchase_idx ON purchase (person_id, book_id);
I need to know, for every pair of books, how many people bought both of them.
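For reference, the result being asked for can be pinned down with a small pure-Python sketch. The function pair_counts and the sample data are hypothetical. Like the question's query, it only emits pairs with book_1_id < book_2_id and keeps pairs nobody bought together with a count of 0.

```python
from collections import Counter
from itertools import combinations


def pair_counts(purchases, books):
    """For every pair (b1, b2) with b1 < b2, count the people who bought both."""
    # Collect the set of books each person bought (a set also ignores duplicate rows).
    bought = {}
    for person_id, book_id in purchases:
        bought.setdefault(person_id, set()).add(book_id)

    counts = Counter()
    for book_set in bought.values():
        # Every unordered pair inside one person's purchases gets +1.
        for b1, b2 in combinations(sorted(book_set), 2):
            counts[(b1, b2)] += 1

    # Report every pair of known books, including those with a count of 0.
    return {(b1, b2): counts[(b1, b2)] for b1, b2 in combinations(sorted(books), 2)}


purchases = [(1, 10), (1, 20), (2, 10), (2, 20), (2, 30)]
print(pair_counts(purchases, [10, 20, 30]))
# {(10, 20): 2, (10, 30): 1, (20, 30): 1}
```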
I defined the following query for this:
CREATE MATERIALIZED VIEW book_sales_stat AS
SELECT b1.book_id AS book_1_id,
       b2.book_id AS book_2_id,
       (SELECT count(*)
          FROM person AS per
         WHERE EXISTS (SELECT
                         FROM purchase pur
                        WHERE pur.person_id = per.person_id
                          AND pur.book_id = b1.book_id)
           AND EXISTS (SELECT
                         FROM purchase pur
                        WHERE pur.person_id = per.person_id
                          AND pur.book_id = b2.book_id)
       ) AS person_count
FROM book AS b1, book AS b2
WHERE b1.book_id < b2.book_id;
Unfortunately, it is very slow.
The execution plan looks like this:
Nested Loop  (cost=0.29..1564886083010.59 rows=90420300 width=16)
  ->  Seq Scan on book b1  (cost=0.00..237.70 rows=16470 width=4)
  ->  Index Only Scan using book_pk on book b2  (cost=0.29..96.38 rows=5490 width=4)
        Index Cond: (book_id > b1.book_id)
  SubPlan 1
    ->  Aggregate  (cost=17306.76..17306.77 rows=1 width=8)
          ->  Nested Loop  (cost=1.14..17306.76 rows=1 width=0)
                ->  Nested Loop  (cost=0.72..17238.11 rows=123 width=8)
                      ->  Index Only Scan using purchase_idx on purchase pur_1  (cost=0.42..16791.97 rows=123 width=4)
                            Index Cond: (book_id = b2.book_id)
                      ->  Index Only Scan using person_pk on person per  (cost=0.29..3.63 rows=1 width=4)
                            Index Cond: (person_id = pur_1.person_id)
                ->  Index Only Scan using purchase_idx on purchase pur  (cost=0.42..0.56 rows=1 width=4)
                      Index Cond: ((person_id = per.person_id) AND (book_id = b1.book_id))
JIT:
  Functions: 13
  Options: Inlining true, Optimization true, Expressions true, Deforming true
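The huge total cost can be reproduced from the plan's own figures: the correlated SubPlan 1 is evaluated once per book pair, so its cost is multiplied by the row estimate of the top Nested Loop. The ratio 5490/16470 is exactly one third, which is consistent with PostgreSQL's default selectivity for an inequality whose comparison value is not known at plan time. This is a back-of-the-envelope sketch; the variable names are ours, not PostgreSQL's.

```python
# Figures read off the execution plan above.
b1_rows = 16470                # Seq Scan on book b1: rows=16470
b2_rows_per_b1 = 5490          # Index Only Scan on book b2: rows=5490 per outer row
subplan_cost = 17306.77        # total cost of the Aggregate under SubPlan 1
plan_total_cost = 1564886083010.59

# Estimated pairs: the planner assumed one third of book matches b2.book_id > b1.book_id.
estimated_pairs = b1_rows * b2_rows_per_b1
print(estimated_pairs)  # 90420300, the rows= of the top Nested Loop

# Pairs with book_1_id < book_2_id if the 16470-row estimate for book is accurate.
actual_pairs = b1_rows * (b1_rows - 1) // 2
print(actual_pairs)  # 135622215

# SubPlan 1 runs once per pair, which accounts for nearly all of the total cost.
subplan_share = estimated_pairs * subplan_cost / plan_total_cost
print(f"{subplan_share:.6f}")
```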
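So far this looks pretty good: the purchase_idx index is used for almost everything inside the loop. But why doesn't it use a parallel execution plan? I went through all the requirements for parallel query in the Postgres documentation, but didn't find any that my query fails to meet.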
What can I do to speed up the query and make it run in parallel?
Your query falls into this case, from the "Parallel Safety" section of the PostgreSQL documentation:
The following operations are always parallel restricted:
- Plan nodes that reference a correlated SubPlan.
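The top-level nested loop could be parallelized, but its output would have to be gathered before the subplan is evaluated, because a parallel-restricted subplan can only run in the leader. Since almost all of the cost is in that subplan, there is no point in doing so.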
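A rough way to quantify "no point" with the plan's own numbers: treat the cost units as if they were time (they are not, so this is only an estimate) and apply Amdahl's law, with the subplan as the part that stays serial in the leader. The figures are copied from the question's plan; the helper amdahl_speedup is hypothetical.

```python
# Rough model only: planner cost units are not seconds, but their ratios show where the work is.
plan_total_cost = 1564886083010.59  # top Nested Loop total cost
pairs = 90420300                    # rows= of the top Nested Loop
subplan_cost = 17306.77             # cost of one SubPlan 1 execution

# Work that would stay in the leader above the Gather (the correlated SubPlan).
serial = pairs * subplan_cost / plan_total_cost
# Everything else could, in principle, be split across workers.
parallelizable = 1 - serial


def amdahl_speedup(workers):
    """Upper bound on speedup when only the parallelizable fraction is divided by workers."""
    return 1 / (serial + parallelizable / workers)


for workers in (2, 4, 8):
    print(workers, f"{amdahl_speedup(workers):.6f}")
```

Even with eight workers the bound stays essentially at 1.0, because the serial fraction is above 99.99%.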
You can achieve the same result with something like this:
select book_1_id, book_2_id, sum(cnt) person_count
from (
    select b1.book_id book_1_id, b2.book_id book_2_id, 0 cnt
    from book b1
    join book b2 on b1.book_id < b2.book_id
    union all
    select p1.book_id, p2.book_id, count(*) cnt
    from purchase p1
    join purchase p2 on p1.person_id = p2.person_id
                    and p1.book_id < p2.book_id
    group by p1.book_id, p2.book_id
) sq
group by book_1_id, book_2_id
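To check that the set-based rewrite returns the same numbers as the correlated version, here is a small sketch that runs both with Python's built-in sqlite3 module on hypothetical toy data. SQLite is only a stand-in for checking the logic; it says nothing about PostgreSQL plans or parallelism. PostgreSQL's empty select list in EXISTS (SELECT FROM ...) is replaced with SELECT 1, which SQLite requires.

```python
import sqlite3

# Hypothetical toy data using the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id integer PRIMARY KEY);
CREATE TABLE book (book_id integer PRIMARY KEY);
CREATE TABLE purchase (
    person_id integer,
    book_id integer,
    CONSTRAINT purchase_pk PRIMARY KEY (person_id, book_id)
);
INSERT INTO person VALUES (1), (2), (3);
INSERT INTO book VALUES (10), (20), (30), (40);
INSERT INTO purchase VALUES (1, 10), (1, 20), (2, 10), (2, 20), (2, 30), (3, 30);
""")

# The question's correlated version (without the materialized view wrapper).
correlated = """
SELECT b1.book_id, b2.book_id,
       (SELECT count(*) FROM person AS per
         WHERE EXISTS (SELECT 1 FROM purchase pur
                        WHERE pur.person_id = per.person_id AND pur.book_id = b1.book_id)
           AND EXISTS (SELECT 1 FROM purchase pur
                        WHERE pur.person_id = per.person_id AND pur.book_id = b2.book_id))
FROM book AS b1, book AS b2
WHERE b1.book_id < b2.book_id
"""

# The answer's rewrite: zero rows for every pair, plus counts from the purchase self-join.
rewrite = """
SELECT book_1_id, book_2_id, sum(cnt) AS person_count
FROM (
    SELECT b1.book_id AS book_1_id, b2.book_id AS book_2_id, 0 AS cnt
    FROM book b1 JOIN book b2 ON b1.book_id < b2.book_id
    UNION ALL
    SELECT p1.book_id, p2.book_id, count(*)
    FROM purchase p1
    JOIN purchase p2 ON p1.person_id = p2.person_id AND p1.book_id < p2.book_id
    GROUP BY p1.book_id, p2.book_id
) sq
GROUP BY book_1_id, book_2_id
"""

a = sorted(conn.execute(correlated).fetchall())
b = sorted(conn.execute(rewrite).fetchall())
print(a == b)  # True
print(b)
```

The branch selecting 0 cnt is what keeps pairs nobody bought together, such as (10, 40); without it those pairs would be missing from the result.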
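This assumes that there are no rows in purchase that don't correspond to rows in person or book (there are currently no foreign keys). It also requires person_id, book_id to be unique in purchase. It is now, but if that constraint was only there for the sake of the example, you will need a distinct in the query, for example: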
with unique_purchase as (
    select distinct person_id, book_id from purchase
)
select book_1_id, book_2_id, sum(cnt) person_count
from (
    select b1.book_id book_1_id, b2.book_id book_2_id, 0 cnt
    from book b1
    join book b2 on b1.book_id < b2.book_id
    union all
    select p1.book_id, p2.book_id, count(*) cnt
    from unique_purchase p1
    join unique_purchase p2 on p1.person_id = p2.person_id
                           and p1.book_id < p2.book_id
    group by p1.book_id, p2.book_id
) sq
group by book_1_id, book_2_id
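Both assumptions can be checked on toy data with Python's sqlite3 module, again using SQLite only as a stand-in. The data and the helper pair_query are hypothetical. The purchase table deliberately has neither a primary key nor foreign keys: person 1 has a duplicate row for book 10, and person 2 bought a book 99 that does not exist in book.

```python
import sqlite3

# Hypothetical toy data with no keys on purchase, so both assumptions can be violated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (book_id integer PRIMARY KEY);
CREATE TABLE purchase (person_id integer, book_id integer);  -- no PK, no FKs
INSERT INTO book VALUES (10), (20);
INSERT INTO purchase VALUES
    (1, 10), (1, 10), (1, 20),  -- person 1 bought book 10 twice (duplicate row)
    (2, 10), (2, 99);           -- book 99 does not exist in book
""")


def pair_query(source):
    """The answer's query, reading purchases from `source` (a table or CTE name)."""
    return f"""
    SELECT book_1_id, book_2_id, sum(cnt) AS person_count
    FROM (
        SELECT b1.book_id AS book_1_id, b2.book_id AS book_2_id, 0 AS cnt
        FROM book b1 JOIN book b2 ON b1.book_id < b2.book_id
        UNION ALL
        SELECT p1.book_id, p2.book_id, count(*)
        FROM {source} p1
        JOIN {source} p2 ON p1.person_id = p2.person_id AND p1.book_id < p2.book_id
        GROUP BY p1.book_id, p2.book_id
    ) sq
    GROUP BY book_1_id, book_2_id
    """


plain = sorted(conn.execute(pair_query("purchase")).fetchall())
deduped = sorted(conn.execute(
    "WITH unique_purchase AS (SELECT DISTINCT person_id, book_id FROM purchase)"
    + pair_query("unique_purchase")
).fetchall())
print(plain)    # [(10, 20, 2), (10, 99, 1)]
print(deduped)  # [(10, 20, 1), (10, 99, 1)]
```

The distinct fixes the double count for person 1, but the (10, 99) row survives in both versions: it comes from a purchase the correlated query would never see, since that query only pairs rows of book. Foreign keys from purchase to person and book would rule that case out.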
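This ran faster for me, but because the self-join on purchase produces a row for every pair of books bought by the same person, it still won't be instant if you have a lot of data.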
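The size of that self-join can be estimated before running it: a person with k distinct purchases contributes k(k - 1)/2 rows before grouping, so the work grows with the square of each person's purchase count rather than with the size of purchase. A minimal sketch, assuming (person_id, book_id) is unique; the function self_join_rows and the data shape are hypothetical.

```python
from collections import Counter


def self_join_rows(purchases):
    """Rows produced by purchase p1 JOIN purchase p2 on the same person with
    p1.book_id < p2.book_id, assuming (person_id, book_id) pairs are unique."""
    per_person = Counter(person_id for person_id, _ in purchases)
    return sum(k * (k - 1) // 2 for k in per_person.values())


# Hypothetical shape: 1,000 people with 50 purchases each, plus one collector with 10,000.
purchases = [(p, b) for p in range(1000) for b in range(50)]
purchases += [(-1, b) for b in range(10_000)]

print(len(purchases))             # 60000
print(self_join_rows(purchases))  # 51220000 (1225000 from the 1,000 people, 49995000 from the collector)
```

A single heavy buyer can dominate the join, which is worth checking before expecting the rewrite to scale.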