PostgreSQL

Optimizing a PostgreSQL query with many small lookups

  • May 6, 2022

I have a database of books and book sales. The tables look like this:

CREATE TABLE purchase (
   person_id integer,
   book_id integer,
   CONSTRAINT purchase_pk PRIMARY KEY (person_id, book_id)
);

CREATE TABLE person (
   person_id integer PRIMARY KEY
);

CREATE TABLE book (
   book_id integer PRIMARY KEY
);

CREATE INDEX purchase_idx ON purchase (person_id, book_id);

For every pair of books, I need to know how many people bought both books.

I defined the following query for this:
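As a small illustration of the requirement, here is a sketch with hypothetical sample data (the IDs are made up and are not from the original post). It assumes the three tables above exist and are empty.

```sql
-- Hypothetical sample data: three people, four books (book 40 is never bought).
INSERT INTO person (person_id) VALUES (1), (2), (3);
INSERT INTO book (book_id) VALUES (10), (20), (30), (40);
INSERT INTO purchase (person_id, book_id) VALUES
    (1, 10), (1, 20),
    (2, 10), (2, 20), (2, 30),
    (3, 30);

-- Desired result: one row per book pair with book_1_id < book_2_id,
-- including pairs that nobody bought together.
--   book_1_id | book_2_id | person_count
--   ----------+-----------+-------------
--          10 |        20 |            2   -- persons 1 and 2
--          10 |        30 |            1   -- person 2
--          10 |        40 |            0
--          20 |        30 |            1   -- person 2
--          20 |        40 |            0
--          30 |        40 |            0
```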

CREATE MATERIALIZED VIEW book_sales_stat AS
SELECT b1.book_id AS book_1_id,
      b2.book_id AS book_2_id,
      (SELECT count(*)
       FROM person AS per
       WHERE EXISTS (SELECT FROM purchase pur
                     WHERE pur.person_id = per.person_id
                     AND pur.book_id = b1.book_id)
       AND EXISTS (SELECT FROM purchase pur
                     WHERE pur.person_id = per.person_id
                     AND pur.book_id = b2.book_id)
      ) AS person_count
FROM book AS b1, book AS b2
WHERE b1.book_id < b2.book_id;

Unfortunately, it is very slow.

The execution plan looks like this:

Nested Loop  (cost=0.29..1564886083010.59 rows=90420300 width=16)
  ->  Seq Scan on book b1  (cost=0.00..237.70 rows=16470 width=4)
  ->  Index Only Scan using book_pk on book b2  (cost=0.29..96.38 rows=5490 width=4)
        Index Cond: (book_id > b1.book_id)
  SubPlan 1
    ->  Aggregate  (cost=17306.76..17306.77 rows=1 width=8)
          ->  Nested Loop  (cost=1.14..17306.76 rows=1 width=0)
                ->  Nested Loop  (cost=0.72..17238.11 rows=123 width=8)
                      ->  Index Only Scan using purchase_idx on purchase pur_1  (cost=0.42..16791.97 rows=123 width=4)
                            Index Cond: (book_id = b2.book_id)
                      ->  Index Only Scan using person_pk on person per  (cost=0.29..3.63 rows=1 width=4)
                            Index Cond: (person_id = pur_1.person_id)
                ->  Index Only Scan using purchase_idx on purchase pur  (cost=0.42..0.56 rows=1 width=4)
                      Index Cond: ((person_id = per.person_id) AND (book_id = b1.book_id))
JIT:
  Functions: 13
  Options: Inlining true, Optimization true, Expressions true, Deforming true

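So far this looks fairly good: the purchase_idx index is used for almost everything inside the loop.

But why doesn't it use a parallel execution plan? I went through all the requirements for parallel execution in the Postgres documentation and did not find any that I fail to meet.

What can I do to speed up the query and get it to run in parallel?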

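Your query falls under this case from the documentation:

The following operations are always parallel restricted:

  • Plan nodes that reference a correlated SubPlan.

The top nested loop could be parallelized, but its rows would have to be gathered before they are passed to the subplan. Since almost all of the estimated cost is inside the subplan, there is no point in doing that.

You can get the same result with something like this: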

select  book_1_id, book_2_id, sum(cnt)  person_count
from (
select  b1.book_id book_1_id, b2.book_id book_2_id, 0 cnt
from    book b1
join    book b2
 on    b1.book_id < b2.book_id
union all
select  p1.book_id, p2.book_id, count(*) cnt
from    purchase p1
join    purchase p2
 on    p1.person_id = p2.person_id
and    p1.book_id < p2.book_id
group by  p1.book_id, p2.book_id
) Sq
group by book_1_id, book_2_id

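This assumes that purchase contains no rows that lack a corresponding row in person or book (there are currently no foreign keys). It also requires person_id, book_id to be unique in purchase. That is the case now, but if the primary key is only there for presentation purposes, you will need a distinct in the query, for example: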

with unique_purchase as (select distinct person_id, book_id from purchase)
select  book_1_id, book_2_id, sum(cnt)  person_count
from (
select  b1.book_id book_1_id, b2.book_id book_2_id, 0 cnt
from    book b1
join    book b2
 on    b1.book_id < b2.book_id
union all
select  p1.book_id, p2.book_id, count(*) cnt
from    unique_purchase  p1
join    unique_purchase  p2
 on    p1.person_id = p2.person_id
and    p1.book_id < p2.book_id
group by  p1.book_id, p2.book_id
) sq
group by book_1_id, book_2_id
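Since the question builds a materialized view, the rewrite can be placed directly in the view definition. The following is a minimal sketch, assuming (person_id, book_id) really is unique in purchase, so the variant without distinct is used. The cast to bigint is an addition that is not part of the queries above: in PostgreSQL, sum over a bigint yields numeric, and the cast keeps person_count the same type as the original count(*).

```sql
-- Sketch only: the view name comes from the question, the body is the
-- rewritten query (variant without DISTINCT, relying on purchase_pk).
CREATE MATERIALIZED VIEW book_sales_stat AS
SELECT book_1_id,
       book_2_id,
       sum(cnt)::bigint AS person_count   -- cast added here; sum(bigint) is numeric
FROM (
    -- every book pair once, contributing a count of 0
    SELECT b1.book_id AS book_1_id, b2.book_id AS book_2_id, 0 AS cnt
    FROM book AS b1
    JOIN book AS b2 ON b1.book_id < b2.book_id
    UNION ALL
    -- pairs actually bought by the same person, with the number of such people
    SELECT p1.book_id, p2.book_id, count(*) AS cnt
    FROM purchase AS p1
    JOIN purchase AS p2
      ON p1.person_id = p2.person_id
     AND p1.book_id < p2.book_id
    GROUP BY p1.book_id, p2.book_id
) AS sq
GROUP BY book_1_id, book_2_id;
```

Prefixing the statement with EXPLAIN (COSTS OFF) prints the plan without creating the view. A Gather or Gather Merge node in that output indicates that a parallel plan was chosen; whether one is chosen depends on table sizes and on settings such as max_parallel_workers_per_gather.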

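The rewritten query runs much faster for me, but because of the nature of the joins involved, it will still not be instant if you have a lot of data.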
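The first assumption behind the rewrite (no purchase rows without a matching person or book) could be checked, and optionally enforced, along these lines. This is a sketch only; the constraint names are hypothetical and not part of the original schema. The uniqueness assumption is already covered by purchase_pk as long as that primary key exists in the real table.

```sql
-- Purchases whose person_id or book_id has no matching parent row.
-- An empty result means the assumption holds for the current data.
SELECT pur.person_id, pur.book_id
FROM purchase AS pur
WHERE NOT EXISTS (SELECT FROM person AS per
                  WHERE per.person_id = pur.person_id)
   OR NOT EXISTS (SELECT FROM book AS b
                  WHERE b.book_id = pur.book_id);

-- Optionally enforce the assumption from now on.
-- Constraint names are hypothetical.
ALTER TABLE purchase
    ADD CONSTRAINT purchase_person_fk
        FOREIGN KEY (person_id) REFERENCES person (person_id),
    ADD CONSTRAINT purchase_book_fk
        FOREIGN KEY (book_id) REFERENCES book (book_id);
```

The ALTER TABLE fails if the first query returns any rows, so the orphaned purchases would have to be dealt with first.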

Source: https://dba.stackexchange.com/questions/311726