Postgresql

PostgreSQL query performance problem

  • February 6, 2022

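When we use the following query (PostgreSQL 9.6) to fetch rows from the items table whose IDs exist in the items_categories table, we sometimes get poor performance (about 14 seconds):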

SELECT items.*
FROM items 
WHERE EXISTS (
   SELECT item_id 
   FROM items_categories 
   WHERE item_id = items.id  AND category_id = 626 
) 
AND items.active = TRUE
-- possibly some other "AND" conditions here to filter further on "items", but not considered for this question
ORDER BY modified_at DESC 
LIMIT 10

The relevant parts of our schema:

                                   Table "public.items"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+----------------------------------------------------
 id          | integer                     | not null default nextval('items_id_seq'::regclass)
 active      | boolean                     | default true
 modified_at | timestamp without time zone | default now()
Indexes:
   "items_pkey" PRIMARY KEY, btree (id)
   "active_idx" btree (active)
   "aggregate_idx" btree (id)
   "items_modified_at_idx" btree (modified_at)


 Table "public.items_categories"
  Column    |  Type   | Modifiers
-------------+---------+-----------
item_id     | integer | not null
category_id | integer | not null
Indexes:
   "unique_cat_item_assoc" UNIQUE CONSTRAINT, btree (item_id, category_id)
   "items_categories_1_idx" btree (category_id)
   "items_categories_2_idx" btree (item_id)
Foreign-key constraints:
   "items_categories_category_id_fkey" FOREIGN KEY (category_id) REFERENCES categories(id)
   "items_categories_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(id)

The items table contains ~2 M rows, and the items_categories table contains ~4 M rows.

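When we ask for 10 items (i.e. LIMIT 10 at the end of the query above) and 10 or more rows match in items_categories, performance is good (~10 ms). But when we ask for 10 items and fewer than 10 rows match in items_categories, the query takes ~14 seconds, because it walks the items.modified_at index across all ~2 M rows, probing items_categories for each one, and never finds enough matches to stop early.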

Query plan when fewer than 10 rows match in items_categories (poor performance):

Limit  (cost=0.86..11696.68 rows=10 width=1797) (actual time=168.376..14484.854 rows=7 loops=1)
 ->  Nested Loop Semi Join  (cost=0.86..2746178.23 rows=2348 width=1797) (actual time=168.376..14484.836 rows=7 loops=1)
       ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1680609.95 rows=2243424 width=1797) (actual time=0.054..7611.300 rows=2251395 loops=1)
             Filter: active
             Rows Removed by Filter: 2467
       ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2251395)
             Index Cond: ((item_id = items.id) AND (category_id = 626))
             Heap Fetches: 7
Planning time: 3.082 ms
Execution time: 14485.057 ms

Query plan when more than 10 rows match in items_categories (good performance):

Limit  (cost=0.86..24.07 rows=10 width=1857) (actual time=3.575..3.757 rows=10 loops=1)
 ->  Nested Loop Semi Join  (cost=0.86..2763459.56 rows=1190819 width=1857) (actual time=3.574..3.752 rows=10 loops=1)
       ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1684408.22 rows=2246967 width=1857) (actual time=0.013..2.205 rows=751 loops=1)
             Filter: active
       ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=751)
             Index Cond: ((item_id = items.id) AND (category_id = 20))
             Heap Fetches: 10
Planning time: 1.650 ms
Execution time: 3.868 ms

How can we tune this query to handle both cases (i.e. perform well no matter how many rows match in items_categories)?

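I have a working POC where I first count (in a separate query) the number of matching rows in items_categories. If that number is low, I use a CTE to work on a subset of items instead of all rows. But this is really a dirty temporary hack, IMO. If the number is high, the CTE takes too long, and in that case it is more efficient not to use the CTE (i.e. the query above performs better).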
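The POC itself is not shown in the question. As a rough sketch of what such a two-step approach might look like, assuming the schema above (the CTE name matching is made up for illustration): in PostgreSQL 9.6 a CTE is always materialized, so the second statement is evaluated starting from the rows of the category rather than from the modified_at index.

```sql
-- Step 1: count the matches for the category (can use items_categories_1_idx).
SELECT count(*)
FROM items_categories
WHERE category_id = 626;

-- Step 2, only when the count from step 1 is small:
-- collect the matching ids first, then look up just those items.
WITH matching AS (
    SELECT item_id
    FROM items_categories
    WHERE category_id = 626
)
SELECT items.*
FROM items
JOIN matching ON matching.item_id = items.id
WHERE items.active = TRUE
ORDER BY items.modified_at DESC
LIMIT 10;
```

Because (item_id, category_id) is unique, the join cannot return the same item twice for a single category. What counts as "small" is not stated in the question and would have to be measured.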

Thanks!

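The query planner cannot get row estimates good enough to plan this type of query well. If you can't tolerate the bad plan, you will have to run two queries. There are several ways to arrange that, and it sounds like you have already found one. You could create a set-returning plpgsql function that wraps the two queries into a single function call, if that works better for you.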
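A minimal sketch of such a wrapper, under stated assumptions: the function name latest_items_in_category, its parameters, and the default threshold of 1000 are all hypothetical and not from the question. It counts the matches only up to the threshold, then runs whichever of the two queries suits that case.

```sql
-- Hypothetical helper: name, parameters and threshold are assumptions.
CREATE OR REPLACE FUNCTION latest_items_in_category(
    p_category_id integer,
    p_limit       integer DEFAULT 10,
    p_threshold   integer DEFAULT 1000   -- assumed cut-off; tune on real data
) RETURNS SETOF items
LANGUAGE plpgsql STABLE AS $$
DECLARE
    v_matches bigint;
BEGIN
    -- Bounded count: the inner LIMIT stops the scan once the threshold is reached.
    SELECT count(*) INTO v_matches
    FROM (
        SELECT 1
        FROM items_categories
        WHERE category_id = p_category_id
        LIMIT p_threshold
    ) AS capped;

    IF v_matches < p_threshold THEN
        -- Few matches: collect the ids first, then fetch only those items.
        RETURN QUERY
        WITH matching AS (
            SELECT ic.item_id
            FROM items_categories ic
            WHERE ic.category_id = p_category_id
        )
        SELECT i.*
        FROM items i
        JOIN matching m ON m.item_id = i.id
        WHERE i.active = TRUE
        ORDER BY i.modified_at DESC
        LIMIT p_limit;
    ELSE
        -- Many matches: the original query from the question.
        RETURN QUERY
        SELECT i.*
        FROM items i
        WHERE EXISTS (
                  SELECT 1
                  FROM items_categories ic
                  WHERE ic.item_id = i.id
                    AND ic.category_id = p_category_id)
          AND i.active = TRUE
        ORDER BY i.modified_at DESC
        LIMIT p_limit;
    END IF;
END;
$$;
```

It would be called as SELECT * FROM latest_items_in_category(626). Capping the count keeps the extra query cheap for large categories, since it never needs to know the exact number once the threshold is passed.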

Could you give this a try?

SELECT items.*
FROM items 
WHERE id IN (
   -- ids of items in the category
   SELECT item_id 
   FROM items_categories 
   WHERE category_id = 626
   INTERSECT
   -- ids of active items (the column in items is "id", not "item_id")
   SELECT id 
   FROM items
   WHERE active = TRUE
) 
ORDER BY modified_at DESC 
LIMIT 10

Source: https://dba.stackexchange.com/questions/296164