PostgreSQL
PostgreSQL query performance problem
When we use the following query (PostgreSQL 9.6) to fetch rows from the items table whose id exists in the items_categories table, we sometimes get poor performance (about 14 seconds):

    SELECT items.*
    FROM items
    WHERE EXISTS (
        SELECT item_id
        FROM items_categories
        WHERE item_id = items.id
          AND category_id = 626
    )
      AND items.active = TRUE
      -- possibly some others "AND" here to use more filters on "items", but not considered for this question
    ORDER BY modified_at DESC
    LIMIT 10
The relevant parts of our schema:
                                   Table "public.items"
       Column    |            Type             |                     Modifiers
    -------------+-----------------------------+----------------------------------------------------
     id          | integer                     | not null default nextval('items_id_seq'::regclass)
     active      | boolean                     | default true
     modified_at | timestamp without time zone | default now()
    Indexes:
        "items_pkey" PRIMARY KEY, btree (id)
        "active_idx" btree (active)
        "aggregate_idx" btree (id)
        "items_modified_at_idx" btree (modified_at)

       Table "public.items_categories"
       Column    |  Type   | Modifiers
    -------------+---------+-----------
     item_id     | integer | not null
     category_id | integer | not null
    Indexes:
        "unique_cat_item_assoc" UNIQUE CONSTRAINT, btree (item_id, category_id)
        "items_categories_1_idx" btree (category_id)
        "items_categories_2_idx" btree (item_id)
    Foreign-key constraints:
        "items_categories_category_id_fkey" FOREIGN KEY (category_id) REFERENCES categories(id)
        "items_categories_item_id_fkey" FOREIGN KEY (item_id) REFERENCES items(id)
The items table contains ~2 M rows, and the items_categories table contains ~4 M rows.

When we ask for 10 items (i.e. LIMIT 10 at the end of the query above) and 10 or more rows match in items_categories, performance is good (~10 ms). But when we ask for 10 items and fewer than 10 rows match in items_categories, the query takes ~14 seconds, because it runs an index scan on items.modified_at over all ~2 M rows.

Query plan when fewer than 10 rows match in items_categories (poor performance):

    Limit  (cost=0.86..11696.68 rows=10 width=1797) (actual time=168.376..14484.854 rows=7 loops=1)
      ->  Nested Loop Semi Join  (cost=0.86..2746178.23 rows=2348 width=1797) (actual time=168.376..14484.836 rows=7 loops=1)
            ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1680609.95 rows=2243424 width=1797) (actual time=0.054..7611.300 rows=2251395 loops=1)
                  Filter: active
                  Rows Removed by Filter: 2467
            ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2251395)
                  Index Cond: ((item_id = items.id) AND (category_id = 626))
                  Heap Fetches: 7
    Planning time: 3.082 ms
    Execution time: 14485.057 ms
Query plan when more than 10 rows match in items_categories (good performance):

    Limit  (cost=0.86..24.07 rows=10 width=1857) (actual time=3.575..3.757 rows=10 loops=1)
      ->  Nested Loop Semi Join  (cost=0.86..2763459.56 rows=1190819 width=1857) (actual time=3.574..3.752 rows=10 loops=1)
            ->  Index Scan Backward using items_modified_at_idx on items  (cost=0.43..1684408.22 rows=2246967 width=1857) (actual time=0.013..2.205 rows=751 loops=1)
                  Filter: active
            ->  Index Only Scan using unique_cat_item_assoc on items_categories  (cost=0.43..0.47 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=751)
                  Index Cond: ((item_id = items.id) AND (category_id = 20))
                  Heap Fetches: 10
    Planning time: 1.650 ms
    Execution time: 3.868 ms
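How can we tune this query to handle both cases (i.e. perform well no matter how many rows match in items_categories)?

I have a working POC in which I first count (in a separate query) the number of matching rows in items_categories and then, if that number is low, use a CTE to work on a subset of items instead of all rows. But this is really a dirty temporary hack IMO... If the number is high, the CTE takes too long, and in that case it is more efficient NOT to use the CTE (i.e. the query above performs better).

Thanks!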
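
The POC itself is not shown in the question. A rough sketch of what such a two-step approach might look like follows; the exact shape of the count query and the CTE name (matching) are assumptions for illustration only:

```sql
-- Step 1: count the matches for the category
-- (category_id is indexed by items_categories_1_idx).
SELECT count(*)
FROM items_categories
WHERE category_id = 626;

-- Step 2, only when the count is low: collect the matching ids first.
-- In PostgreSQL 9.6 a CTE is evaluated on its own (optimization fence),
-- so the query starts from the few matching rows instead of walking
-- items in modified_at order.
WITH matching AS (
    SELECT item_id
    FROM items_categories
    WHERE category_id = 626
)
SELECT items.*
FROM matching
JOIN items ON items.id = matching.item_id
WHERE items.active = TRUE
ORDER BY items.modified_at DESC
LIMIT 10;
```

Because unique_cat_item_assoc guarantees one row per (item_id, category_id), the join should not duplicate items.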
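The query planner cannot get row estimates good enough to plan this type of query adequately. If you cannot tolerate the bad plan, you will have to run two queries. You can arrange that in a number of ways, and it sounds like you have already found one. If it suits you better, you could create a set-returning plpgsql function that wraps the two queries into a single function call.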
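
A minimal sketch of such a wrapper, under stated assumptions: the function name items_in_category, its parameter names, and the cutoff of 1000 matches are all hypothetical and would need tuning against real data. It counts the matches for the category (capped so the probe stays cheap) and then picks one of the two query shapes:

```sql
CREATE OR REPLACE FUNCTION items_in_category(
    p_category_id integer,
    p_limit       integer DEFAULT 10,
    p_threshold   integer DEFAULT 1000  -- assumed cutoff, not from the question
)
RETURNS SETOF items
LANGUAGE plpgsql
STABLE
AS $$
DECLARE
    v_matches bigint;
BEGIN
    -- Probe: count at most p_threshold + 1 matching rows.
    SELECT count(*) INTO v_matches
    FROM (
        SELECT 1
        FROM items_categories AS ic
        WHERE ic.category_id = p_category_id
        LIMIT p_threshold + 1
    ) AS probe;

    IF v_matches <= p_threshold THEN
        -- Few matches: start from items_categories, then sort the small set.
        RETURN QUERY
        WITH matching AS (
            SELECT ic.item_id
            FROM items_categories AS ic
            WHERE ic.category_id = p_category_id
        )
        SELECT i.*
        FROM matching AS m
        JOIN items AS i ON i.id = m.item_id
        WHERE i.active
        ORDER BY i.modified_at DESC
        LIMIT p_limit;
    ELSE
        -- Many matches: the original query, which stops early on the
        -- modified_at index.
        RETURN QUERY
        SELECT i.*
        FROM items AS i
        WHERE EXISTS (
            SELECT 1
            FROM items_categories AS ic
            WHERE ic.item_id = i.id
              AND ic.category_id = p_category_id
        )
          AND i.active
        ORDER BY i.modified_at DESC
        LIMIT p_limit;
    END IF;
END;
$$;
```

It could then be called as SELECT * FROM items_in_category(626); the extra filters on items mentioned in the question would have to be added to both branches.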
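Could you give this a try? (Note that items has no item_id column, so the outer filter and the second branch of the INTERSECT use id.)

    SELECT items.*
    FROM items
    WHERE id IN (
        SELECT item_id FROM items_categories WHERE category_id = 626
        INTERSECT
        SELECT id FROM items WHERE active = TRUE
    )
    ORDER BY modified_at DESC
    LIMIT 10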