優化單個“下一個”行的 UPDATE 查詢
我們從 new relic 獲得的資訊表明,我們的查詢性能很差(執行大約需要 16 秒)。我們在 Debian Linux 上使用 PostgreSQL 11.8。
桌子:
postgres=# SELECT table_name, column_name, data_type, character_maximum_length, character_octet_length, numeric_precision, numeric_precision_radix, numeric_scale, is_nullable, is_updatable FROM information_schema.columns WHERE table_name = 'store_codeinventory'; table_name | column_name | data_type | character_maximum_length | character_octet_length | numeric_precision | numeric_precision_radix | numeric_scale | is_nullable | is_updatable ---------------------+-------------------+--------------------------+--------------------------+------------------------+-------------------+-------------------------+---------------+-------------+-------------- store_codeinventory | id | integer | | | 32 | 2 | 0 | NO | YES store_codeinventory | code | character varying | 255 | 1020 | | | | NO | YES store_codeinventory | limit_days | integer | | | 32 | 2 | 0 | NO | YES store_codeinventory | deactivation_date | timestamp with time zone | | | | | | YES | YES store_codeinventory | cost | numeric | | | 8 | 10 | 2 | NO | YES store_codeinventory | price | numeric | | | 8 | 10 | 2 | NO | YES store_codeinventory | created_date | timestamp with time zone | | | | | | YES | YES store_codeinventory | claimed_date | timestamp with time zone | | | | | | YES | YES store_codeinventory | is_active | boolean | | | | | | NO | YES store_codeinventory | book_id | integer | | | 32 | 2 | 0 | YES | YES store_codeinventory | code_import_id | integer | | | 32 | 2 | 0 | NO | YES store_codeinventory | creator_id | integer | | | 32 | 2 | 0 | NO | YES store_codeinventory | inv_id | integer | | | 32 | 2 | 0 | YES | YES store_codeinventory | label_id | integer | | | 32 | 2 | 0 | YES | YES store_codeinventory | recipient_id | integer | | | 32 | 2 | 0 | YES | YES store_codeinventory | purchase_id | integer | | | 32 | 2 | 0 | YES | YES (16 rows)
我無權訪問生產,但我嘗試填寫本地數據庫並了解問題。詢問:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW() where id = ANY((select array(select id from store_codeinventory where recipient_id is NULL and inv_id = 72 and is_active=true ORDER BY ID ASC LIMIT 1 FOR UPDATE)) ::integer[]) and recipient_id is NULL;
查詢計劃:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Update on store_codeinventory (cost=0.64..47.43 rows=10 width=92) (actual time=0.291..0.291 rows=0 loops=1) InitPlan 2 (returns $3) -> Result (cost=0.34..0.35 rows=1 width=32) (actual time=0.060..0.060 rows=1 loops=1) InitPlan 1 (returns $2) -> Limit (cost=0.29..0.34 rows=1 width=10) (actual time=0.058..0.058 rows=1 loops=1) -> LockRows (cost=0.29..1031.77 rows=19963 width=10) (actual time=0.057..0.057 rows=1 loops=1) -> Index Scan using store_codeinventory_pkey on store_codeinventory store_codeinventory_1 (cost=0.29..832.14 rows=19963 width=10) (actual time=0.053..0.053 rows=1 loops=1) Filter: ((recipient_id IS NULL) AND is_active AND (inv_id = 72)) Rows Removed by Filter: 94 -> Index Scan using store_codeinventory_pkey on store_codeinventory (cost=0.29..47.08 rows=10 width=92) (actual time=0.065..0.066 rows=1 loops=1) Index Cond: (id = ANY ($3)) Filter: (recipient_id IS NULL) Planning Time: 0.365 ms Execution Time: 0.327 ms (14 rows)
我不太擅長寫sql。有人可以給我建議瓶頸在哪裡嗎?據我了解,可能有兩個地方:LockRows 和 Index Scan,對嗎?我認為第一個是引起
FOR UPDATE
的,第二個是 -select where ORDER BY ID ASC LIMIT
?有沒有辦法優化它們?UPD。有關索引的資訊(我刪除了未在查詢中使用的列):
postgres=# SELECT indexname, indexdef FROM pg_indexes WHERE schemaname = 'public' and tablename = 'store_codeinventory' AND (indexdef LIKE '%inv_id%' OR indexdef LIKE '%pkey%' OR indexdef LIKE '%recipient_id%'); indexname | indexdef ----------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------- store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx | CREATE INDEX store_codeinventory_inv_id_recipient_id_is_active_822b92e5_idx ON public.store_codeinventory USING btree (inv_id, recipient_id, is_active) store_codeinventory_recipient_id_e32fbb18 | CREATE INDEX store_codeinventory_recipient_id_e32fbb18 ON public.store_codeinventory USING btree (recipient_id) store_codeinventory_inv_id_21e20eb7 | CREATE INDEX store_codeinventory_inv_id_21e20eb7 ON public.store_codeinventory USING btree (inv_id) store_codeinventory_pkey | CREATE UNIQUE INDEX store_codeinventory_pkey ON public.store_codeinventory USING btree (id) (4 rows)
UPD2我不太擅長 sql,但我決定分析該查詢並得出結論,我們可以簡化它。您對此變體的看法:
UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW() where id = (select id from store_codeinventory where recipient_id is NULL and inv_id = 72 and is_active=true ORDER BY ID ASC LIMIT 1 FOR UPDATE) and recipient_id is NULL;
我認為我們不需要數組,因為在子選擇中我們只會得到一個或 null。然後我認為
LIMIT 1
獲得ordered by ID ASC
最小的id值是一樣的。如果我們寫:UPDATE store_codeinventory set recipient_id = 1168, claimed_date = NOW() where id = (select MIN(id) from store_codeinventory where recipient_id is NULL and inv_id = 72 and is_active=true);
是否相同?
您的第一個查詢是昂貴的廢話 - 就像您已經發現自己一樣。該陣列沒有任何作用。
第二個查詢更有意義。
第三個查詢背後的想法是基於一個誤解。
ORDER BY id LIMIT 1
至少與 一樣有效min(id)
。兩者都可以使用索引。第一個是更通用的。除其他外,您可以添加一個鎖定子句,例如FOR UPDATE
- 如果可以同時進行寫訪問,您需要確保它是安全的。所以:
UPDATE store_codeinventory SET recipient_id = 1168 , claimed_date = now() WHERE id = ( SELECT id FROM store_codeinventory WHERE recipient_id IS NULL AND inv_id = 72 AND is_active ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED -- !? );
有了索引支持,這應該很快。您需要正確的索引。任何額外的索引都會使寫入表的成本更高。因此,添加您實際需要的索引並避免使用其他索引。
最佳索引取決於相關列的各自特徵和數據分佈:
recipient_id inv_id is_active id
範例:如果
recipient_id
NULL 值很少,則部分索引會捕捉:CREATE INDEX ON store_codeinventory (id) WHERE recipient_id IS NULL;
或者,如果謂詞始終相同,則此索引將是完美的:
CREATE INDEX ON store_codeinventory (id) WHERE recipient_id IS NULL AND inv_id = 72 AND is_active;
SKIP LOCKED
僅為競爭下一個“免費”的多個事務添加鎖定子句id
- 這看起來是我們的案例。見(密切相關!):