Postgres UPDATE 使用來自另一個表的數據 - 僅索引掃描用於相關子查詢但不連接
語境
我正在調整
UPDATE
從另一個(大)表中選擇的批量。我的目的是提供一個覆蓋索引來支持源表的僅索引掃描。我意識到必須清理源表以更新其可見性圖。到目前為止,我的調查表明,優化器選擇僅在
UPDATE
使用相關子查詢時對源表進行索引掃描,但在使用連接時似乎使用標準索引掃描(UPDATE...FROM
)。我問這個問題是為了理解為什麼。我在這裡提供一個簡化的例子來說明差異。
我使用的是 Postgres 9.6.8,但 10.11 和 11.6 的計劃非常相似。我已經使用官方圖像複製了在 Docker 中安裝 vanilla 9.6 Postgres 的計劃,以及此處的 db<>fiddle 。
設置
CREATE TABLE lookup ( surrogate_key BIGINT PRIMARY KEY, natural_key TEXT NOT NULL UNIQUE, data TEXT NOT NULL); INSERT INTO lookup SELECT id, 'nk'||id, random()::text FROM generate_series(1,400000) id; CREATE UNIQUE INDEX lookup_ix ON lookup(natural_key, surrogate_key); VACUUM ANALYSE lookup; CREATE TABLE target ( target_id BIGINT PRIMARY KEY, lookup_natural_key TEXT NOT NULL, lookup_surrogate_key BIGINT, data TEXT NOT NULL ); INSERT INTO target (target_id, lookup_natural_key, data) SELECT id+1000, 'nk'||id, random()::text FROM generate_series(1,1000) id; ANALYSE target;
使用連接更新
EXPLAIN (ANALYSE, VERBOSE, BUFFERS) UPDATE target SET lookup_surrogate_key = surrogate_key FROM lookup WHERE lookup_natural_key = natural_key;
標準索引掃描- 因此從表
lookup_ix
中讀取堆塊:lookup
Update on public.target (cost=0.42..7109.00 rows=1000 width=54) (actual time=76.688..76.688 rows=0 loops=1) Buffers: shared hit=8514 read=550 dirtied=16 -> Nested Loop (cost=0.42..7109.00 rows=1000 width=54) (actual time=0.050..62.493 rows=1000 loops=1) Output: target.target_id, target.lookup_natural_key, lookup.surrogate_key, target.data, target.ctid, lookup.ctid Buffers: shared hit=3479 read=535 -> Seq Scan on public.target (cost=0.00..19.00 rows=1000 width=40) (actual time=0.013..7.691 rows=1000 loops=1) Output: target.target_id, target.lookup_natural_key, target.data, target.ctid Buffers: shared hit=9 -> Index Scan using lookup_ix on public.lookup (cost=0.42..7.08 rows=1 width=22) (actual time=0.020..0.027 rows=1 loops=1000) Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key Index Cond: (lookup.natural_key = target.lookup_natural_key) Buffers: shared hit=3470 read=535 Planning time: 0.431 ms Execution time: 76.826 ms
使用相關子查詢更新
EXPLAIN (ANALYSE, VERBOSE, BUFFERS) UPDATE target SET lookup_surrogate_key = ( SELECT surrogate_key FROM lookup WHERE lookup_natural_key = natural_key);
lookup_ix
僅按預期掃描索引:Update on public.target (cost=0.00..4459.00 rows=1000 width=47) (actual time=52.947..52.947 rows=0 loops=1) Buffers: shared hit=8050 read=15 dirtied=16 -> Seq Scan on public.target (cost=0.00..4459.00 rows=1000 width=47) (actual time=0.052..40.306 rows=1000 loops=1) Output: target.target_id, target.lookup_natural_key, (SubPlan 1), target.data, target.ctid Buffers: shared hit=3015 SubPlan 1 -> Index Only Scan using lookup_ix on public.lookup (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.019 rows=1 loops=1000) Output: lookup.surrogate_key Index Cond: (lookup.natural_key = target.lookup_natural_key) Heap Fetches: 0 Buffers: shared hit=3006 Planning time: 0.130 ms Execution time: 52.987 ms
db<>在這裡擺弄
我知道查詢在邏輯上並不相同(當
lookup
給定的沒有/多行時行為不同natural_key
),但我對lookup_ix
.誰能解釋一下為什麼加入版本不能使用僅索引掃描?
只有當索引掃描返回的所有列都儲存在索引中時,才能使用僅索引掃描。
您的索引
lookup_ix
僅包含列natural_key
和surrogate_key
,但從執行計劃中您可以看到:-> Index Scan using lookup_ix on public.lookup Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key
所以索引掃描獲取行
ctid
的物理地址。這是因為我們要執行UPDATE
; 請參閱以下評論src\include/nodes/plannodes.h
:* When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we have to uniquely * identify all the source rows, not only those from the target relations, so * that we can perform EvalPlanQual rechecking at need. For plain tables we * can just fetch the TID, much as for a target relation; this case is * represented by ROW_MARK_REFERENCE. Otherwise (for example for VALUES or * FUNCTION scans) we have to copy the whole row value. ROW_MARK_COPY is * pretty inefficient, since most of the time we'll never need the data; but * fortunately the overhead is usually not performance-critical in practice. * By default we use ROW_MARK_COPY for foreign tables, but if the FDW has * a concept of rowid it can request to use ROW_MARK_REFERENCE instead. * (Again, this probably doesn't make sense if a physical remote fetch is * needed, but for FDWs that map to local storage it might be credible.)
(
EvalPlanQual
PostgreSQL 是一個行話術語/函式名稱,用於獲取將被鎖定或修改的行的最新版本。)要確定是這種情況,請嘗試
EXPLAIN
在沒有更新的情況下執行相同的查詢,您將看到計劃是相同的,只是ctid
沒有獲取 s,並且它使用僅索引掃描。現在有一件事情讓我感到驚訝:
每個(葉)索引條目確實包含
ctid
它引用的行,所以我看不出為什麼不能通過僅索引掃描來獲取它的原因。我在郵件列表上問過,但似乎改變它可能不是微不足道的。