Postgresql

為什麼 Postgres 在加入物化視圖時不使用索引?

  • May 30, 2022

我有一個表定義為:

CREATE TABLE addresses (
   "address" varchar(42),
   "other_info1" text,
   "other_info2" bigint
);

我對該表進行了一些手動更正,定義為:

CREATE TABLE address_corrections (
   "address" varchar(42),
   "other_info1" text,
   "other_info2" bigint
);

我有一個簡單的 union + dedup 查詢,我將corrected_addresses其作為與上述兩個表具有相同架構的物化視圖進行持久化。物化視圖的大小與原始 table 的大小大致相同addresses,約為 400k 行。addresses我在物化視圖上有一個索引,它與原始表上的完全相同的索引匹配:

CREATE INDEX corrected_addresses_address_idx ON corrected_addresses(address);

現在,當查詢其他一些表時,我想加入該表以訪問other_info1other_info2。很簡單。但是,此連接似乎不使用 上的索引address。查詢如下所示:

EXPLAIN (analyze, buffers, format text)
SELECT transfer.from_name, transfer.to_name, transfer.value,
       transfer.address, address.other_info1, address.other_info2
FROM transfers transfer
LEFT JOIN corrected_addresses address ON transfer.address=address.address
WHERE transfer.from_name='some name here';

addressescorrected_addresses作為連接表。

這是原始地址表的結果:

Gather  (cost=1205.50..43489.92 rows=6241 width=173) (actual time=1.422..12.292 rows=2 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 Buffers: shared hit=12
 ->  Nested Loop Left Join  (cost=205.50..41865.82 rows=2600 width=173) (actual time=0.184..0.198 rows=1 loops=3)
       Buffers: shared hit=12
       ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.176..0.180 rows=1 loops=3)
             Recheck Cond: ((from_name)::text = 'some name here'::text)
             Heap Blocks: exact=2
             Buffers: shared hit=8
             ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.400..0.401 rows=2 loops=1)
                   Index Cond: ((from_name)::text = 'some name here'::text)
                   Buffers: shared hit=6
       ->  Memoize  (cost=0.43..7.99 rows=1 width=55) (actual time=0.017..0.018 rows=0 loops=2)
             Cache Key: transfer.address
             Cache Mode: logical
             Hits: 0  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 1kB
             Buffers: shared hit=4
             ->  Index Scan using addresses_idx on addresses a  (cost=0.42..7.98 rows=1 width=55) (actual time=0.013..0.013 rows=0 loops=2)
                   Index Cond: ((address)::text = (transfer.address)::text)
                   Buffers: shared hit=4
Planning:
 Buffers: shared hit=334 dirtied=1
Planning Time: 5.692 ms
Execution Time: 12.601 ms

對於物化視圖:

Gather  (cost=10501.47..35970.26 rows=6241 width=173) (actual time=100.881..111.703 rows=2 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 Buffers: shared hit=5327
 ->  Parallel Hash Left Join  (cost=9501.47..34346.16 rows=2600 width=173) (actual time=94.729..95.259 rows=1 loops=3)
       Hash Cond: ((transfer.address)::text = (a.address)::text)
       Buffers: shared hit=5327
       ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.162..0.164 rows=1 loops=3)
             Recheck Cond: ((from_name)::text = 'some name here'::text)
             Heap Blocks: exact=1
             Buffers: shared hit=8
             ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.070..0.071 rows=2 loops=1)
                   Index Cond: ((from_name)::text = 'some name here'::text)
                   Buffers: shared hit=6
       ->  Parallel Hash  (cost=7051.18..7051.18 rows=179618 width=55) (actual time=92.901..92.901 rows=143694 loops=3)
             Buckets: 524288  Batches: 1  Memory Usage: 41856kB
             Buffers: shared hit=5255
             ->  Parallel Seq Scan on corrected_addresses a  (cost=0.00..7051.18 rows=179618 width=55) (actual time=0.006..27.550 rows=143694 loops=3)
                   Buffers: shared hit=5255
Planning:
 Buffers: shared hit=37
Planning Time: 0.482 ms
Execution Time: 111.758 ms

這兩個執行時間都很短,我相信這部分是因為記憶體。新執行的差異約為 100 毫秒與幾秒鐘。抱歉,我在這裡找不到範例。

有誰知道為什麼 Postgres 不使用物化視圖的索引?

對“轉移”條件的可怕估計(預期為 2600,發現為 1)導致散列連接和嵌套循環的成本看起來非常相似。

為什麼錯誤估計 2600 倍會導致一個查詢使用不同的計劃而不是另一個?不知道,可能只是運氣。我懷疑它甚至可以重現,更不有趣。修復根本問題。

如果ANALYZE transfers沒有修復它,那麼您將需要深入研究“from_name”列的 pg_stats。

對於那些後來偶然發現這個問題的人,這是我為解決它所做的:

SELECT transfer.from_name, transfer.to_name, transfer.value,
       transfer.address, address.other_info1, address.other_info2
FROM transfers transfer
LEFT JOIN LATERAL (SELECT * FROM corrected_addresses a_inner ON transfer.address=a_inner.address LIMIT 1) AS address ON true
WHERE transfer.from_name='some name here';

我想LIMIT 1這裡使這個查詢與以前的連接有很大不同。就我而言,這是可取的,但並非所有人都如此。儘管如此,為什麼在原始表而不是物化視圖上使用索引是我無法理解的,但如果它有效,它就可以工作。

引用自:https://dba.stackexchange.com/questions/312690