Postgresql
為什麼 Postgres 在加入物化視圖時不使用索引?
我有一個表定義為:
CREATE TABLE addresses ( "address" varchar(42), "other_info1" text, "other_info2" bigint );
我對該表進行了一些手動更正,定義為:
CREATE TABLE address_corrections ( "address" varchar(42), "other_info1" text, "other_info2" bigint );
我有一個簡單的 union + dedup 查詢,我將
corrected_addresses
其作為與上述兩個表具有相同架構的物化視圖進行持久化。物化視圖的大小與原始 table 的大小大致相同addresses
,約為 400k 行。addresses
我在物化視圖上有一個索引,它與原始表上的完全相同的索引匹配:CREATE INDEX corrected_addresses_address_idx ON corrected_addresses(address);
現在,當查詢其他一些表時,我想加入該表以訪問
other_info1
和other_info2
。很簡單。但是,此連接似乎不使用 上的索引address
。查詢如下所示:EXPLAIN (analyze, buffers, format text) SELECT transfer.from_name, transfer.to_name, transfer.value, transfer.address, address.other_info1, address.other_info2 FROM transfers transfer LEFT JOIN corrected_addresses address ON transfer.address=address.address WHERE transfer.from_name='some name here';
與
addresses
或corrected_addresses
作為連接表。這是原始地址表的結果:
Gather (cost=1205.50..43489.92 rows=6241 width=173) (actual time=1.422..12.292 rows=2 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=12 -> Nested Loop Left Join (cost=205.50..41865.82 rows=2600 width=173) (actual time=0.184..0.198 rows=1 loops=3) Buffers: shared hit=12 -> Parallel Bitmap Heap Scan on transfers transfer (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.176..0.180 rows=1 loops=3) Recheck Cond: ((from_name)::text = 'some name here'::text) Heap Blocks: exact=2 Buffers: shared hit=8 -> Bitmap Index Scan on transfers_from_name_timestamp_idx (cost=0.00..203.51 rows=6241 width=0) (actual time=0.400..0.401 rows=2 loops=1) Index Cond: ((from_name)::text = 'some name here'::text) Buffers: shared hit=6 -> Memoize (cost=0.43..7.99 rows=1 width=55) (actual time=0.017..0.018 rows=0 loops=2) Cache Key: transfer.address Cache Mode: logical Hits: 0 Misses: 2 Evictions: 0 Overflows: 0 Memory Usage: 1kB Buffers: shared hit=4 -> Index Scan using addresses_idx on addresses a (cost=0.42..7.98 rows=1 width=55) (actual time=0.013..0.013 rows=0 loops=2) Index Cond: ((address)::text = (transfer.address)::text) Buffers: shared hit=4 Planning: Buffers: shared hit=334 dirtied=1 Planning Time: 5.692 ms Execution Time: 12.601 ms
對於物化視圖:
Gather (cost=10501.47..35970.26 rows=6241 width=173) (actual time=100.881..111.703 rows=2 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=5327 -> Parallel Hash Left Join (cost=9501.47..34346.16 rows=2600 width=173) (actual time=94.729..95.259 rows=1 loops=3) Hash Cond: ((transfer.address)::text = (a.address)::text) Buffers: shared hit=5327 -> Parallel Bitmap Heap Scan on transfers transfer (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.162..0.164 rows=1 loops=3) Recheck Cond: ((from_name)::text = 'some name here'::text) Heap Blocks: exact=1 Buffers: shared hit=8 -> Bitmap Index Scan on transfers_from_name_timestamp_idx (cost=0.00..203.51 rows=6241 width=0) (actual time=0.070..0.071 rows=2 loops=1) Index Cond: ((from_name)::text = 'some name here'::text) Buffers: shared hit=6 -> Parallel Hash (cost=7051.18..7051.18 rows=179618 width=55) (actual time=92.901..92.901 rows=143694 loops=3) Buckets: 524288 Batches: 1 Memory Usage: 41856kB Buffers: shared hit=5255 -> Parallel Seq Scan on corrected_addresses a (cost=0.00..7051.18 rows=179618 width=55) (actual time=0.006..27.550 rows=143694 loops=3) Buffers: shared hit=5255 Planning: Buffers: shared hit=37 Planning Time: 0.482 ms Execution Time: 111.758 ms
這兩個執行時間都很短,我相信這部分是因為記憶體。新執行的差異約為 100 毫秒與幾秒鐘。抱歉,我在這裡找不到範例。
有誰知道為什麼 Postgres 不使用物化視圖的索引?
對“轉移”條件的可怕估計(預期為 2600,發現為 1)導致散列連接和嵌套循環的成本看起來非常相似。
為什麼錯誤估計 2600 倍會導致一個查詢使用不同的計劃而不是另一個?不知道,可能只是運氣。我懷疑它甚至可以重現,更不有趣。修復根本問題。
如果
ANALYZE transfers
沒有修復它,那麼您將需要深入研究“from_name”列的 pg_stats。
對於那些後來偶然發現這個問題的人,這是我為解決它所做的:
SELECT transfer.from_name, transfer.to_name, transfer.value, transfer.address, address.other_info1, address.other_info2 FROM transfers transfer LEFT JOIN LATERAL (SELECT * FROM corrected_addresses a_inner ON transfer.address=a_inner.address LIMIT 1) AS address ON true WHERE transfer.from_name='some name here';
我想
LIMIT 1
這裡使這個查詢與以前的連接有很大不同。就我而言,這是可取的,但並非所有人都如此。儘管如此,為什麼在原始表而不是物化視圖上使用索引是我無法理解的,但如果它有效,它就可以工作。