Postgresql

Postgres 查詢不使用索引,但使用索引更快

  • September 18, 2020

抱歉,如果這似乎是一個重複的問題。我在 AWS RDS 上使用 Postgres 11.6。我有 2 張桌子:

CREATE TABLE public.e
(
   id character varying(32) COLLATE pg_catalog."default" NOT NULL,
   p_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
   CONSTRAINT e_pkey PRIMARY KEY (id)
)
WITH (
   OIDS = FALSE
)
TABLESPACE pg_default;

CREATE TABLE public.ed
(
   e_id character varying(32) COLLATE pg_catalog."default" NOT NULL,
   <other columns + primary key>
)
WITH (
   OIDS = FALSE
)
TABLESPACE pg_default;

我有一個索引ed.e_id

CREATE INDEX ix_ed_e_id
   ON public.ed USING btree
   (e_id COLLATE pg_catalog."default" ASC NULLS LAST)
   TABLESPACE pg_default;

當我執行此查詢時:

select *
from ed, e
where e.id = ed.e_id
and e.p_id = '5c7cae8df6d10f1064b2eaf5';

(使用時問題依然存在from ed inner join e on e.id = ed.e_id

explain analyze計劃是:

Gather  (cost=1136.68..141235.01 rows=28320 width=311) (actual time=0.456..871.155 rows=102709 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Hash Join  (cost=136.68..137403.01 rows=11800 width=311) (actual time=0.241..688.095 rows=34236 loops=3)
       Hash Cond: (ed.e_id = e.id)
       ->  Parallel Seq Scan on ed ed  (cost=0.00..133210.10 rows=1544610 width=218) (actual time=0.005..314.524 rows=1235269 loops=3)
       ->  Hash  (cost=135.67..135.67 rows=81 width=93) (actual time=0.125..0.126 rows=81 loops=3)
             Buckets: 1024  Batches: 1  Memory Usage: 19kB
             ->  Bitmap Heap Scan on e e  (cost=4.91..135.67 rows=81 width=93) (actual time=0.045..0.097 rows=81 loops=3)
                   Recheck Cond: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
                   Heap Blocks: exact=31
                   ->  Bitmap Index Scan on ix_e_p_id  (cost=0.00..4.89 rows=81 width=0) (actual time=0.035..0.035 rows=81 loops=3)
                         Index Cond: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
Planning Time: 0.329 ms
Execution Time: 877.804 ms

用一個Parallel Seq Scan on eded.e_id匹配。

當 ISET SESSION enable_seqscan = OFF時,解釋計劃是:

Nested Loop  (cost=0.72..395895.14 rows=28320 width=311) (actual time=0.037..60.068 rows=102709 loops=1)
 ->  Index Scan using e_pkey on e e  (cost=0.29..917.61 rows=81 width=93) (actual time=0.019..4.995 rows=81 loops=1)
       Filter: ((p_id)::text = '5c7cae8df6d10f1064b2eaf5'::text)
       Rows Removed by Filter: 10522
 ->  Index Scan using ix_ed_e_id on ed ed  (cost=0.43..4757.83 rows=11844 width=218) (actual time=0.013..0.334 rows=1268 loops=81)
       Index Cond: (e_id = e.id)
Planning Time: 0.273 ms
Execution Time: 64.675 ms

快了一個數量級(877ms vs 64ms)!我試過VACUUM ANALYZE ed了,但這沒有幫助。我什至嘗試將e.id&更改ed.e_idUUID類型,但這也無濟於事。

如何說服 Postgres 使用ix_ed_e_id索引而不設置enable_seqscan為關閉?

似乎 PostgreSQL 高估了索引掃描的成本,這導致它更喜歡雜湊連接而不是嵌套循環連接。

有兩個參數告訴 PostgreSQL 硬體並影響它對索引掃描成本的估計:

  • random_page_cost: 與 相比越大,seq_page_costPostgreSQL 估計索引掃描的隨機 I/O 與順序 I/O 相比就越昂貴。因此,您可以降低該參數以鼓勵索引掃描。
  • effective_cache_size:這告訴優化器有多少記憶體可用於記憶體數據。如果該值很高,它將假定索引被記憶體並且價格索引掃描較低。

也許調整這些參數會改變 PostgreSQL 的想法,儘管成本估計相差甚遠。

引用自:https://dba.stackexchange.com/questions/275638