Postgres 選擇 seq 掃描而不是索引掃描

March 16, 2021

我有以下查詢：
SELECT * 
FROM foo
 LEFT OUTER JOIN foo_history ON foo.id = foo_history.foo_id 
WHERE foo.entity_id = 11111
foo 表大約有 800 萬行，而 foo_history 大約有 30 億行。
Postgres 選擇以下計劃
Gather  (cost=1000.58..132862.85 rows=9514 width=70) (actual time=462.013..5886.972 rows=14304 loops=1)
 Workers Planned: 4
 Workers Launched: 4
 -&gt;  Nested Loop Left Join  (cost=0.58..130911.45 rows=2378 width=70) (actual time=380.838..1450.773 rows=2861 loops=5)
       -&gt;  Parallel Seq Scan on foo  (cost=0.00..93746.74 rows=6 width=24) (actual time=366.122..366.168 rows=6 loops=5)
             Filter: (entity_id = 11111)
             Rows Removed by Filter: 1634858
       -&gt;  Index Scan using ix_foo_history_foo_id on foo_history foo_history_1  (cost=0.58..6005.63 rows=18849 width=46) (actual time=3.905..192.879 rows=511 loops=28)
             Index Cond: (foo_id = foo.id)
Planning Time: 1.091 ms
Execution Time: 5891.124 ms
這需要對 foo 表（800 萬行）進行 seqscan。但是當禁用 seqscan 時，postgres 會選擇一個更快的計劃：
Nested Loop Left Join  (cost=1.01..148667.31 rows=9514 width=70) (actual time=0.138..27.413 rows=14304 loops=1)
 -&gt;  Index Scan using ix_foo_entity_id on od  (cost=0.43..8.46 rows=24 width=24) (actual time=0.048..0.090 rows=28 loops=1)
       Index Cond: (entity_id = 11111)
 -&gt;  Index Scan using ix_foo_history_foo_id on foo_history foo_history_1  (cost=0.58..6005.63 rows=18849 width=46) (actual time=0.027..0.678 rows=511 loops=28)
       Index Cond: (foo_id = foo.id)
Planning Time: 1.135 ms
Execution Time: 28.927 ms
它使用 foo.entity_id 上的索引來查找相關行。
統計數據對我來說似乎並不重要，因為它們與執行查詢時獲得的數據大致匹配（或者我在 EXPLAIN ANLYZE 中誤讀了 smth）。random_page_cost 也設置為 1.1，這應該“鼓勵”postgres 進行索引掃描。所以我不知道這裡有什麼誤導postgres。也許“使用 ix_foo_history_foo_id 進行索引掃描”的成本太高了？但是為什麼會這樣，因為 random_page_cost 設置為一個合理的值！
編輯：
Postgresql 版本：x86_64-pc-linux-gnu 上的 PostgreSQL 12.5，由 gcc (Debian 8.3.0-6) 8.3.0 編譯，64 位
表定義：
create table foo
(
   id integer default nextval('foo_id_seq'::regclass) not null
       constraint foo_pkey
           primary key,
   entity_id integer not null
       constraint foo_entity_id_fk_entity_id
           references entity
               on delete cascade,
   misc1 integer not null,
   misc2 integer not null,
   misc3 integer
);

create table foo_history
(
   id integer default nextval('foo_history_id_seq'::regclass) not null
       constraint foo_history_pkey
           primary key,
   foo_id integer not null
       constraint foo_history_foo_id_fk_foo_id
           references foo
               on delete cascade,
   history_day integer not null,
   misc1 integer default 0 not null,
   misc2 numeric(10,2) default '0'::numeric not null,
   misc3 numeric(10,2) default '0'::numeric not null,
   misc4 numeric(10,2) default '0'::numeric not null,
   misc5 integer
);
數據和背景：
我有一個表 foo，每個 foo 行都附加到一個實體。每個 foo 都有自己的歷史。目標是獲取與給定實體相關的所有 foo 項目及其歷史記錄。
foo.entity_id 包含 586972 個不同的值
這是一個直方圖，表示有多少實體與它們相關的 0 到 20 個 foo 項目等。
我還注意到以下查詢
SELECT COUNT(*) FROM (
 SELECT * 
 FROM foo_history 
   LEFT OUTER JOIN foo ON foo.id = foo_history.foo_id 
 WHERE foo.entity_id = 11111
) t
使用第二個計劃（快速的一個）

是的，你是對的，問題是Index Scan using ix_foo_history_foo_id. 觀察到它估計的結果行太多。
也許它有助於
ANALYZE foo_history;
如果僅此一項不能使估計正確，請嘗試
ALTER TABLE foo_history ALTER foo_id SET STATISTICS 1000;
ANALYZE foo_history;
如果這不傾斜比例，也許您可以添加INCLUDE更多列ix_foo_history_foo_id並僅用您真正需要的那些列替換SELECT *，以便獲得僅索引掃描。

引用自：https://dba.stackexchange.com/questions/287056

Postgres 選擇 seq 掃描而不是索引掃描

相關問答

postgresql 計劃器/優化器的特定索引問題

PostgreSQL 執行計劃中帶有數字和下劃線的奇怪表

如何在 Postgres 上對傾斜數據集進行更高效的查詢

具有大 IN 的 Postgres 查詢，並且在臨時表上加入似乎不起作用

如何防止 PostgreSQL 重寫 OUTER JOIN 查詢？

Postgresql 10：具有精確堆塊的點陣圖堆掃描