Postgresql
Indexing a partitioned table to keep Postgres from doing a sequential scan
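I have a Postgres database with a partitioned table that contains about 2,000,000,000 entries.
I partitioned the table by the first character of the "identifier" column, so it is split into 37 child tables:
[0-9, a-z, default (catch-all for everything else)]. The setup is very simple and straightforward, defined as follows.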
create table entries (
    id bigserial,
    identifier text null,
    password text null,
    additional_fields jsonb
) partition by list (lower(left(identifier, 1)));

ALTER DATABASE credentials SET constraint_exclusion=on;

CREATE TABLE entries_0 PARTITION OF entries for values in ('0');
CREATE TABLE entries_1 PARTITION OF entries for values in ('1');
CREATE TABLE entries_2 PARTITION OF entries for values in ('2');
CREATE TABLE entries_3 PARTITION OF entries for values in ('3');
...
CREATE TABLE entries_z PARTITION OF entries for values in ('z');

ALTER TABLE entries_0 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '0');
ALTER TABLE entries_1 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '1');
ALTER TABLE entries_2 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '2');
ALTER TABLE entries_3 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '3');
...
ALTER TABLE entries_z ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = 'z');

CREATE INDEX ident_idx on entries(identifier);
However, when I run EXPLAIN on a query, it shows that Postgres is still doing sequential scans.
EXPLAIN SELECT * FROM entries where identifier = 'some_identifier_from_subtable_s' LIMIT 1;
Output:
Limit  (cost=0.00..140.92 rows=1 width=104)
  ->  Append  (cost=0.00..43418531.72 rows=308113 width=104)
        ->  Seq Scan on entries_0  (cost=0.00..23239.38 rows=2 width=68)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
        ->  Seq Scan on entries_1  (cost=0.00..150187.81 rows=6 width=68)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
        ->  Seq Scan on entries_2  (cost=0.00..94694.38 rows=4 width=67)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
        ->  Seq Scan on entries_3  (cost=0.00..81656.71 rows=3 width=67)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
        ... etc.
        ->  Seq Scan on entries_z  (cost=0.00..579207.95 rows=13 width=69)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
        ->  Seq Scan on entries_default  (cost=0.00..15582.36 rows=4 width=69)
              Filter: (identifier = 'some_identifier_from_subtable_s'::text)
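What am I doing wrong? The partitioning should be smart enough to route the query to the entries_s partition, shouldn't it?
And then CREATE INDEX ident_idx on entries(identifier); should make the query go through the index?
After adding the explicit partition-key condition to the query, the new plan looks like this: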
EXPLAIN SELECT *
FROM entries
WHERE identifier = 'some_identifier_from_subtable_s'
AND lower(left(identifier, 1)) = 's' -- 1st letter of above identifier
LIMIT 1;
Output:
Limit  (cost=1000.00..2583053.76 rows=1 width=71)
  ->  Gather  (cost=1000.00..2583053.76 rows=1 width=71)
        Workers Planned: 2
        ->  Parallel Append  (cost=0.00..2582053.66 rows=1 width=71)
              ->  Parallel Seq Scan on entries_s  (cost=0.00..2582053.66 rows=1 width=71)
                    Filter: ((identifier = 'some_identifier_from_subtable_s'::text) AND (lower("left"(identifier, 1)) = 's'::text))
It is still doing a sequential scan on entries_s. Does declaring the index with CREATE INDEX ident_idx on entries(identifier); not propagate it to all partitions?
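Try adding the partition key expression explicitly (redundantly!). Like:
SELECT *
FROM entries
WHERE identifier = 'some_identifier_from_subtable_s'
AND lower(left(identifier, 1)) = 's' -- 1st letter of above identifier
LIMIT 1;
This should allow Postgres to understand that it can prune all other partitions from the query.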
The condition can of course be derived from the parameter $1:
AND lower(left(identifier, 1)) = lower(left($1, 1))
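As a minimal sketch of that idea, the redundant partition-key condition can be built from the same parameter inside a prepared statement. The statement name find_entry is made up for illustration; the table and column names are the ones from the question.

```sql
-- Hypothetical prepared statement; "find_entry" is an illustrative name.
PREPARE find_entry(text) AS
SELECT *
FROM   entries
WHERE  identifier = $1
AND    lower(left(identifier, 1)) = lower(left($1, 1))  -- same expression as the partition key
LIMIT  1;

-- The first letter no longer has to be spelled out by the caller.
EXECUTE find_entry('some_identifier_from_subtable_s');
```

Running EXPLAIN EXECUTE find_entry('some_identifier_from_subtable_s'); should show whether only one partition remains in the plan.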
You will still see a sequential scan on that one partition, unless you create the index the way you had in mind:
CREATE INDEX ident_idx on entries(identifier);
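This works in Postgres 11 or later. Quoting the manual:
"When CREATE INDEX is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes."
In Postgres 10 or older, you have to create an index on each partition yourself.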
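For Postgres 10, the per-partition indexes could be created in a loop rather than with 37 hand-written statements. The following is only a sketch: it assumes all partitions are direct children of entries, and the index naming scheme (partition name plus _identifier_idx) is an assumption, not something from the question.

```sql
-- Sketch: create one index per partition of "entries".
-- The "<partition>_identifier_idx" naming scheme is an illustrative choice.
DO
$$
DECLARE
    r record;
BEGIN
    FOR r IN
        SELECT c.oid::regclass AS tbl, c.relname
        FROM   pg_inherits i
        JOIN   pg_class c ON c.oid = i.inhrelid
        WHERE  i.inhparent = 'entries'::regclass   -- direct partitions only
    LOOP
        EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON %s (identifier)',
                       r.relname || '_identifier_idx', r.tbl);
    END LOOP;
END
$$;
```

On a table of this size each index build takes a while and blocks writes to that partition, so running the statements one partition at a time may be preferable.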
You may have to run ANALYZE on the table after creating the index if you follow up with a query immediately, before autovacuum has had time to kick in.
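A possible sequence for checking the result is sketched below. It assumes the index has already been created as shown above; the pg_indexes lookup is only a convenience check added here to see which partitions actually carry an index.

```sql
-- Refresh planner statistics; a manual ANALYZE on the partitioned
-- parent also processes its partitions.
ANALYZE entries;

-- Optional check: list the indexes on the parent and its partitions.
SELECT tablename, indexname
FROM   pg_indexes
WHERE  tablename LIKE 'entries%'
ORDER  BY tablename, indexname;

-- Re-check the plan with the redundant partition-key condition in place.
EXPLAIN
SELECT *
FROM   entries
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'
LIMIT  1;
```

If pruning and the index both apply, the plan should mention only entries_s and show an index scan instead of a sequential scan.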