PostgreSQL

Indexing a partitioned table to keep Postgres from doing sequential scans

  • March 28, 2019

I have a Postgres database with a partitioned table containing about 2,000,000,000 entries.

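I partitioned the table by the first letter of the identifier column, so it is split into 37 sub-tables: 0-9, a-z, and default (a catch-all for everything else). The database is very simple and is defined as follows.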

create table entries (
 id                bigserial,
 identifier        text null,
 password          text null,
 additional_fields jsonb
)
 partition by list (lower(left(identifier, 1)));
ALTER DATABASE credentials SET constraint_exclusion=on;

CREATE TABLE entries_0 PARTITION OF entries for values in ('0');
CREATE TABLE entries_1 PARTITION OF entries for values in ('1');
CREATE TABLE entries_2 PARTITION OF entries for values in ('2');
CREATE TABLE entries_3 PARTITION OF entries for values in ('3');
...
CREATE TABLE entries_z PARTITION OF entries for values in ('z');

ALTER TABLE entries_0 ADD CONSTRAINT first_letter  CHECK (lower(left(identifier, 1)) = '0');
ALTER TABLE entries_1 ADD CONSTRAINT first_letter  CHECK (lower(left(identifier, 1)) = '1');
ALTER TABLE entries_2 ADD CONSTRAINT first_letter  CHECK (lower(left(identifier, 1)) = '2');
ALTER TABLE entries_3 ADD CONSTRAINT first_letter  CHECK (lower(left(identifier, 1)) = '3');
...
ALTER TABLE entries_z ADD CONSTRAINT first_letter  CHECK (lower(left(identifier, 1)) = 'z');

CREATE INDEX ident_idx on entries(identifier);

However, when I run EXPLAIN on a query, it says it is still doing sequential scans.

EXPLAIN SELECT * FROM entries where identifier = 'some_identifier_from_subtable_s' LIMIT 1;

Output:

Limit  (cost=0.00..140.92 rows=1 width=104)
 ->  Append  (cost=0.00..43418531.72 rows=308113 width=104)
       ->  Seq Scan on entries_0  (cost=0.00..23239.38 rows=2 width=68)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)
       ->  Seq Scan on entries_1  (cost=0.00..150187.81 rows=6 width=68)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)
       ->  Seq Scan on entries_2  (cost=0.00..94694.38 rows=4 width=67)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)
       ->  Seq Scan on entries_3  (cost=0.00..81656.71 rows=3 width=67)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)

       ... etc.

       ->  Seq Scan on entries_z  (cost=0.00..579207.95 rows=13 width=69)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)
       ->  Seq Scan on entries_default  (cost=0.00..15582.36 rows=4 width=69)
             Filter: (identifier = 'some_identifier_from_subtable_s'::text)

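What am I doing wrong? Smart partitioning should be able to route the query to the entries_s partition, shouldn't it? And then CREATE INDEX ident_idx on entries(identifier); should make the query go through the index?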

After adding the explicit partition-key predicate to the query, the new plan looks like this:

EXPLAIN SELECT *
FROM   entries
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'  -- 1st letter of above identifier
LIMIT  1;

Output:

Limit  (cost=1000.00..2583053.76 rows=1 width=71)
 ->  Gather  (cost=1000.00..2583053.76 rows=1 width=71)
       Workers Planned: 2
       ->  Parallel Append  (cost=0.00..2582053.66 rows=1 width=71)
             ->  Parallel Seq Scan on entries_s  (cost=0.00..2582053.66 rows=1 width=71)
                   Filter: ((identifier = 'some_identifier_from_subtable_s'::text) AND (lower("left"(identifier, 1)) = 's'::text))

It is still doing a sequential scan on entries_s. Does declaring the index with CREATE INDEX ident_idx on entries(identifier); not propagate it to all partitions?

Try adding the partition key explicitly (redundantly!). Like:

SELECT *
FROM   entries 
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'  -- 1st letter of above identifier
LIMIT  1;

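This should allow Postgres to understand that it can exclude all other partitions from the query. The planner prunes partitions only when the WHERE clause constrains the partition key expression itself; it does not derive lower(left(identifier, 1)) = 's' from an equality on identifier alone.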

If the identifier is passed as the parameter $1, the partition key can of course be derived from it:

AND    lower(left(identifier, 1)) = lower(left($1, 1))
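As a minimal sketch of how that parameterized predicate might be used, the query can be wrapped in a prepared statement. The statement name find_entry is hypothetical and chosen only for illustration; the table and column names are the ones from the question.

```sql
-- Hypothetical prepared statement; the name find_entry is illustrative only.
PREPARE find_entry(text) AS
SELECT *
FROM   entries
WHERE  identifier = $1
AND    lower(left(identifier, 1)) = lower(left($1, 1))  -- partition key derived from $1
LIMIT  1;

-- Look up one identifier; only the matching partition should need to be scanned.
EXECUTE find_entry('some_identifier_from_subtable_s');

-- Drop the prepared statement when it is no longer needed.
DEALLOCATE find_entry;
```

Deriving the key inside the query keeps callers from having to compute the first letter themselves, so the two predicates cannot drift apart.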

You will still see a sequential scan on that one partition unless you create the index the way you had in mind:

CREATE INDEX ident_idx on entries(identifier);

This works in Postgres 11 or later because, quoting the manual:

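When CREATE INDEX is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes.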
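One way to check whether the index actually reached the partitions is to list the indexes recorded in the pg_indexes system view. This is only a sketch; it assumes all partitions follow the entries_* naming used in the question.

```sql
-- List every index on the parent table and its partitions.
-- Assumes the partitions are all named with the prefix "entries".
SELECT tablename, indexname, indexdef
FROM   pg_indexes
WHERE  tablename LIKE 'entries%'
ORDER  BY tablename, indexname;
```

If a partition such as entries_s has no row with an index on identifier, the index was not propagated to it.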

In Postgres 10 or older, you have to create the index on each partition individually.
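For those older versions, writing 37 CREATE INDEX statements by hand is tedious. A possible shortcut, sketched here under the assumption that every partition is a direct child of entries, is to loop over the pg_inherits catalog and create the index on each child:

```sql
-- Sketch: create an index on identifier for every direct partition of entries.
-- Index names are left to Postgres to generate automatically.
DO $$
DECLARE
    part regclass;
BEGIN
    FOR part IN
        SELECT inhrelid::regclass
        FROM   pg_inherits
        WHERE  inhparent = 'entries'::regclass
    LOOP
        -- regclass renders as a properly quoted table name when formatted with %s
        EXECUTE format('CREATE INDEX ON %s (identifier)', part);
    END LOOP;
END
$$;
```

On tables of this size each index build can take a long time, so it may be preferable to run the statements one partition at a time instead of in a single block.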

You may have to run ANALYZE on the table after creating the index if you follow up with the query right away, before autovacuum has had time to kick in.
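Put together, the follow-up after creating the index might look like the sketch below. It reuses the query from the question; the plan you get will depend on your data and settings.

```sql
-- Refresh planner statistics for the partitioned table and its partitions.
ANALYZE entries;

-- Re-check the plan; with the index in place and the partition key in the
-- WHERE clause, an index scan on entries_s is the hoped-for result.
EXPLAIN
SELECT *
FROM   entries
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'
LIMIT  1;
```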

Source: https://dba.stackexchange.com/questions/233406