Postgresql

PostgreSQL - 多列 B-Tree 索引如何與第一列的 order by 和第二列的 IN 查找一起工作?

  • January 13, 2022

我創建了這樣的表(類似於http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability中的範例)

CREATE TABLE scale_data (
  section NUMERIC NOT NULL,
  id1     NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
  id2     NUMERIC NOT NULL -- a kind of Type
);

填充它:

INSERT INTO scale_data
SELECT sections.sections, sections.sections*10000 + gen.gen
    , CEIL(RANDOM()*100) 
 FROM GENERATE_SERIES(1, 300)     sections,
      GENERATE_SERIES(1, 90000) gen
WHERE gen <= sections * 300;

它生成了 13545000 條記錄。

綜合指數就可以了:

CREATE INDEX id1_id2_idx
 ON public.scale_data
 USING btree
 (id1, id2);

並選擇#1:

select id2 from scale_data 
where id2 in (50)
order by id1 desc
limit 500

解釋分析:

"Limit  (cost=0.56..1177.67 rows=500 width=11) (actual time=0.046..5.124 rows=500 loops=1)"
"  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)"
"        Index Cond: (id2 = '50'::numeric)"
"        Heap Fetches: 0"
"Planning time: 0.103 ms"
"Execution time: 5.177 ms"

選擇#2——IN 中的更多值——計劃已更改

select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500

解釋分析#2:

"Limit  (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)"
"  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)"
"        Filter: (id2 = ANY ('{50,52}'::numeric[]))"
"        Rows Removed by Filter: 25030"
"        Heap Fetches: 0"
"Planning time: 0.153 ms"
"Execution time: 8.771 ms"

為什麼計劃不同?為什麼在 #1 它確實顯示為Index condition,但在 #2 Filter和索引掃描單元格的數量。sql#1 不是以與解釋 sql#2 顯示的方式相同的方式遍歷索引嗎?

在實際/生產 DB #2 上工作得慢得多,即使分別通過 2 個鍵搜尋很快

PG 9.5

我不會讓這個困擾你的。FILTER在這種情況下,我相信只意味著索引上的多個條件語句(這是IN數組操作如何被轉換為,afaik)。在任何一個中,它們都在Index Only Scan Backward. 它的工作方式與OR

                                                                       QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..1219.95 rows=500 width=11) (actual time=0.061..16.159 rows=500 loops=1)
  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..679161.56 rows=278484 width=11) (actual time=0.060..16.086 rows=500 loops=1)
        Filter: ((id2 = '50'::numeric) OR (id2 = '52'::numeric))
        Rows Removed by Filter: 24673
        Heap Fetches: 25173
Planning time: 0.206 ms
Execution time: 16.235 ms
(7 rows)

test=# EXPlAIN ANALYZE select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500
;
                                                                       QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..1153.17 rows=500 width=11) (actual time=0.072..18.604 rows=500 loops=1)
  ->  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..645299.05 rows=279930 width=11) (actual time=0.070..18.506 rows=500 loops=1)
        Filter: (id2 = ANY ('{50,52}'::numeric[]))
        Rows Removed by Filter: 24673
        Heap Fetches: 25173
Planning time: 0.187 ms
Execution time: 18.695 ms
(7 rows)

對我來說,這看起來像是SARGable與非 SARGable 謂詞的一個很好的展示。

where id2 in (50)被優化器重寫為一個簡單的相等謂詞,按照 Postgres 的標準,這似乎是 SARGable:即使id2不是複合鍵中的前導列,它仍然只需要字節數組的簡單比較。

where id2 in (50, 52)不能如此重寫,並且生成的謂詞似乎是非 SARGable 的,它要求將復合鍵值的子集提取到可以成為操作數的變數中=ANY()。這似乎需要比索引掃描更大的智能,因此涉及過濾步驟,跨越索引管理器和 SQL 引擎之間的邊界。

引用自:https://dba.stackexchange.com/questions/174582