PostgreSQL - 多列 B-Tree 索引如何與第一列的 order by 和第二列的 IN 查找一起工作？

January 13, 2022

我創建了這樣的表（類似於http://use-the-index-luke.com/sql/example-schema/postgresql/performance-testing-scalability中的範例）

CREATE TABLE scale_data (
  section NUMERIC NOT NULL,
  id1     NUMERIC NOT NULL, -- unique values simulating ID or Timestamp
  id2     NUMERIC NOT NULL -- a kind of Type
);

填充它：

INSERT INTO scale_data
SELECT sections.sections, sections.sections*10000 + gen.gen
    , CEIL(RANDOM()*100) 
 FROM GENERATE_SERIES(1, 300)     sections,
      GENERATE_SERIES(1, 90000) gen
WHERE gen &lt;= sections * 300;

它生成了 13545000 條記錄。

綜合指數就可以了：

CREATE INDEX id1_id2_idx
 ON public.scale_data
 USING btree
 (id1, id2);

並選擇#1：

select id2 from scale_data 
where id2 in (50)
order by id1 desc
limit 500

解釋分析：

"Limit  (cost=0.56..1177.67 rows=500 width=11) (actual time=0.046..5.124 rows=500 loops=1)"
"  -&gt;  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..311588.74 rows=132353 width=11) (actual time=0.045..5.060 rows=500 loops=1)"
"        Index Cond: (id2 = '50'::numeric)"
"        Heap Fetches: 0"
"Planning time: 0.103 ms"
"Execution time: 5.177 ms"

選擇#2——IN 中的更多值——計劃已更改

select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500

解釋分析#2：

"Limit  (cost=0.56..857.20 rows=500 width=11) (actual time=0.061..8.703 rows=500 loops=1)"
"  -&gt;  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..445780.74 rows=260190 width=11) (actual time=0.059..8.648 rows=500 loops=1)"
"        Filter: (id2 = ANY ('{50,52}'::numeric[]))"
"        Rows Removed by Filter: 25030"
"        Heap Fetches: 0"
"Planning time: 0.153 ms"
"Execution time: 8.771 ms"

為什麼計劃不同？為什麼在 #1 它確實顯示為Index condition，但在 #2 Filter和索引掃描單元格的數量。sql#1 不是以與解釋 sql#2 顯示的方式相同的方式遍歷索引嗎？

在實際/生產 DB #2 上工作得慢得多，即使分別通過 2 個鍵搜尋很快

PG 9.5

我不會讓這個困擾你的。FILTER在這種情況下，我相信只意味著索引上的多個條件語句（這是IN數組操作如何被轉換為，afaik）。在任何一個中，它們都在Index Only Scan Backward. 它的工作方式與OR

                                                                       QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..1219.95 rows=500 width=11) (actual time=0.061..16.159 rows=500 loops=1)
  -&gt;  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..679161.56 rows=278484 width=11) (actual time=0.060..16.086 rows=500 loops=1)
        Filter: ((id2 = '50'::numeric) OR (id2 = '52'::numeric))
        Rows Removed by Filter: 24673
        Heap Fetches: 25173
Planning time: 0.206 ms
Execution time: 16.235 ms
(7 rows)

test=# EXPlAIN ANALYZE select id2 from scale_data 
where id2 in (50, 52)
order by id1 desc
limit 500
;
                                                                       QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..1153.17 rows=500 width=11) (actual time=0.072..18.604 rows=500 loops=1)
  -&gt;  Index Only Scan Backward using id1_id2_idx on scale_data  (cost=0.56..645299.05 rows=279930 width=11) (actual time=0.070..18.506 rows=500 loops=1)
        Filter: (id2 = ANY ('{50,52}'::numeric[]))
        Rows Removed by Filter: 24673
        Heap Fetches: 25173
Planning time: 0.187 ms
Execution time: 18.695 ms
(7 rows)

對我來說，這看起來像是SARGable與非 SARGable 謂詞的一個很好的展示。
where id2 in (50)被優化器重寫為一個簡單的相等謂詞，按照 Postgres 的標準，這似乎是 SARGable：即使id2不是複合鍵中的前導列，它仍然只需要字節數組的簡單比較。
where id2 in (50, 52)不能如此重寫，並且生成的謂詞似乎是非 SARGable 的，它要求將復合鍵值的子集提取到可以成為操作數的變數中=ANY()。這似乎需要比索引掃描更大的智能，因此涉及過濾步驟，跨越索引管理器和 SQL 引擎之間的邊界。

引用自：https://dba.stackexchange.com/questions/174582

PostgreSQL - 多列 B-Tree 索引如何與第一列的 order by 和第二列的 IN 查找一起工作？

相關問答

為什麼 PostgreSQL 9.5 不使用我最新的 ORDER BY 索引，即使它使用類似的索引就好了？

為什麼在子查詢中 ORDER BY 時沒有使用我的 PostgreSQL 表達式索引？

類似查詢的執行性能問題

使用大 IN 優化 Postgres 查詢

大表中的慢速索引掃描

使用 GIN 索引位串