PostgreSQL

Why does BETWEEN use a B-tree index while the range operator "element is contained by" (<@) does not?

February 16, 2021

I have a table foo with a timestamptz column utc, which has a B-tree index:

CREATE TABLE foo (utc timestamptz);

CREATE INDEX ix_foo_utc ON foo (utc);

The table contains about 500 million rows.

When I filter utc using BETWEEN, the query planner uses the index as expected:

EXPLAIN ANALYZE
SELECT
  utc
FROM foo
WHERE
   utc BETWEEN '2020-12-01' AND '2031-02-15'
;

QUERY PLAN
Bitmap Heap Scan on foo  (cost=3048368.34..11836322.22 rows=143671392 width=8) (actual time=12447.905..165576.664 rows=150225530 loops=1)
  Recheck Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
  Rows Removed by Index Recheck: 543231
  Heap Blocks: exact=43537 lossy=1818365
  ->  Bitmap Index Scan on ix_foo_utc  (cost=0.00..3012450.49 rows=143671392 width=0) (actual time=12436.236..12436.236 rows=150225530 loops=1)
        Index Cond: ((utc >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (utc <= '2031-02-15 00:00:00+00'::timestamp with time zone))
Planning time: 0.127 ms
Execution time: 172335.517 ms

However, if I run the same query with the range operator, the index is not used:

EXPLAIN ANALYZE
SELECT
  utc
FROM foo
WHERE
   utc <@ tstzrange('2020-12-01', '2031-02-15')
;

QUERY PLAN
Gather  (cost=1000.00..9552135.30 rows=2556133 width=8) (actual time=0.179..145303.094 rows=150225530 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on foo  (cost=0.00..9295522.00 rows=1065055 width=8) (actual time=5.321..117837.452 rows=50075177 loops=3)
        Filter: (utc <@ '["2020-12-01 00:00:00+00","2031-02-15 00:00:00+00")'::tstzrange)
        Rows Removed by Filter: 120333718
Planning time: 0.069 ms
Execution time: 153384.494 ms

I had expected the query planner to realize that these operations do the same thing (although <@ excludes the right-hand bound, while BETWEEN includes it).
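The boundary difference mentioned above can be checked directly. This sketch needs no table: it evaluates BETWEEN, a tstzrange with the default '[)' bounds, and one with explicitly closed '[]' bounds at both endpoints of the interval from the question.

```sql
-- Compare the predicates at the two boundary values; no table needed.
-- tstzrange(lower, upper) uses the default '[)' bounds:
-- the lower bound is included, the upper bound is excluded.
SELECT v.ts,
       v.ts BETWEEN '2020-12-01' AND '2031-02-15'          AS between_result,
       v.ts <@ tstzrange('2020-12-01', '2031-02-15')         AS range_default,
       v.ts <@ tstzrange('2020-12-01', '2031-02-15', '[]')   AS range_closed
FROM (VALUES (timestamptz '2020-12-01'),   -- lower bound
             (timestamptz '2031-02-15'))   -- upper bound
     AS v(ts);
```

At the lower bound all three columns are true. At the upper bound, BETWEEN and the '[]' range are true while the default range is false, so only a range built with '[]' bounds matches BETWEEN exactly.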

So why are these query plans so different? (Never mind the question of why the sequential-scan query actually finishes faster!)


My Postgres version:

"PostgreSQL 10.13 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit"

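An index can only support operators that belong to its operator class. This query lists the B-tree operators whose left operand is timestamptz: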

SELECT ao.amoplefttype::regtype,
      op.oprname,
      ao.amoprighttype::regtype
FROM pg_opfamily AS of
  JOIN pg_am AS am ON of.opfmethod = am.oid
  JOIN pg_amop AS ao ON of.oid = ao.amopfamily
  JOIN pg_operator AS op ON ao.amopopr = op.oid
WHERE am.amname = 'btree'
 AND ao.amoplefttype = 'timestamptz'::regtype;

       amoplefttype       | oprname |        amoprighttype
--------------------------+---------+-----------------------------
 timestamp with time zone | <       | date
 timestamp with time zone | <=      | date
 timestamp with time zone | =       | date
 timestamp with time zone | >=      | date
 timestamp with time zone | >       | date
 timestamp with time zone | <       | timestamp without time zone
 timestamp with time zone | <=      | timestamp without time zone
 timestamp with time zone | =       | timestamp without time zone
 timestamp with time zone | >=      | timestamp without time zone
 timestamp with time zone | >       | timestamp without time zone
 timestamp with time zone | <       | timestamp with time zone
 timestamp with time zone | <=      | timestamp with time zone
 timestamp with time zone | =       | timestamp with time zone
 timestamp with time zone | >=      | timestamp with time zone
 timestamp with time zone | >       | timestamp with time zone
(15 rows)

There is no <@ among them, so a B-tree index cannot support this operator.

A GiST index could support <@, but not with a constant on the right-hand side: the range would have to be the indexed column.
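A minimal sketch of the shape a range-type GiST index can serve, assuming a hypothetical table foo_periods whose indexed column is itself a tstzrange. Here the range is the column and the timestamp is the constant, which is the reverse of the question's query. The condition is written with @>, the commutator of <@; whether the planner actually chooses the index depends on table size and statistics.

```sql
-- Hypothetical table: the RANGE is the indexed column.
CREATE TEMP TABLE foo_periods (period tstzrange);

-- GiST uses the built-in range operator class for tstzrange columns.
CREATE INDEX ix_foo_periods ON foo_periods USING gist (period);

-- "Does the stored range contain this constant timestamp?"
-- Equivalent to: timestamptz '2021-01-01' <@ period
EXPLAIN
SELECT period
FROM foo_periods
WHERE period @> timestamptz '2021-01-01';
```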

You will have to rewrite the query and use BETWEEN.
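A sketch of that rewrite, run against a small hypothetical copy of foo filled with generated data (not the asker's 500 million rows). Because <@ with default bounds excludes the upper end, >= and < reproduce its result exactly, whereas BETWEEN also returns rows equal to '2031-02-15 00:00:00'. Both >= and < appear in the B-tree operator list, so ix_foo_utc is eligible; whether the planner actually uses it still depends on how many rows match.

```sql
-- Hypothetical reproduction: a small temporary copy of the question's table.
CREATE TEMP TABLE foo (utc timestamptz);
CREATE INDEX ix_foo_utc ON foo (utc);

-- Illustrative data only: one row every 6 hours over 200 years.
INSERT INTO foo (utc)
SELECT g
FROM generate_series(timestamptz '1900-01-01',
                     timestamptz '2100-01-01',
                     interval '6 hours') AS g;
ANALYZE foo;

-- Same half-open interval as utc <@ tstzrange('2020-12-01', '2031-02-15'),
-- expressed with >= and <, which belong to the B-tree operator family.
EXPLAIN ANALYZE
SELECT utc
FROM foo
WHERE utc >= '2020-12-01'
  AND utc <  '2031-02-15';
```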

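Note: there is no fundamental reason for this; it is simply how indexes work in PostgreSQL. It would even be possible to write an optimizer support function for this case, but your case is so exotic that PostgreSQL is unlikely to invest the time and development effort in the optimizer.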

Source: https://dba.stackexchange.com/questions/285497