How to get an index scan for OR’ed time-range predicates?

August 27, 2021

I have a table events with the columns:

id
user_id
time_start
time_end
...

And an index on (time_start, time_end).

SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
      (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id;

Group  (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
 Group Key: user_id
 Buffers: shared hit=224848
 ->  Gather Merge  (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       Buffers: shared hit=224848
       ->  Sort  (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
             Sort Key: user_id
             Sort Method: quicksort  Memory: 27kB
             Worker 0:  Sort Method: quicksort  Memory: 27kB
             Worker 1:  Sort Method: quicksort  Memory: 28kB
             Buffers: shared hit=224848
             ->  Partial HashAggregate  (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
                   Group Key: user_id
                   Buffers: shared hit=224834
                   ->  Parallel Seq Scan on events  (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
                         Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
                         Rows Removed by Filter: 708728
                         Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms

Postgres uses a Seq Scan with this filter:

Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))

But when I keep only one condition on time_start and time_end, it starts using an index scan.

How can I change the condition so that Postgres uses an Index Scan instead of a Seq Scan?

I don't want to use UNION like this:

(SELECT user_id
 FROM events
 WHERE (time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00')
 GROUP BY user_id)
UNION
(SELECT user_id
 FROM events
 WHERE (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00')
 GROUP BY user_id);

Expression index
A GiST or (even better) SP-GiST expression index on a timestamp range should work wonders:
CREATE INDEX events_right_expr_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));
Rewrite your query with the "range contains" operator @> and match the index expression (exactly equivalent to your original expression):
SELECT user_id
FROM   events
WHERE  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
   OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP  BY user_id;
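A quick way to convince yourself that the rewrite is equivalent: the following plain Python sketch (not PostgreSQL code) models tsrange(start, end, '[]') @> t as a closed-interval test and compares it with the original pair of comparisons on boundary cases. The function names are hypothetical, and the sketch assumes non-NULL bounds with start <= end (PostgreSQL treats a NULL range bound as unbounded and rejects a lower bound greater than the upper bound, so those cases do not map one-to-one).

```python
from datetime import datetime
from typing import Iterable

# Assumption for this sketch: bounds are non-NULL and start <= end.

def original_predicate(start: datetime, end: datetime, t: datetime) -> bool:
    # The question's condition: time_start <= t AND time_end >= t
    return start <= t and end >= t

def contains_closed(start: datetime, end: datetime, t: datetime) -> bool:
    # Models tsrange(start, end, '[]') @> t: both bounds inclusive
    return start <= t <= end

def matches_any(start: datetime, end: datetime, points: Iterable[datetime]) -> bool:
    # Models the OR'ed WHERE clause: true if the range contains any of the points
    return any(contains_closed(start, end, p) for p in points)

start = datetime(2021, 8, 24, 15, 0)
end = datetime(2021, 8, 24, 15, 30)
probes = [
    datetime(2021, 8, 24, 14, 59),  # before the range
    start,                          # exactly on the lower bound
    datetime(2021, 8, 24, 15, 15),  # inside
    end,                            # exactly on the upper bound
    datetime(2021, 8, 24, 15, 59),  # after the range
]

# Both formulations agree on every probe, including the boundaries
for t in probes:
    assert original_predicate(start, end, t) == contains_closed(start, end, t)

print(matches_any(start, end, [datetime(2021, 8, 24, 15, 30),
                               datetime(2021, 8, 24, 15, 59)]))
```

The last line prints True, because 15:30 sits exactly on the inclusive upper bound.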
You get a query plan like this:
HashAggregate  (cost=9.90..10.00 rows=10 width=4)
 Group Key: user_id
 ->  Bitmap Heap Scan on events  (cost=2.57..9.88 rows=10 width=4)
       Recheck Cond: ((tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:30:00'::timestamp without time zone) OR (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:59:00'::timestamp without time zone))
       ->  BitmapOr  (cost=2.57..2.57 rows=10 width=0)
             ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                   Index Cond: (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:30:00'::timestamp without time zone)
             ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                   Index Cond: (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:59:00'::timestamp without time zone)
That should be much faster.
By default, range types assume an inclusive lower bound and an exclusive upper bound (tsrange(time_start, time_end) is equivalent to tsrange(time_start, time_end, '[)')).
Since you use >= and <=, use tsrange(time_start, time_end, '[]').
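The choice between the default '[)' and '[]' only matters for a point that falls exactly on the upper bound. A small Python sketch, assuming a hypothetical contains() helper that mimics range @> element for the bound notation used by tsrange's third argument:

```python
from datetime import datetime

def contains(start: datetime, end: datetime, t: datetime, bounds: str = "[)") -> bool:
    # Hypothetical model of PostgreSQL's range @> element.
    # bounds follows tsrange's third argument: '[' or ']' = inclusive,
    # '(' or ')' = exclusive. The default '[)' matches tsrange(a, b).
    lower_ok = start <= t if bounds[0] == "[" else start < t
    upper_ok = t <= end if bounds[1] == "]" else t < end
    return lower_ok and upper_ok

start = datetime(2021, 8, 24, 15, 0)
end = datetime(2021, 8, 24, 15, 30)

print(contains(start, end, end))        # default '[)': upper bound excluded -> False
print(contains(start, end, end, "[]"))  # '[]': upper bound included -> True
print(contains(start, end, start))      # lower bound is inclusive in both -> True
```

With the default bounds, a row whose time_end equals the probe time would silently drop out, which is why '[]' is needed to match time_end >= t.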
Related:
Perform this hours of operation query in PostgreSQL
Optimizing queries on a range of timestamps (two columns)
Alternatively, store a range column in the table
This should be a bit faster still, with a plain (not expression) index.
You can add a timestamp range column to the table, like:
ALTER TABLE events ADD COLUMN ts_range tsrange GENERATED ALWAYS AS (tsrange(time_start, time_end, '[]')) STORED;
See:
Computed / calculated / virtual / derived columns in PostgreSQL
Or, more radically, replace time_start and time_end with the range column. Then the index and the query are a bit simpler:
CREATE INDEX events_right_idx ON events USING spgist (ts_range);

SELECT user_id
FROM   events
WHERE  ts_range @> timestamp '2021-08-24T15:30:00'
   OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP  BY user_id;
But the tsrange column occupies more space than two timestamp columns. Weigh costs and benefits.
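Aside
Postgres 14 (currently in beta) even allows covering SP-GiST indexes. The release notes:
Allow SP-GiST indexes to contain INCLUDE'd columns (Pavel Borisov)
But I don't think you can get index-only scans for this particular query.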
If you have to use a B-tree index for some reason, this fixed UNION query shouldn't be too bad:
SELECT user_id
FROM   events
WHERE  '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM   events
WHERE  '2021-08-24T15:59:00' BETWEEN time_start AND time_end
Notably, there is no GROUP BY. UNION already does all the work.
And it's simplified with BETWEEN (no effect on performance).
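To check that dropping GROUP BY loses nothing, here is a sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. SQLite has no tsrange or SP-GiST, so this only tests the query semantics, not the plans. The sample rows are made up, and timestamps are stored as ISO 8601 text in one fixed format, so string comparison matches chronological order. It compares the question's OR + GROUP BY query with the UNION / BETWEEN version, and shows that UNION ALL would keep the duplicate.

```python
import sqlite3

# In-memory SQLite table standing in for "events" (illustration only).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "time_start TEXT, time_end TEXT)"
)
con.executemany(
    "INSERT INTO events (user_id, time_start, time_end) VALUES (?, ?, ?)",
    [
        (1, "2021-08-24T15:00:00", "2021-08-24T15:30:00"),  # contains 15:30 (end inclusive)
        (1, "2021-08-24T15:45:00", "2021-08-24T16:00:00"),  # contains 15:59, same user again
        (2, "2021-08-24T15:31:00", "2021-08-24T15:58:00"),  # contains neither point
        (3, "2021-08-24T15:59:00", "2021-08-24T17:00:00"),  # contains 15:59 (start inclusive)
    ],
)

params = {"a": "2021-08-24T15:30:00", "b": "2021-08-24T15:59:00"}

# The question's query: OR'ed predicates, de-duplicated with GROUP BY.
or_group_by = """
    SELECT user_id FROM events
    WHERE (time_start <= :a AND time_end >= :a)
       OR (time_start <= :b AND time_end >= :b)
    GROUP BY user_id
"""

# The rewrite: one BETWEEN per point; UNION removes duplicate user_ids.
union_query = """
    SELECT user_id FROM events WHERE :a BETWEEN time_start AND time_end
    UNION
    SELECT user_id FROM events WHERE :b BETWEEN time_start AND time_end
"""

# UNION ALL keeps duplicates, which shows the de-duplication UNION provides.
union_all_query = union_query.replace("UNION", "UNION ALL")

def user_ids(sql: str) -> list:
    # Run a query with the two probe timestamps and return sorted user_ids
    return sorted(row[0] for row in con.execute(sql, params))

rows_or = user_ids(or_group_by)
rows_union = user_ids(union_query)
rows_union_all = user_ids(union_all_query)
print(rows_or, rows_union, rows_union_all)  # [1, 3] [1, 3] [1, 1, 3]
```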
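Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And the columns are named "time" to add to the confusion. Typically, timestamptz is the better choice. See:
Ignoring time zones altogether in Rails and PostgreSQL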
Last but not least, this indicates inaccurate column statistics, leading to a suboptimal query plan:
-> Parallel Seq Scan on events (cost=0.00..242553.74 **rows=45952** width=4)
(actual time=104.085..183.994 **rows=64** loops=3)
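The size of the misestimate can be read off the two row counts directly. A back-of-the-envelope calculation in Python, using only figures from the EXPLAIN output in the question. It assumes, as EXPLAIN ANALYZE does for nodes with loops > 1, that the actual row count and "Rows Removed by Filter" are per-loop averages, so the estimate and the actual count are roughly comparable per process. The "q-error" name is borrowed from query-optimizer literature for the ratio between estimate and reality.

```python
# Figures copied from the Parallel Seq Scan node in the question's plan.
estimated_rows = 45952     # rows=45952 (planner estimate, per process)
actual_rows = 64           # rows=64 (actual, averaged per loop)
loops = 3                  # loops=3 (leader + 2 workers)
removed_per_loop = 708728  # Rows Removed by Filter (averaged per loop)

# q-error: factor by which the estimate is off, in either direction
q_error = max(estimated_rows, actual_rows) / min(estimated_rows, actual_rows)

# Approximate total rows read by the scan across all processes
rows_scanned = (actual_rows + removed_per_loop) * loops

print(f"estimate off by a factor of {q_error:.0f}")
print(f"about {rows_scanned} rows read to return about {actual_rows * loops}")
```

An estimate off by a factor of several hundred makes an index scan look far more expensive than it is, which fits the recommendation to refresh the statistics.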
Run
ANALYZE events;
and try again. Your original query can use a plain B-tree index. It's just not as efficient as the suggested SP-GiST index.
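Then maybe tune your autovacuum and statistics settings to avoid bad statistics in the future. See:
Aurora PostgreSQL database using a slower query plan than a normal PostgreSQL for an identical query?
Keep PostgreSQL from sometimes choosing a bad query plan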

Source: https://dba.stackexchange.com/questions/298625
