How to get an index scan for OR'ed time range predicates?
I have an events table with the columns id, user_id, time_start, time_end, ... and an index on (time_start, time_end). The query:

SELECT user_id
FROM   events
WHERE  ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00')
    OR  (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP  BY user_id;
Group  (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
  Group Key: user_id
  Buffers: shared hit=224848
  ->  Gather Merge  (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=224848
        ->  Sort  (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
              Sort Key: user_id
              Sort Method: quicksort  Memory: 27kB
              Worker 0:  Sort Method: quicksort  Memory: 27kB
              Worker 1:  Sort Method: quicksort  Memory: 28kB
              Buffers: shared hit=224848
              ->  Partial HashAggregate  (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
                    Group Key: user_id
                    Buffers: shared hit=224834
                    ->  Parallel Seq Scan on events  (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
                          Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
                          Rows Removed by Filter: 708728
                          Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms
Postgres uses a Seq Scan with this filter:

Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
But when I keep only one condition on time_start and time_end, it starts using an Index Scan. How can I change the conditions so that Postgres uses an Index Scan instead of a Seq Scan?
I don't want to use UNION like this:

(SELECT user_id FROM events WHERE (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00') GROUP BY user_id)
UNION
(SELECT user_id FROM events WHERE (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00') GROUP BY user_id);
Expression index
A GiST or (even better) SP-GiST expression index on the timestamp range should work wonders:
CREATE INDEX events_right_expr_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));
Rewrite your query with the "range contains" operator @> and match the index expression (100% equivalent to your original predicates):

SELECT user_id
FROM   events
WHERE  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
   OR  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP  BY user_id;
You get a query plan like this:

HashAggregate  (cost=9.90..10.00 rows=10 width=4)
  Group Key: user_id
  ->  Bitmap Heap Scan on events  (cost=2.57..9.88 rows=10 width=4)
        Recheck Cond: ((tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:30:00'::timestamp without time zone) OR (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:59:00'::timestamp without time zone))
        ->  BitmapOr  (cost=2.57..2.57 rows=10 width=0)
              ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                    Index Cond: (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:30:00'::timestamp without time zone)
              ->  Bitmap Index Scan on events_right_expr_idx  (cost=0.00..1.28 rows=5 width=0)
                    Index Cond: (tsrange(time_start, time_end, '[]'::text) @> '2021-08-24 15:59:00'::timestamp without time zone)

Should be a lot faster.
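The Filter line in the question's plan casts the literals to timestamp with time zone, which suggests the columns might be timestamptz. That is only an assumption. If it holds, a variant built on tstzrange would be the matching form. A minimal sketch, with a hypothetical index name:

```sql
-- Assumption: time_start and time_end are timestamptz (not confirmed by the question).
-- The index name events_tstzrange_idx is made up for this sketch.
CREATE INDEX events_tstzrange_idx ON events
USING  spgist (tstzrange(time_start, time_end, '[]'));

-- The WHERE expressions must match the index expression exactly.
SELECT user_id
FROM   events
WHERE  tstzrange(time_start, time_end, '[]') @> timestamptz '2021-08-24 15:30:00+00'
   OR  tstzrange(time_start, time_end, '[]') @> timestamptz '2021-08-24 15:59:00+00'
GROUP  BY user_id;
```

Prefixing the SELECT with EXPLAIN (ANALYZE, BUFFERS) shows whether the planner actually picks the new index.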
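By default, range types assume an inclusive lower bound and an exclusive upper bound: tsrange(time_start, time_end) is equivalent to tsrange(time_start, time_end, '[)'). Since you operate with >= and <=, include the upper bound with tsrange(time_start, time_end, '[]').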
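The difference between the two bound specifications only shows at the exact upper bound. A small sketch with literal values (the timestamps are picked for illustration):

```sql
-- Default bounds '[)': the upper bound itself is NOT contained.
SELECT tsrange('2021-08-24 15:00', '2021-08-24 15:30')
       @> timestamp '2021-08-24 15:30';        -- false

-- Inclusive bounds '[]': matches time_start <= x AND time_end >= x.
SELECT tsrange('2021-08-24 15:00', '2021-08-24 15:30', '[]')
       @> timestamp '2021-08-24 15:30';        -- true
```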
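Alternatively, store a range column in the table
Should be a bit faster yet, with a plain (not expression) index.
You can add a timestamp range column to the table as a generated column (requires Postgres 12 or later), like:

ALTER TABLE events ADD COLUMN ts_range tsrange GENERATED ALWAYS AS (tsrange(time_start, time_end, '[]')) STORED;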
Or, more radically, replace time_start and time_end with the range column. Then index and query get a bit simpler:

CREATE INDEX events_right_idx ON events USING spgist (ts_range);

SELECT user_id
FROM   events
WHERE  ts_range @> timestamp '2021-08-24T15:30:00'
   OR  ts_range @> timestamp '2021-08-24T15:59:00'
GROUP  BY user_id;
But the tsrange column occupies more space than two timestamp columns. Weigh costs and benefits.

Asides
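Postgres 14 (currently beta) even allows covering SP-GiST indexes. The release notes:

Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)

But I don't think you can get an index-only scan for the particular query.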
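For reference, a covering SP-GiST index might be declared as below. This is only a sketch: it assumes the stored ts_range column from the previous section exists, the index name is hypothetical, and whether the planner ever chooses an index-only scan for this query is not guaranteed.

```sql
-- Requires Postgres 14 or later.
-- Assumes a stored ts_range column; the index name is made up for illustration.
CREATE INDEX events_ts_range_covering_idx ON events
USING  spgist (ts_range) INCLUDE (user_id);
```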
If you are stuck with a B-tree index for some reason, this fixed UNION query should not be too bad:

SELECT user_id FROM events WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id FROM events WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end;
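Notably, there is no GROUP BY: UNION already removes duplicates, so it does all the work. The predicates are also simplified with BETWEEN (no effect on performance).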
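Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And you name the columns "time" to add to the confusion. Typically, timestamptz is the better choice.

Last but not least, this indicates inaccurate column statistics, leading to a suboptimal query plan (the estimated row count is far off the actual one):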
->  Parallel Seq Scan on events  (cost=0.00..242553.74 **rows=45952** width=4) (actual time=104.085..183.994 **rows=64** loops=3)
Run

ANALYZE events;

and retry. Your original query can use the plain B-tree index. It is just not as efficient as the suggested SP-GiST index.
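Before and after running ANALYZE, it can help to check when statistics were last gathered. A small sketch using the standard pg_stat_user_tables view:

```sql
-- When were statistics last collected, and how many rows changed since?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM   pg_stat_user_tables
WHERE  relname = 'events';

-- Refresh the statistics, then run EXPLAIN (ANALYZE, BUFFERS) on the query again.
ANALYZE events;
```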
Then maybe tune your autovacuum and statistics settings to avoid misleading statistics in the future.
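Per-table settings are one way to do that. The values below are illustrative assumptions, not recommendations from the answer. Tune them for your own workload:

```sql
-- Trigger autoanalyze after ~2% of rows changed (the global default is 0.1, i.e. 10%).
ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.02);

-- Collect more detailed statistics for the range columns
-- (default_statistics_target is 100 unless changed).
ALTER TABLE events ALTER COLUMN time_start SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN time_end   SET STATISTICS 1000;

-- The new per-column targets take effect with the next ANALYZE.
ANALYZE events;
```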