PostgreSQL

Postgres chooses a Filter over an Index Cond when an OR is involved

  • June 28, 2019

I have a table that grows by roughly 20 million records per day, and I am trying to paginate through it so people can access all of the data in it, while keeping query times "decent" (defined in my case as less than 30 seconds per query).

To do this I have been using keyset pagination, but for this particular query and table my query times are really slow. This appears to be because the query planner decides to read a whole day's worth of data and apply a Filter to it, rather than using an index condition scan.
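To make the setup concrete, here is a minimal sketch of keyset pagination over a `(timestamp, id)` ordering, with an in-memory SQLite table standing in for the Postgres table (the table and column names are simplified stand-ins, not the real schema):

```python
import sqlite3

# Tiny stand-in table: (id, ts), with deliberately duplicated ts values
# so the id tie-breaker in the keyset cursor actually matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO positions (id, ts) VALUES (?, ?)",
    [(i, f"2019-03-10 00:00:0{i % 5}") for i in range(1, 21)],
)

def fetch_page(last_ts, last_id, page_size=5):
    # The OR form of the keyset predicate, as in the question:
    # rows strictly after the (last_ts, last_id) cursor in (ts, id) order.
    return conn.execute(
        """
        SELECT id, ts FROM positions
        WHERE ts > ? OR (ts = ? AND id > ?)
        ORDER BY ts, id
        LIMIT ?
        """,
        (last_ts, last_ts, last_id, page_size),
    ).fetchall()

# Page through everything, carrying the last row of each page as the cursor.
cursor = ("", 0)
pages = []
while True:
    page = fetch_page(*cursor)
    if not page:
        break
    pages.append(page)
    cursor = (page[-1][1], page[-1][0])  # (ts, id) of the last row
```

Each page picks up exactly where the previous one left off, with no OFFSET and no skipped or duplicated rows.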

The table looks like this:

create table mmsi_positions_archive
(
   id bigserial not null
       constraint mmsi_positions_archive_pkey
           primary key,
   position_id uuid,
   previous_id uuid,
   mmsi bigint not null,
   collection_type varchar not null,
   accuracy numeric,
   maneuver numeric,
   rate_of_turn numeric,
   status integer,
   speed numeric,
   course numeric,
   heading numeric,
   position geometry(Point,4326),
   timestamp timestamp with time zone not null,
   updated_at timestamp with time zone default now(),
   created_at timestamp with time zone default now()
);

create index ix_mmsi_positions_archive_mmsi
   on mmsi_positions_archive (mmsi);

create index ix_mmsi_positions_archive_position_id
   on mmsi_positions_archive (position_id);

create index ix_mmsi_positions_archive_timestamp_mmsi_id_asc
   on mmsi_positions_archive (timestamp, id);

The columns I am paginating on are timestamp and id. To help the planner, I have also raised the statistics target on timestamp to the maximum of 10 000 and analyzed the table.
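The statistics-target bump described above can be done along these lines (a sketch against the schema shown; 10 000 is the maximum value Postgres allows):

```sql
ALTER TABLE mmsi_positions_archive
    ALTER COLUMN "timestamp" SET STATISTICS 10000;
ANALYZE mmsi_positions_archive;
```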

The table is also partitioned by quarter, but at the moment I am only operating on data from a single partition.

The fast query

SELECT id
FROM mmsi_positions_archive
WHERE timestamp > '2019-03-10 00:00:00.000000+00:00'
 AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100

This gives the following query plan (note that the mmsi_positions_archive table itself is empty; all the data lives in the *_p2019_q1 table):

Limit  (cost=0.60..5.39 rows=100 width=16) (actual time=0.053..0.089 rows=100 loops=1)
 ->  Merge Append  (cost=0.60..773572.19 rows=16149157 width=16) (actual time=0.053..0.082 rows=100 loops=1)
"        Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
       ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=1)
"              Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
             Sort Method: quicksort  Memory: 25kB
             ->  Seq Scan on mmsi_positions_archive  (cost=0.00..0.00 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1)
                   Filter: (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone))
       ->  Index Only Scan using mmsi_positions_archive_p2019q1_timestamp_id_index on mmsi_positions_archive_p2019q1  (cost=0.58..571707.70 rows=16149156 width=16) (actual time=0.043..0.067 rows=100 loops=1)
             Index Cond: (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone))
             Heap Fetches: 0
Planning time: 67.023 ms
Execution time: 0.128 ms

The keyset pagination query (slow)

SELECT id
FROM mmsi_positions_archive
WHERE (timestamp > '2019-03-10 00:00:00.000000+00:00'
          OR (timestamp = '2019-03-10 00:00:00.000000+00:00' AND id >  1032749689))
 AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100

This gives the following explain output and ends up running a lot slower:

Limit  (cost=0.60..25.08 rows=100 width=16) (actual time=332918.152..332918.192 rows=100 loops=1)
 ->  Merge Append  (cost=0.60..41278140.09 rows=168591751 width=16) (actual time=332918.152..332918.189 rows=100 loops=1)
"        Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
       ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual time=0.004..0.004 rows=0 loops=1)
"              Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
             Sort Method: quicksort  Memory: 25kB
             ->  Seq Scan on mmsi_positions_archive  (cost=0.00..0.00 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1)
                   Filter: (("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone) AND (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) OR (("timestamp" = '2019-03-10 00:00:00+00'::timestamp with time zone) AND (id > 1032749689))))
       ->  Index Only Scan using mmsi_positions_archive_p2019q1_timestamp_id_index on mmsi_positions_archive_p2019q1  (cost=0.58..39170743.18 rows=168591750 width=16) (actual time=332918.147..332918.181 rows=100 loops=1)
             Index Cond: ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone)
             Filter: (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) OR (("timestamp" = '2019-03-10 00:00:00+00'::timestamp with time zone) AND (id > 1032749689)))
             Rows Removed by Filter: 953622052
             Heap Fetches: 0
Planning time: 0.778 ms
Execution time: 332918.226 ms

As far as I understand, this ends up slow because the index condition `Index Cond: ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone)` makes it sequentially scan roughly 20 million × 70 rows of index data and then filter most of them out.

Workaround

I did some testing and found that the problem lies with the OR: without the OR, each predicate on its own gives me a fast plan. So I switched it around and wrote a UNION query to fetch the data I want:

SELECT id
FROM (
        SELECT *
        FROM (
                 SELECT id        AS id,
                        timestamp AS timestamp
                 FROM mmsi_positions_archive
                 WHERE timestamp = '2019-03-10 00:00:00.000000+00:00'
                   AND id > 1032749689
                 ORDER BY timestamp, id
                 LIMIT 100
             ) keyset
        UNION
        SELECT *
        FROM (
                 SELECT id        AS id,
                        timestamp AS timestamp
                 FROM mmsi_positions_archive
                 WHERE timestamp > '2019-03-10 00:00:00.000000+00:00'
                   AND timestamp <= '2019-03-11 00:00:00+00:00'
                 ORDER BY timestamp, id
                 LIMIT 100
             ) all_after
    ) archive_ids
ORDER BY timestamp, id
LIMIT 100
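The equivalence this workaround relies on can be checked on a small dataset: the union of the "same timestamp, higher id" range and the "later timestamp" range matches exactly the rows the OR predicate matches. A minimal sketch, with an in-memory SQLite table and simplified names standing in for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (id INTEGER PRIMARY KEY, ts TEXT NOT NULL)")
# 30 rows spread over three days, so the cursor day has id gaps.
conn.executemany(
    "INSERT INTO positions (id, ts) VALUES (?, ?)",
    [(i, f"2019-03-{10 + i % 3:02d}") for i in range(1, 31)],
)

cur_ts, cur_id = "2019-03-10", 7  # the keyset cursor

# The OR form from the slow query.
with_or = conn.execute(
    """
    SELECT id, ts FROM positions
    WHERE (ts > ? OR (ts = ? AND id > ?)) AND ts <= '2019-03-11'
    ORDER BY ts, id
    """,
    (cur_ts, cur_ts, cur_id),
).fetchall()

# The UNION form from the workaround: each branch is OR-free.
with_union = conn.execute(
    """
    SELECT id, ts FROM positions WHERE ts = ? AND id > ?
    UNION
    SELECT id, ts FROM positions WHERE ts > ? AND ts <= '2019-03-11'
    ORDER BY ts, id
    """,
    (cur_ts, cur_id, cur_ts),
).fetchall()

assert with_or == with_union  # same rows, same order
```

The inner `ORDER BY ... LIMIT` in each branch of the real query is what lets Postgres satisfy each branch with a cheap index scan before combining them.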

This produces a fast query and the following query plan:

Limit  (cost=34.27..34.52 rows=100 width=16) (actual time=0.232..0.242 rows=100 loops=1)
 ->  Sort  (cost=34.27..34.77 rows=200 width=16) (actual time=0.231..0.238 rows=100 loops=1)
"        Sort Key: mmsi_positions_archive.""timestamp"", mmsi_positions_archive.id"
       Sort Method: quicksort  Memory: 34kB
       ->  HashAggregate  (cost=22.63..24.63 rows=200 width=16) (actual time=0.151..0.167 rows=200 loops=1)
"              Group Key: mmsi_positions_archive.id, mmsi_positions_archive.""timestamp"""
             ->  Append  (cost=0.71..21.63 rows=200 width=16) (actual time=0.028..0.111 rows=200 loops=1)
                   ->  Limit  (cost=0.71..12.24 rows=100 width=16) (actual time=0.028..0.049 rows=100 loops=1)
                         ->  Merge Append  (cost=0.71..17.43 rows=145 width=16) (actual time=0.027..0.046 rows=100 loops=1)
                               Sort Key: mmsi_positions_archive.id
                               ->  Index Scan using mmsi_positions_archive_pkey on mmsi_positions_archive  (cost=0.12..8.14 rows=1 width=16) (actual time=0.010..0.010 rows=0 loops=1)
                                     Index Cond: (id > 1032749689)
                                     Filter: ("timestamp" = '2019-03-10 00:00:00+00'::timestamp with time zone)
                               ->  Index Only Scan using mmsi_positions_archive_p2019q1_timestamp_id_index on mmsi_positions_archive_p2019q1  (cost=0.58..7.46 rows=144 width=16) (actual time=0.017..0.028 rows=100 loops=1)
                                     Index Cond: (("timestamp" = '2019-03-10 00:00:00+00'::timestamp with time zone) AND (id > 1032749689))
                                     Heap Fetches: 0
                   ->  Limit  (cost=0.60..5.39 rows=100 width=16) (actual time=0.012..0.049 rows=100 loops=1)
                         ->  Merge Append  (cost=0.60..773572.19 rows=16149157 width=16) (actual time=0.011..0.044 rows=100 loops=1)
"                                Sort Key: mmsi_positions_archive_1.""timestamp"", mmsi_positions_archive_1.id"
                               ->  Sort  (cost=0.01..0.02 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=1)
"                                      Sort Key: mmsi_positions_archive_1.""timestamp"", mmsi_positions_archive_1.id"
                                     Sort Method: quicksort  Memory: 25kB
                                     ->  Seq Scan on mmsi_positions_archive mmsi_positions_archive_1  (cost=0.00..0.00 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1)
                                           Filter: (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone))
                               ->  Index Only Scan using mmsi_positions_archive_p2019q1_timestamp_id_index on mmsi_positions_archive_p2019q1 mmsi_positions_archive_p2019q1_1  (cost=0.58..571707.70 rows=16149156 width=16) (actual time=0.006..0.031 rows=100 loops=1)
                                     Index Cond: (("timestamp" > '2019-03-10 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2019-03-11 00:00:00+00'::timestamp with time zone))
                                     Heap Fetches: 0
Planning time: 1.059 ms
Execution time: 0.312 ms

While I could rewrite my queries to use the UNION approach, I am wondering whether there is some way to better help Postgres do what I want while still using an OR?

I am running this on AWS Aurora Postgres 9.6. I know we are a few major versions behind and I am planning to upgrade as soon as possible, but right now I just need to get this working. :)

Fortunately, this is quite simple in PostgreSQL, because it supports comparisons between "row values" (composite values) that can use the index.

So you can write:

WHERE (timestamp, id) > ('2019-03-10 00:00:00+00:00', 1032749689)
 AND timestamp <= '2019-03-11 00:00:00+00:00'
ORDER BY timestamp, id
LIMIT 100

Such row values are compared lexicographically, which is exactly what you need here.
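The lexicographic semantics can be illustrated with plain tuples, since Python compares tuples the same way SQL compares row values (the literal values below are the ones from the question):

```python
# SQL:  (timestamp, id) > ('2019-03-10 00:00:00+00', 1032749689)
# is true when timestamp is strictly later, OR when timestamp is equal
# and id is larger -- exactly the OR predicate from the slow query.
cursor = ("2019-03-10 00:00:00+00", 1032749689)

rows = [
    ("2019-03-09 23:59:59+00", 9999999999),  # earlier ts: excluded, id ignored
    ("2019-03-10 00:00:00+00", 1032749689),  # equal to cursor: excluded
    ("2019-03-10 00:00:00+00", 1032749690),  # same ts, larger id: included
    ("2019-03-10 00:00:01+00", 1),           # later ts, any id: included
]

after_cursor = [r for r in rows if r > cursor]
```

Because the whole condition is a single comparison on `(timestamp, id)`, Postgres can turn it into one index condition on the `(timestamp, id)` index instead of a Filter.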

Here is the documentation link for this feature.

引用自:https://dba.stackexchange.com/questions/241591