PostgreSQL

Postgres: query switches from index scan to parallel sequential scan when the date range exceeds 150 days

  • February 25, 2022

I have two tables, exam_data and series_data, with 300M and 900M records respectively, and a query that joins them. If I pull records for the last 90 days, the query returns in under a second. But once the date range crosses a certain threshold, say 180 days, the query takes a very long time: over 200 seconds, sometimes even 10 minutes.

When I look at the explain plans, the one thing that stands out is that the query switches from an index scan to a parallel sequential scan. The number of workers also goes up from 3 to 10. On the infrastructure side, the database runs on AWS RDS with a db.m6g.large instance type, which has 2 vCPUs and 8 GB of memory.

Below are the query plans: the first with the date range set to 180 days, the second with it set to 90 days.

With 180 days:

Limit  (cost=57228.52..700532.83 rows=1000 width=198)
 ->  Nested Loop  (cost=57228.52..30312473.74 rows=47031 width=198)
       Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
       ->  Gather  (cost=57228.52..30171353.12 rows=18445 width=174)
             Workers Planned: 10
             ->  Parallel Hash Join  (cost=56228.52..30168508.62 rows=1844 width=174)
                   Hash Cond: ((series.mdp_source_connector_ref_id = exam.mdp_source_connector_ref_id) AND (series.exam_id = exam.exam_id))
                   ->  Parallel Seq Scan on series_data series  (cost=0.00..29035458.82 rows=107681532 width=75)
                         Filter: (filtered_flag = 'N'::text)
                   ->  Parallel Hash  (cost=56216.33..56216.33 rows=813 width=130)
                         ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.66..56216.33 rows=813 width=130)
                               Recheck Cond: (system_id = '60743ABCDC'::text)
                               Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 180)))
                               ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0)
                                     Index Cond: (system_id = '60743ABCDC'::text)
       ->  Materialize  (cost=0.00..17.65 rows=510 width=36)
             ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36)

With 90 days:

Limit  (cost=1165.97..686763.18 rows=1000 width=198)
 ->  Nested Loop  (cost=1165.97..11702939.19 rows=17068 width=198)
       Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
       ->  Gather  (cost=1165.97..11651721.36 rows=6693 width=174)
             Workers Planned: 3
             ->  Nested Loop  (cost=165.97..11650052.06 rows=2159 width=174)
                   ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.26..56215.92 rows=295 width=130)
                         Recheck Cond: (system_id = '60743ABCDC'::text)
                         Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 90)))
                         ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0)
                               Index Cond: (system_id = '60743ABCDC'::text)
                   ->  Index Scan using series_data_exam_id on series_data series  (cost=0.70..39289.00 rows=1214 width=75)
                         Index Cond: (exam_id = exam.exam_id)
                         Filter: ((filtered_flag = 'N'::text) AND (exam.mdp_source_connector_ref_id = mdp_source_connector_ref_id))
       ->  Materialize  (cost=0.00..17.65 rows=510 width=36)
             ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36)

Here is the query that was used:

SELECT exam.system_id, reff.modality, exam.exam_number,
       exam.exam_duration, exam.exam_startdate, exam.operator,
       exam.study_type, exam.load_time AS lastupdatedtime, exam.exam_id,
       exam.exam_startdatetime_ts, exam.exam_enddatetime_ts, exam.referring_physician,
       exam.contrast_used, series.series_startdatetime_ts, series.series_enddatetime_ts,
       series.series_number, series.series_modality, series.series_description, series.series_duration
FROM device_utilization.exam_data exam
     JOIN device_utilization.mdp_source_connector_ref reff
          ON exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id
     JOIN device_utilization.series_data series
          ON exam.exam_id = series.exam_id
          AND exam.mdp_source_connector_ref_id = series.mdp_source_connector_ref_id
WHERE exam.system_id = '63046ABCDA'
  AND exam_enddatetime_ts >= current_date - 360
  AND exam.filtered_flag = 'N'::text
  AND series.filtered_flag = 'N'::text

A couple of optimizations we have planned are:

  1. We are adding partitioning on the exam_enddatetime_ts column (see the sketch after this list). We are not sure whether this will have any effect on which kind of scan gets used.
  2. We are considering increasing the instance size from 2 cores to 4 cores.
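For item 1, here is a minimal sketch of declarative range partitioning on exam_enddatetime_ts (the table name exam_data_part, the column types, the trimmed column list, and the quarterly bounds are all illustrative assumptions, not the actual schema):

-- Parent table partitioned by the exam end timestamp.
CREATE TABLE device_utilization.exam_data_part (
    exam_id                     text,
    system_id                   text,
    mdp_source_connector_ref_id text,
    exam_enddatetime_ts         timestamptz,
    filtered_flag               text
    -- ... remaining exam_data columns ...
) PARTITION BY RANGE (exam_enddatetime_ts);

-- One partition per quarter; the planner can prune partitions that lie
-- entirely outside the date range in the WHERE clause.
CREATE TABLE device_utilization.exam_data_part_2022q1
    PARTITION OF device_utilization.exam_data_part
    FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');

Partition pruning narrows the rows touched by the date filter, but it does not by itself change the planner's choice between index and sequential scans within each partition.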

I am looking for input on why the plan switches from an index scan to a parallel seq scan, as well as any other suggestions for improving the query response time.

Thanks, Prasanna

Update with more details

The output of EXPLAIN (ANALYZE, BUFFERS) for the query pulling 180 days of data is included below.

Limit  (cost=1358.99..784417.12 rows=1000 width=198) (actual time=50.950..194.047 rows=1000 loops=1)
 Buffers: shared hit=5532 read=1576
 I/O Timings: read=677.975
 ->  Nested Loop  (cost=1358.99..14066649.07 rows=17962 width=198) (actual time=50.948..193.912 rows=1000 loops=1)
       Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
       Rows Removed by Join Filter: 7995
       Buffers: shared hit=5532 read=1576
       I/O Timings: read=677.975
       ->  Gather  (cost=1358.99..14012746.10 rows=7044 width=174) (actual time=50.923..191.700 rows=1000 loops=1)
             Workers Planned: 3
             Workers Launched: 3
             Buffers: shared hit=5531 read=1576
             I/O Timings: read=677.975
             ->  Nested Loop  (cost=358.99..14011041.70 rows=2272 width=174) (actual time=43.362..182.427 rows=251 loops=4)
                   Buffers: shared hit=5531 read=1576
                   I/O Timings: read=677.975
                   ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.26..56215.92 rows=286 width=130) (actual time=41.051..149.211 rows=237 loops=4)
                         Recheck Cond: (system_id = '607431D656B'::text)
                         Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 90)))
                         Rows Removed by Filter: 3069
                         Heap Blocks: exact=371
                         Buffers: shared hit=102 read=1310
                         I/O Timings: read=571.800
                         ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0) (actual time=4.123..4.124 rows=13223 loops=1)
                               Index Cond: (system_id = '607431D656B'::text)
                               Buffers: shared read=15
                               I/O Timings: read=3.183
                   ->  Bitmap Heap Scan on series_data series  (cost=193.74..48780.83 rows=1227 width=75) (actual time=0.133..0.138 rows=1 loops=947)
                         Recheck Cond: (exam_id = exam.exam_id)
                         Filter: ((filtered_flag = 'N'::text) AND (exam.mdp_source_connector_ref_id = mdp_source_connector_ref_id))
                         Rows Removed by Filter: 0
                         Heap Blocks: exact=251
                         Buffers: shared hit=5429 read=266
                         I/O Timings: read=106.175
                         ->  Bitmap Index Scan on series_data_exam_id  (cost=0.00..193.43 rows=12426 width=0) (actual time=0.037..0.037 rows=1 loops=947)
                               Index Cond: (exam_id = exam.exam_id)
                               Buffers: shared hit=4684 read=54
                               I/O Timings: read=19.526
       ->  Materialize  (cost=0.00..17.65 rows=510 width=36) (actual time=0.000..0.001 rows=9 loops=1000)
             Buffers: shared hit=1
             ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36) (actual time=0.012..0.014 rows=9 loops=1)
                   Buffers: shared hit=1
Planning:
 Buffers: shared hit=344 read=99 dirtied=1
 I/O Timings: read=49.324
Planning Time: 53.288 ms
Execution Time: 194.222 ms

PostgreSQL switches to a sequential scan because it thinks that will be faster. If it is not, the likely culprit is a configuration problem:

  • Perhaps you have told PostgreSQL that index scans are expensive by leaving random_page_cost at its default value of 4, which may be appropriate for spinning disks, but not for much else.
  • Perhaps the index scan is fast because the index is cached. If you increase effective_cache_size, PostgreSQL will know that the index is probably cached and will lower the estimated cost of index scans.
  • Perhaps you have configured more parallelism than your hardware can sustain, so the parallel table scan is slower than PostgreSQL thinks. Consider lowering max_parallel_workers_per_gather (an illustrative tuning sketch follows this list).
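As a rough illustration of those three knobs, with example values for a 2 vCPU / 8 GB instance (assumptions, not recommendations; on RDS these would be changed through the DB parameter group rather than ALTER SYSTEM):

-- On SSD-backed storage, random reads cost little more than sequential reads.
ALTER SYSTEM SET random_page_cost = 1.1;
-- Tell the planner roughly how much data the OS and shared buffers can cache.
ALTER SYSTEM SET effective_cache_size = '6GB';
-- Keep planned parallelism in line with the 2 available vCPUs.
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
-- All three settings take effect on reload; no restart is required.
SELECT pg_reload_conf();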

The only way to get 10 workers planned is to have max_parallel_workers_per_gather set to at least 10, and that setting seems downright insane for a machine with only 2 vCPUs. The best setting is probably 0, or perhaps 1 or 2. It makes me wonder what other crazy settings you might have.
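To check the current value, and to hunt for other settings that deviate from the defaults, a query along these lines works:

-- Current value of the parallelism cap.
SHOW max_parallel_workers_per_gather;

-- Every parameter not at its built-in default, and where it was set from.
SELECT name, setting, source
FROM pg_settings
WHERE source NOT IN ('default', 'override')
ORDER BY name;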

That change alone should get it to switch to a better plan. In any case, whether the plan is actually better should be judged from EXPLAIN (ANALYZE, BUFFERS) output rather than plain EXPLAIN (especially with track_io_timing turned on first).
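For example, using a trimmed-down version of the question's query (enabling track_io_timing per session may require superuser rights; on RDS it can also be set in the parameter group):

-- Record time spent on I/O so the plan output includes "I/O Timings" lines.
SET track_io_timing = on;

-- Capture actual row counts, timings, and buffer usage, not just estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT exam.exam_id, series.series_number
FROM device_utilization.exam_data exam
     JOIN device_utilization.series_data series
          ON exam.exam_id = series.exam_id
          AND exam.mdp_source_connector_ref_id = series.mdp_source_connector_ref_id
WHERE exam.system_id = '60743ABCDC'
  AND exam.exam_enddatetime_ts >= current_date - 180
  AND exam.filtered_flag = 'N'
  AND series.filtered_flag = 'N';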

Quoted from: https://dba.stackexchange.com/questions/307853