PostgreSQL
Postgres: query switches from an index scan to a parallel sequential scan when the date range exceeds 150 days
I have two tables, exam_data and series_data, with roughly 300M and 900M records respectively, and a query that joins them. If I pull records for the last 90 days, the query returns within a second. But once the date range crosses a certain threshold, say 180 days, the query takes far longer: over 200 seconds, sometimes as much as 10 minutes.
Looking at the explain plans, the thing that stands out is that the query switches from an index scan to a parallel sequential scan. The number of workers also jumps from 3 to 10. On the infrastructure side, the database runs on AWS RDS with a db.m6g.large instance type, which has 2 vCPUs and 8 GB of memory.
Below are the query plans, the first with the date range set to 180 days and the second set to 90 days.
With 180 days:
```
Limit  (cost=57228.52..700532.83 rows=1000 width=198)
  ->  Nested Loop  (cost=57228.52..30312473.74 rows=47031 width=198)
        Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
        ->  Gather  (cost=57228.52..30171353.12 rows=18445 width=174)
              Workers Planned: 10
              ->  Parallel Hash Join  (cost=56228.52..30168508.62 rows=1844 width=174)
                    Hash Cond: ((series.mdp_source_connector_ref_id = exam.mdp_source_connector_ref_id) AND (series.exam_id = exam.exam_id))
                    ->  Parallel Seq Scan on series_data series  (cost=0.00..29035458.82 rows=107681532 width=75)
                          Filter: (filtered_flag = 'N'::text)
                    ->  Parallel Hash  (cost=56216.33..56216.33 rows=813 width=130)
                          ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.66..56216.33 rows=813 width=130)
                                Recheck Cond: (system_id = '60743ABCDC'::text)
                                Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 180)))
                                ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0)
                                      Index Cond: (system_id = '60743ABCDC'::text)
        ->  Materialize  (cost=0.00..17.65 rows=510 width=36)
              ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36)
```
With 90 days:
```
Limit  (cost=1165.97..686763.18 rows=1000 width=198)
  ->  Nested Loop  (cost=1165.97..11702939.19 rows=17068 width=198)
        Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
        ->  Gather  (cost=1165.97..11651721.36 rows=6693 width=174)
              Workers Planned: 3
              ->  Nested Loop  (cost=165.97..11650052.06 rows=2159 width=174)
                    ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.26..56215.92 rows=295 width=130)
                          Recheck Cond: (system_id = '60743ABCDC'::text)
                          Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 90)))
                          ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0)
                                Index Cond: (system_id = '60743ABCDC'::text)
                    ->  Index Scan using series_data_exam_id on series_data series  (cost=0.70..39289.00 rows=1214 width=75)
                          Index Cond: (exam_id = exam.exam_id)
                          Filter: ((filtered_flag = 'N'::text) AND (exam.mdp_source_connector_ref_id = mdp_source_connector_ref_id))
        ->  Materialize  (cost=0.00..17.65 rows=510 width=36)
              ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36)
```
Below is the query used:
```sql
SELECT exam.system_id, reff.modality, exam.exam_number,
       exam.exam_duration, exam.exam_startdate, exam.operator,
       exam.study_type, exam.load_time AS lastupdatedtime, exam.exam_id,
       exam.exam_startdatetime_ts, exam.exam_enddatetime_ts, exam.referring_physician,
       exam.contrast_used, series.series_startdatetime_ts, series.series_enddatetime_ts,
       series.series_number, series.series_modality, series.series_description, series.series_duration
FROM device_utilization.exam_data exam
JOIN device_utilization.mdp_source_connector_ref reff
  ON exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id
JOIN device_utilization.series_data series
  ON exam.exam_id = series.exam_id
 AND exam.mdp_source_connector_ref_id = series.mdp_source_connector_ref_id
WHERE exam.system_id = '63046ABCDA'
  AND exam_enddatetime_ts >= current_date - 360
  AND exam.filtered_flag = 'N'::text
  AND series.filtered_flag = 'N'::text
```
A few optimizations we have planned are:
- We are adding partitioning on the exam_enddatetime_ts column (a rough sketch follows after this list). Not sure whether this will have any impact on the type of scan used.
- We are considering increasing the instance size from 2 cores to 4 cores.
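As a rough sketch, the partitioning we have in mind would look something like the following. Only the table and column names come from our schema; the column types, partition names, and bounds are placeholders:

```sql
-- Rough sketch only: column list is abbreviated, types and bounds are placeholders.
CREATE TABLE device_utilization.exam_data_part (
    exam_id                     text,
    mdp_source_connector_ref_id text,
    system_id                   text,
    exam_enddatetime_ts         timestamp,
    filtered_flag               text
    -- ... remaining exam_data columns ...
) PARTITION BY RANGE (exam_enddatetime_ts);

-- One partition per quarter, for example:
CREATE TABLE device_utilization.exam_data_part_2023q1
    PARTITION OF device_utilization.exam_data_part
    FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
```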
Looking for input on why the plan switches from an index scan to a parallel seq scan, plus any other suggestions for improving the query response time.

Thanks, Prasanna
Update in response to follow-up questions
The EXPLAIN (ANALYZE, BUFFERS) output for the query pulling 180 days of data is included below.
```
Limit  (cost=1358.99..784417.12 rows=1000 width=198) (actual time=50.950..194.047 rows=1000 loops=1)
  Buffers: shared hit=5532 read=1576
  I/O Timings: read=677.975
  ->  Nested Loop  (cost=1358.99..14066649.07 rows=17962 width=198) (actual time=50.948..193.912 rows=1000 loops=1)
        Join Filter: (exam.mdp_source_connector_ref_id = reff.mdp_source_connector_ref_id)
        Rows Removed by Join Filter: 7995
        Buffers: shared hit=5532 read=1576
        I/O Timings: read=677.975
        ->  Gather  (cost=1358.99..14012746.10 rows=7044 width=174) (actual time=50.923..191.700 rows=1000 loops=1)
              Workers Planned: 3
              Workers Launched: 3
              Buffers: shared hit=5531 read=1576
              I/O Timings: read=677.975
              ->  Nested Loop  (cost=358.99..14011041.70 rows=2272 width=174) (actual time=43.362..182.427 rows=251 loops=4)
                    Buffers: shared hit=5531 read=1576
                    I/O Timings: read=677.975
                    ->  Parallel Bitmap Heap Scan on exam_data exam  (cost=165.26..56215.92 rows=286 width=130) (actual time=41.051..149.211 rows=237 loops=4)
                          Recheck Cond: (system_id = '607431D656B'::text)
                          Filter: ((filtered_flag = 'N'::text) AND (exam_enddatetime_ts >= (CURRENT_DATE - 90)))
                          Rows Removed by Filter: 3069
                          Heap Blocks: exact=371
                          Buffers: shared hit=102 read=1310
                          I/O Timings: read=571.800
                          ->  Bitmap Index Scan on exam_system_id  (cost=0.00..165.03 rows=14462 width=0) (actual time=4.123..4.124 rows=13223 loops=1)
                                Index Cond: (system_id = '607431D656B'::text)
                                Buffers: shared read=15
                                I/O Timings: read=3.183
                    ->  Bitmap Heap Scan on series_data series  (cost=193.74..48780.83 rows=1227 width=75) (actual time=0.133..0.138 rows=1 loops=947)
                          Recheck Cond: (exam_id = exam.exam_id)
                          Filter: ((filtered_flag = 'N'::text) AND (exam.mdp_source_connector_ref_id = mdp_source_connector_ref_id))
                          Rows Removed by Filter: 0
                          Heap Blocks: exact=251
                          Buffers: shared hit=5429 read=266
                          I/O Timings: read=106.175
                          ->  Bitmap Index Scan on series_data_exam_id  (cost=0.00..193.43 rows=12426 width=0) (actual time=0.037..0.037 rows=1 loops=947)
                                Index Cond: (exam_id = exam.exam_id)
                                Buffers: shared hit=4684 read=54
                                I/O Timings: read=19.526
        ->  Materialize  (cost=0.00..17.65 rows=510 width=36) (actual time=0.000..0.001 rows=9 loops=1000)
              Buffers: shared hit=1
              ->  Seq Scan on mdp_source_connector_ref reff  (cost=0.00..15.10 rows=510 width=36) (actual time=0.012..0.014 rows=9 loops=1)
                    Buffers: shared hit=1
Planning:
  Buffers: shared hit=344 read=99 dirtied=1
  I/O Timings: read=49.324
Planning Time: 53.288 ms
Execution Time: 194.222 ms
```
PostgreSQL switches to a sequential scan because it believes that will be faster. If it isn't, you likely have a configuration problem:

- Perhaps you are telling PostgreSQL that index scans are expensive by leaving `random_page_cost` at its default of 4, which may be appropriate for spinning disks but for little else.
- Perhaps the index scan is fast because the index is cached. If you increase `effective_cache_size`, PostgreSQL will know that the index is probably cached and will cost index scans more cheaply.
- Perhaps you configured more parallelism than your hardware can sustain, so a parallel table scan ends up slower than PostgreSQL expects. Consider lowering `max_parallel_workers_per_gather`.
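As a hedged illustration, the three settings above could be adjusted roughly like this. The values are only plausible starting points for SSD-backed storage on an 8 GB instance, not tuned recommendations, and on RDS you would change them through the DB parameter group rather than `ALTER SYSTEM`:

```sql
-- Illustrative values only; validate against your own hardware and workload.
ALTER SYSTEM SET random_page_cost = 1.1;              -- SSD/EBS, not spinning disk
ALTER SYSTEM SET effective_cache_size = '6GB';        -- roughly RAM minus shared_buffers
ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
SELECT pg_reload_conf();                              -- all three take effect without a restart
```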
The only way to get 10 workers planned is to have `max_parallel_workers_per_gather` set to at least 10. For a machine with only 2 vCPUs, that setting seems outright insane. The best setting is probably 0, perhaps 1 or 2. It makes me wonder what other crazy settings you might have.
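A quick way to audit that (a standard catalog query, nothing specific to this setup) is to list every parameter that deviates from its built-in default:

```sql
-- Show all parameters whose value does not come from the built-in default.
SELECT name, setting, source
FROM pg_settings
WHERE source <> 'default';
```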
That change alone should steer it toward a better plan. Either way, there may well be a better plan available, but that is best investigated with `EXPLAIN (ANALYZE, BUFFERS)` output rather than plain `EXPLAIN` (especially with `track_io_timing` turned on first).
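For reference, capturing such a plan in a session looks like the following; the final statement is a trimmed-down stand-in for the real query from the question:

```sql
-- Record per-node I/O timings in the plan output for this session.
SET track_io_timing = on;

-- Run the real query here; EXPLAIN ANALYZE executes it and reports actual times.
EXPLAIN (ANALYZE, BUFFERS)
SELECT exam.exam_id, series.series_number
FROM device_utilization.exam_data exam
JOIN device_utilization.series_data series ON exam.exam_id = series.exam_id;
```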