PostgreSQL RDS: slower-than-expected performance when a large table/index is not in memory
I have a fairly large table (43 GB, with a 14 GB index) containing time-series cost data. I query by date and aggregate the amounts. When the data is not already in memory (OS cache or Postgres), the query can take up to 50 seconds for users who have millions of rows in a given time period, even though the other filters in the query usually cut that down to a few thousand rows. As far as I can tell, the index is well matched to the query pattern. I have run EXPLAIN (ANALYZE, BUFFERS) and can clearly see that the slowdown comes from reading off disk.
My workload is a bit unusual in that I do large batch writes and large batch deletes, so I assume VACUUM has a lot of work to do. I have not tuned it at all, though I actually suspect tuning it would not help given the index I use. I track "active" imports and then delete old, inactive imports. The active import IDs are part of both the query and the index, so I should not be scanning dead tuples left over from earlier imports.
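For reference, a quick way to sanity-check that assumption is the dead-tuple counters in the standard pg_stat_user_tables view; a minimal sketch (the percentage column is just for readability):

-- Dead-tuple buildup and recent (auto)vacuum activity for the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'service_costs';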
I also tried lowering random_page_cost to 1.1, which gets the planner to use an index scan instead of a bitmap heap scan, but performance ends up roughly the same.
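For reference, that experiment can be done per session without touching the instance parameter group; a sketch (the query body is elided the same way as below):

-- Lower the planner's estimated cost of a random page read (default 4.0),
-- which nudges it toward a plain index scan, then re-check the plan.
SET random_page_cost = 1.1;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- same aggregate query as shown further down
RESET random_page_cost;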
I am running Postgres 13.4 on an RDS db.r6g.2xlarge (8 vCPU, 64 GB of RAM) with provisioned IOPS (11,000; I typically peak at around 9,000 reads + writes combined).
I did not expect uncached queries to be this slow. Are my expectations simply wrong here? I have already raised shared_buffers to 40% of RAM, and given the table and index sizes I realize I should probably move up to the 128 GB instance class, which will be my next step.
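One way to quantify how much of the table is actually served from shared_buffers over time is the standard pg_statio_user_tables view; a minimal sketch (note that a "read" here may still be satisfied by the OS page cache rather than real disk I/O):

-- Shared-buffer hit ratio for the heap blocks of this table.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
FROM pg_statio_user_tables
WHERE relname = 'service_costs';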
Table "public.service_costs" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description -------------------------+--------------------------------+-----------+----------+--------------------------+----------+--------------+------------- id | uuid | | not null | public.gen_random_uuid() | plain | | date | timestamp without time zone | | not null | | plain | | cost_type | character varying | | not null | | extended | | service | character varying | | not null | | extended | | amount | numeric | | not null | | main | | cost_category | character varying | | | | extended | | cost_sub_category | character varying | | | | extended | | service_costs_import_id | bigint | | not null | | plain | | Indexes: "service_costs_pkey" PRIMARY KEY, btree (id) "indx_srvc_csts_on_cst_type__dt__srvc__cst_ctgry__cst_sb_ctgry" btree (service_costs_import_id, cost_type, date, service, cost_category, cost_sub_category) Access method: heap
…
 schema_name |                            relname                             | size  | table_size
-------------+----------------------------------------------------------------+-------+-------------
 public      | service_costs                                                  | 43 GB | 46511259648
 public      | indx_srvc_csts_on_cst_type__dt__srvc__cst_ctgry__cst_sb_ctgry | 14 GB | 15080833024
…
EXPLAIN (ANALYZE, BUFFERS)
SELECT SUM(service_costs.amount)
FROM service_costs
WHERE service_costs.cost_type IN (...)
  AND service_costs.service_costs_import_id IN (2066, 2067, 1267, 1269, 1268, 1270, 2068, 1273, 4996, 5047)
  AND service_costs.service = '....'
  AND "service_costs"."date" BETWEEN '2021-10-01' AND '2021-10-31 23:59:59.999999';
…
 Aggregate  (cost=1390974.93..1390974.94 rows=1 width=32) (actual time=17067.830..17067.831 rows=1 loops=1)
   Buffers: shared hit=6854 read=80448 dirtied=754
   I/O Timings: read=16236.006
   ->  Bitmap Heap Scan on service_costs  (cost=351286.12..1390173.71 rows=320487 width=5) (actual time=4827.074..16996.060 rows=323382 loops=1)
         Recheck Cond: ((service_costs_import_id = ANY ('{2066,2067,1267,1269,1268,1270,2068,1273,4996,5047}'::bigint[])) AND ((cost_type)::text = ANY ('{...}'::text[])) AND (date >= '2021-10-01 00:00:00'::timestamp without time zone) AND (date <= '2021-10-31 23:59:59.999999'::timestamp without time zone) AND ((service)::text = '...'::text))
         Heap Blocks: exact=70327
         Buffers: shared hit=6854 read=80448 dirtied=754
         I/O Timings: read=16236.006
         ->  Bitmap Index Scan on indx_srvc_csts_on_cst_type__dt__srvc__cst_ctgry__cst_sb_ctgry  (cost=0.00..351206.00 rows=320487 width=0) (actual time=4815.759..4815.759 rows=323382 loops=1)
               Index Cond: ((service_costs_import_id = ANY ('{2066,2067,1267,1269,1268,1270,2068,1273,4996,5047}'::bigint[])) AND ((cost_type)::text = ANY ('{...}'::text[])) AND (date >= '2021-10-01 00:00:00'::timestamp without time zone) AND (date <= '2021-10-31 23:59:59.999999'::timestamp without time zone) AND ((service)::text = '...'::text))
               Buffers: shared hit=159 read=16816
               I/O Timings: read=4575.310
 Planning Time: 0.159 ms
 Execution Time: 17067.865 ms
(14 rows)
…
 Aggregate  (cost=1390974.93..1390974.94 rows=1 width=32) (actual time=403.002..403.003 rows=1 loops=1)
   Buffers: shared hit=87302
   ->  Bitmap Heap Scan on service_costs  (cost=351286.12..1390173.71 rows=320487 width=5) (actual time=206.128..338.491 rows=323382 loops=1)
         Recheck Cond: ((service_costs_import_id = ANY ('{2066,2067,1267,1269,1268,1270,2068,1273,4996,5047}'::bigint[])) AND ((cost_type)::text = ANY ('{....}'::text[])) AND (date >= '2021-10-01 00:00:00'::timestamp without time zone) AND (date <= '2021-10-31 23:59:59.999999'::timestamp without time zone) AND ((service)::text = '...'::text))
         Heap Blocks: exact=70327
         Buffers: shared hit=87302
         ->  Bitmap Index Scan on indx_srvc_csts_on_cst_type__dt__srvc__cst_ctgry__cst_sb_ctgry  (cost=0.00..351206.00 rows=320487 width=0) (actual time=195.167..195.167 rows=323382 loops=1)
               Index Cond: ((service_costs_import_id = ANY ('{...}'::bigint[])) AND ((cost_type)::text = ANY ('{...}'::text[])) AND (date >= '2021-10-01 00:00:00'::timestamp without time zone) AND (date <= '2021-10-31 23:59:59.999999'::timestamp without time zone) AND ((service)::text = '....'::text))
               Buffers: shared hit=16975
 Planning Time: 0.168 ms
 Execution Time: 403.042 ms
(11 rows)
Index Cond: ((service_costs_import_id = ANY ('{2066,2067,1267,1269,1268,1270,2068,1273,4996,5047}'::bigint[])) AND ((cost_type)::text = ANY ('{...}'::text[])) AND (date >= '2021-10-01 00:00:00'::timestamp without time zone) AND (date <= '2021-10-31 23:59:59.999999'::timestamp without time zone) AND ((service)::text = '...'::text))
It looks like your index is on
(service_costs_import_id, cost_type, date, service)
although it may have more columns after that which simply are not used in this query. If those are the columns in that order, the problem with this index is that "service" cannot be used efficiently, because it follows the "date" column, which is used for a range rather than an equality. So "service" can only be used to filter rows after the fact, not to jump to specific spots in the index. If you reverse the order of the last two columns, the index will be able to use all of these columns efficiently. Better yet, if you reverse the order and then add "amount" to the end, you can get index-only scans. But getting those to work well may require a higher level of vacuuming, since index-only scans depend on an up-to-date visibility map.
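Concretely, that suggestion would look something like the sketch below (the index name is just illustrative; CONCURRENTLY avoids blocking your batch writes, though it cannot run inside a transaction block):

-- Equality columns first, the range column (date) after them, and
-- amount appended so SUM(amount) can be answered without heap visits.
CREATE INDEX CONCURRENTLY idx_srvc_csts_import_type_srvc_date_amount
    ON service_costs (service_costs_import_id, cost_type, service, date, amount);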
With 11,000 provisioned IOPS, reading 80,000 blocks should not take 16,000 ms. But that ignores latency, which I have not found described anywhere except in the vaguest terms in the AWS documentation. (16,236 ms over 80,448 reads works out to about 0.2 ms per read, which looks like serial per-request latency rather than an IOPS ceiling.) If you have to wait for one block to come back before sending the request for the next one, you will never be able to reach the full provisioned IOPS. You could raise effective_io_concurrency to see whether keeping several requests in flight at once improves things. (It will not improve the bitmap index scan part, only the bitmap heap scan.) Of course, this analysis assumes only one query is running at a time; if several queries have to share the throughput, it has to be divided among them.
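A sketch of that experiment (the value 200 is just a plausible starting point for SSD-class storage, not a recommendation; it can be set per session):

-- Allow the bitmap heap scan to keep several prefetch requests in flight
-- (the default of 1 means at most one outstanding read at a time).
SET effective_io_concurrency = 200;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- re-run the slow query and compare I/O Timings
RESET effective_io_concurrency;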