Postgres 11：升級後未使用正確的索引

October 10, 2021

我們最近在 RDS 上將我們的數據庫從 PostgreSQL 9.6 升級到了 11（在我們為數據庫進行了升級之後vacuum full）analyze。
我們注意到一個頻繁的查詢改變了它的執行計劃，現在它沒有使用正確的索引。
SELECT "posts".*
FROM "posts"
WHERE "posts"."delete_at" IS NULL
 AND "posts"."tenant_id" = 1582
 AND "posts"."approved" = TRUE
 AND (posts.status in (0, 1, 2, 3, 6, 9))
ORDER BY id desc
LIMIT 3000 OFFSET 0
該查詢正在執行大量過濾器，並且有limit+offset用於分頁。這裡最有用的過濾器是tenant_idand status，我們有一個複合索引。
在 Postgres 9.6 中，執行計劃是：
Limit  (cost=36057.09..36064.59 rows=3000 width=1082) (actual time=7.159..8.404 rows=2018 loops=1)
 -&gt;  Sort  (cost=36057.09..36130.91 rows=29529 width=1082) (actual time=7.159..7.664 rows=2018 loops=1)
       Sort Key: id DESC
       Sort Method: quicksort  Memory: 1091kB
       -&gt;  Index Scan using index_posts_on_tenant_id_and_status on posts  (cost=0.56..34204.03 rows=29529 width=1082) (actual time=0.022..5.052 rows=2018 loops=1)
             Index Cond: ((tenant_id = 1582) AND (status = ANY ('{0,1,2,3,6,9}'::integer[])))
             Filter: ((delete_at IS NULL) AND approved)
Planning time: 0.289 ms
Execution time: 8.856 ms
執行時間很快，我們正在使用我們的複合索引。
在 Postgres 11 中：
Limit  (cost=1000.49..30635.60 rows=3000 width=1149) (actual time=7.064..2100.604 rows=2479 loops=1)
 -&gt;  Gather Merge  (cost=1000.49..1138801.15 rows=115181 width=1149) (actual time=7.062..2101.843 rows=2479 loops=1)
       Workers Planned: 4
       Workers Launched: 4
       -&gt;  Parallel Index Scan Backward using posts_pkey on posts  (cost=0.43..1124081.93 rows=28795 width=1149) (actual time=1.831..1679.456 rows=496 loops=5)
             Filter: ((delete_at IS NULL) AND approved AND (tenant_id = 1582) AND (status = ANY ('{0,1,2,3,6,9}'::integer[])))
             Rows Removed by Filter: 1187090
Planning Time: 1.134 ms
Execution Time: 2102.218 ms
我們沒有使用複合索引並且性能很差。
我們做了一些實驗，發現了有趣的東西。
第一個實驗——租戶有2400-2500個文章；如果我們將限制更改為 2400 而不是 3000（預設值）：
Limit  (cost=1000.49..24708.58 rows=2400 width=1149) (actual time=7.716..39.017 rows=2400 loops=1)
 -&gt;  Gather Merge  (cost=1000.49..1138801.15 rows=115181 width=1149) (actual time=7.715..38.848 rows=2400 loops=1)
       Workers Planned: 4
       Workers Launched: 4
       -&gt;  Parallel Index Scan Backward using posts_pkey on posts  (cost=0.43..1124081.93 rows=28795 width=1149) (actual time=2.092..23.554 rows=494 loops=5)
             Filter: ((delete_at IS NULL) AND approved AND (tenant_id = 1582) AND (status = ANY ('{0,1,2,3,6,9}'::integer[])))
             Rows Removed by Filter: 12398
Planning Time: 0.271 ms
Execution Time: 39.158 ms
我們得到了快速的執行時間，但仍然使用了錯誤的索引。
第二個實驗 - 這是一個有趣的實驗 - 我們將順序更改為 becreated_at和 not id（重要的是要注意：我們id DESC對這種查詢有一個索引）：
Limit  (cost=193711.00..193991.02 rows=2400 width=1149) (actual time=5.473..7.277 rows=2400 loops=1)
 -&gt;  Gather Merge  (cost=193711.00..204909.92 rows=95984 width=1149) (actual time=5.472..7.093 rows=2400 loops=1)
       Workers Planned: 2
       Workers Launched: 2
       -&gt;  Sort  (cost=192710.98..192830.96 rows=47992 width=1149) (actual time=2.901..2.948 rows=816 loops=3)
             Sort Key: created_at DESC
             Sort Method: quicksort  Memory: 778kB
             Worker 0:  Sort Method: quicksort  Memory: 273kB
             Worker 1:  Sort Method: quicksort  Memory: 327kB
             -&gt;  Parallel Index Scan using index_posts_on_tenant_id_and_status on posts  (cost=0.56..189776.55 rows=47992 width=1149) (actual time=0.052..2.045 rows=826 loops=3)
                   Index Cond: ((tenant_id = 1582) AND (status = ANY ('{0,1,2,3,6,9}'::integer[])))
                   Filter: ((delete_at IS NULL) AND approved)
Planning Time: 0.279 ms
Execution Time: 7.473 ms
我們得到了快速的執行時間並且使用了正確的複合索引。
我們不知道為什麼會發生這種情況，也不想更改新訂單的程式碼，因為我們想了解這裡發生了什麼。
我們嘗試了：
在文章表上執行analyze- 沒有幫助。
升級到強大的機器 2x 記憶體和 2x CPU - 沒有幫助。
我們刪除了 50% 的標記為delete_at然後再次執行的行analyze- 沒有幫助。
CREATE STATISTICS與dependenciesontenant_id和status執行一起使用analyze- 並獲得完全相同的執行計劃。
在 Postgres 11 上禁用並行性 - 執行時間減慢 2 倍。
任何幫助將不勝感激。

您的查詢是這種形狀：
SELECT ...
FROM atable
WHERE &lt;condition&gt;
ORDER BY &lt;expression&gt;
LIMIT n;
有兩種方法可以處理這樣的查詢
使用索引來加速<condition>和排序結果。
使用索引<expression>按排序順序獲取結果並丟棄所有不滿足的內容，<condition>直到我們到達LIMIT.
現在看來，兩條路徑的成本估算非常接近：36000 與 30500，而後一種情況的估算如此之低，僅僅是因為您有 5 個核心在處理查詢。你肯定增加max_parallel_workers_per_gather了，從2秒的“短”執行時間我什至懷疑你通過設置parallel_workers表上的儲存參數人為地增加了並行度。
此外，PostgreSQL 9.6 沒有並行索引掃描，因此甚至不想選擇第二條路徑，因為一個程序的索引掃描會花費更長的時間。
現在這個策略行不通了：實際上，只有 2479 個匹配行而不是估計的 28795 個，因此索引掃描不能提前停止，而是必須掃描完整的索引，因為LIMIT永遠不會到達。索引掃描很慢，因此該計劃的執行比 PostgreSQL 想像的要差得多。
在某種程度上，v11 中增加的功能為另一種處理查詢的方式打開了大門，結果證明這是一個陷阱。
你可以做些什麼來讓 PostgreSQL 走上正軌：
粗略的方法：更改ORDER BY子句使其與索引表達式不匹配：
ORDER BY id + 0
然後這條壞路被堵死了。但也不能在它會是更好的道路的情況下採取……
更好的方法：降低並行度，要麼max_parallel_workers_per_gather全域降低，要麼parallel_workers將表上的儲存參數設置為較低的值。
那麼這個計劃可能會變得不那麼有吸引力。它可能行不通，因為其他計劃也可以並行化。
也許是最好的方法：嘗試修復錯誤估計的行數，以便 PostgreSQL 知道會發生什麼。
ALTER INDEX posts
  ALTER status SET STATISTICS 1000,
  ALTER tenant_id SET STATISTICS 1000;
ANALYZE posts;
然後看看估計是否更好。
如果這不起作用，請嘗試擴展統計資訊：
CREATE STATISTICS posts_ext_stats (dependencies)
  ON tenant_id, status FROM posts;
ANALYZE posts;

引用自：https://dba.stackexchange.com/questions/258984

Postgres 11：升級後未使用正確的索引

相關問答

PostgreSQL 查詢性能問題

Postgres 為特定使用者執行 SEQ 掃描

執行計劃更喜歡連接而不是排序

SORT BY LIMIT LEFT JOIN 查詢慢/查詢計劃錯誤

Postgres 更喜歡慢速 Seq 掃描而不是快速索引掃描

為什麼 PostgreSQL 中不存在帶有“have”子句的別名？