Postgresql

Postgres Planner 偶爾不使用 GIN 索引

  • September 9, 2021

平台:Heroku;PostgreSQL 版本 13

我有一個大約 46GB 的表,並且在名稱和電子郵件列上有兩個 GIN 索引我有一個搜尋查詢,假設使用定義的索引在名稱和電子郵件列中搜尋,並且大多數情況下它可以正常工作,因為解釋分析顯示但有些它只是停止使用這些索引並使用完全不同的索引,甚至與查詢無關。這是查詢

EXPLAIN (ANALYZE TRUE, buffers TRUE, timing TRUE)
SELECT
   "sometbl".*
FROM
   "sometbl"
WHERE
   "sometbl"."deleted_at" IS NULL
   AND(email ILIKE '%sometxt%' OR name ILIKE '%sometxt%')

ORDER BY
   created_at DESC
LIMIT 10 OFFSET 0;

解釋分析資訊(壞計劃)

Limit  (cost=0.09..7598.95 rows=10 width=1428) (actual time=300.670..67670.269 rows=7 loops=1)
 Buffers: shared hit=9947970 read=2567029 written=9
 I/O Timings: read=10637.696 write=0.152
 ->  Index Scan Backward using index_sometbl_on_created_at on sometbl  (cost=0.09..2429357.54 rows=3197 width=1428) (actual time=300.669..67670.257 rows=7 loops=1)
"        Filter: ((deleted_at IS NULL) AND (((email)::text ~~* '%sometxt%'::text) OR ((name)::text ~~* '%sometxt%'::text)))"
       Rows Removed by Filter: 16184624
       Buffers: shared hit=9947970 read=2567029 written=9
       I/O Timings: read=10637.696 write=0.152
Planning:
 Buffers: shared hit=594
Planning Time: 1.808 ms
Execution Time: 67670.327 ms

index_mytbl_on_created_at 是 created_at 上的索引,我真的不確定為什麼規劃者選擇這個索引。

雖然應該是這樣(好計劃)

Limit  (cost=7020.60..7020.61 rows=10 width=1428) (actual time=575.046..575.053 rows=7 loops=1)
 Buffers: shared hit=24090 read=2040
 I/O Timings: read=139.303
 ->  Sort  (cost=7020.60..7022.20 rows=3197 width=1428) (actual time=575.045..575.049 rows=7 loops=1)
       Sort Key: created_at DESC
       Sort Method: quicksort  Memory: 29kB
       Buffers: shared hit=24090 read=2040
       I/O Timings: read=139.303
       ->  Bitmap Heap Scan on sometbl  (cost=745.12..7006.78 rows=3197 width=1428) (actual time=574.917..575.009 rows=7 loops=1)
"              Recheck Cond: (((email)::text ~~* '%sometxt%'::text) OR ((name)::text ~~* '%sometxt%'::text))"
             Filter: (deleted_at IS NULL)
             Heap Blocks: exact=7
             Buffers: shared hit=24090 read=2040
             I/O Timings: read=139.303
             ->  BitmapOr  (cost=745.12..745.12 rows=3201 width=0) (actual time=574.864..574.866 rows=0 loops=1)
                   Buffers: shared hit=24087 read=2036
                   I/O Timings: read=139.277
                   ->  Bitmap Index Scan on trgm_idx_sometbl_email  (cost=0.00..398.41 rows=1606 width=0) (actual time=424.932..424.933 rows=7 loops=1)
"                          Index Cond: ((email)::text ~~* '%sometxt%'::text)"
                         Buffers: shared hit=21569 read=765
                         I/O Timings: read=75.132
                   ->  Bitmap Index Scan on trgm_idx_sometbl_name  (cost=0.00..346.39 rows=1595 width=0) (actual time=149.929..149.929 rows=0 loops=1)
"                          Index Cond: ((name)::text ~~* '%sometxt%'::text)"
                         Buffers: shared hit=2518 read=1271
                         I/O Timings: read=64.145
Planning:
 Buffers: shared hit=2
Planning Time: 0.584 ms
Execution Time: 575.177 ms

我嘗試過的幾件事。

  1. 如果我在查詢中使用一列,則一切正常,並且計劃者選擇正確的索引。
EXPLAIN ANALYSE SELECT
   "mytbl".*
FROM
   "mytbl"
WHERE
   "mytbl"."deleted_at" IS NULL
   and email ILIKE '%somestr%'
ORDER BY
   created_at DESC
LIMIT 10 OFFSET 0; 

解釋分析

Limit  (cost=4067.45..4067.45 rows=10 width=1439) (actual time=801.053..801.055 rows=1 loops=1)
 ->  Sort  (cost=4067.45..4068.27 rows=1636 width=1439) (actual time=801.052..801.053 rows=1 loops=1)
       Sort Key: created_at DESC
       Sort Method: quicksort  Memory: 26kB
       ->  Bitmap Heap Scan on mytbl (cost=834.54..4060.38 rows=1636 width=1439) (actual time=801.039..801.041 rows=1 loops=1)
"              Recheck Cond: ((email)::text ~~* '%somestr%'::text)"
             Filter: (deleted_at IS NULL)
             Heap Blocks: exact=1
             ->  Bitmap Index Scan on trgm_idx_mytbl_email  (cost=0.00..834.46 rows=1637 width=0) (actual time=801.015..801.015 rows=1 loops=1)
"                    Index Cond: ((email)::text ~~* '%somestr%'::text)"
Planning Time: 0.497 ms
Execution Time: 801.158 ms

工作得很好,如果我用名字改變它,那麼它也可以工作。

  1. 如果我刪除
ORDER BY
   created_at DESC

部分並執行上述查詢,然後規劃器再次選擇正確的索引。

Limit  (cost=1371.22..1390.80 rows=10 width=1439) (actual time=789.841..789.845 rows=1 loops=1)
 Buffers: shared hit=47417 read=1942
 I/O Timings: read=8.258
 ->  Bitmap Heap Scan on mytbl (cost=1371.22..7752.83 rows=3260 width=1439) (actual time=789.840..789.843 rows=1 loops=1)
"        Recheck Cond: (((email)::text ~~* '%somestr%'::text) OR ((name)::text ~~* '%somestr%'::text))"
       Filter: (deleted_at IS NULL)
       Heap Blocks: exact=1
       Buffers: shared hit=47417 read=1942
       I/O Timings: read=8.258
       ->  BitmapOr  (cost=1371.22..1371.22 rows=3263 width=0) (actual time=789.809..789.810 rows=0 loops=1)
             Buffers: shared hit=47416 read=1942
             I/O Timings: read=8.258
             ->  Bitmap Index Scan on trgm_idx_mytbl_email  (cost=0.00..834.46 rows=1637 width=0) (actual time=704.337..704.337 rows=1 loops=1)
"                    Index Cond: ((email)::text ~~* '%somestr%'::text)"
                   Buffers: shared hit=45677
             ->  Bitmap Index Scan on trgm_idx_mytbl_name  (cost=0.00..536.44 rows=1626 width=0) (actual time=85.469..85.469 rows=0 loops=1)
"                    Index Cond: ((name)::text ~~* '%somestr%'::text)"
                   Buffers: shared hit=1739 read=1942
                   I/O Timings: read=8.258
Planning:
 Buffers: shared hit=2
Planning Time: 0.587 ms
Execution Time: 789.925 ms
  1. 我已經在有問題的桌子上手動執行了分析,並且幾個小時前剛剛完成了自動真空。

所以,預先警告:我沒有解決你的問題,這只是解釋發生了什麼。

這取決於您的 WHERE 的選擇性。如果計劃者認為很少有行會滿足 WHERE,那麼使用 GIN 索引,獲取所有匹配的行,然後執行排序是有意義的。但是如果規劃器認為很多行將通過過濾器,它會看到 ORDER BY created_at DESC LIMIT 10,而是決定按順序掃描索引,直到看到滿足 WHERE 的 10 行。

不幸的是,估計行通過 WHERE 的可能性可能非常錯誤。您可以嘗試增加相關列的統計資訊目標並重做分析,但另一種方法是重組您的查詢以始終強制獲取所有行 - 儘管公平警告如果您曾經將此查詢與一個字元串一起使用匹配很多行,它會使其非常慢。它會是這樣的:

select * from (
 SELECT
   "mytbl".*
 FROM
   "mytbl"
 WHERE
   "mytbl"."deleted_at" IS NULL
   AND(email ILIKE '%somestr%' OR name ILIKE '%somestr')
 -- OFFSET 0 is an optimization fence to make sure the planner
 -- doesn't push the ORDER BY into this level.
 OFFSET 0
)
ORDER BY
   created_at DESC
LIMIT 10 OFFSET 0;

最後,PG 支持一次考慮多於一列的擴展統計資訊 - 在這種情況下,電子郵件不是隨機分佈在計劃者需要注意的 created_at 值中的事實,並且預設情況下不會。不幸的是,恐怕我自己從來沒有使用過這個功能,也不想給你不正確的指示——我不確定它是否可以在這種情況下使用,其中一個列是按順序排列的,而不是過濾-所以我只是將您指向文件

引用自:https://dba.stackexchange.com/questions/299242