Using ILIKE with unaccent and a right-ended wildcard only

February 15, 2016

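I am using PostgreSQL 9.4 and have a big table named foo. I want to search it, but I get long execution times when the search text is short (like "v") or long (like "This is a search example using gin on table foo%"). In these cases my index is ignored. Here is my search query: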

EXPLAIN (ANALYZE, TIMING)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"is":["a"]}'
LIMIT 100;

Here is my index:

CREATE INDEX index_foo_on_name_de_gin ON foo USING gin(f_unaccent(name) gin_trgm_ops) WHERE locale = 'de';

Why is the index ignored, with a seq scan and/or a bitmap heap scan used instead? Which additional index would solve this?

Why is there a recheck?

Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))

The function f_unaccent:

CREATE OR REPLACE FUNCTION f_unaccent(text)
        RETURNS text AS
        $func$
        SELECT unaccent('unaccent', $1)
        $func$  LANGUAGE sql IMMUTABLE SET search_path = public, pg_temp;

Query plan:

Limit  (cost=24412.85..67568.91 rows=100 width=301) (actual time=21838.473..21838.473 rows=0 loops=1)
  Buffers: shared hit=1 read=749976
  ->  Bitmap Heap Scan on foo  (cost=24412.85..4595502.73 rows=10592 width=301) (actual time=21838.470..21838.470 rows=0 loops=1)
        Recheck Cond: ((f_unaccent((name)::text) ~~* 'v%'::text) AND ((locale)::text = 'de'::text))
        Rows Removed by Index Recheck: 5416739
        Filter: ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb)
        Rows Removed by Filter: 2196
        Heap Blocks: exact=749172
        Buffers: shared hit=1 read=749976
        ->  Bitmap Index Scan on index_foo_on_name_de_gin  (cost=0.00..24410.20 rows=10591544 width=0) (actual time=641.532..641.532 rows=5418935 loops=1)
              Index Cond: (f_unaccent((name)::text) ~~* 'v%'::text)
              Buffers: shared hit=1 read=804
Planning time: 0.767 ms
Execution time: 21838.549 ms

Table definition:

   Column     |            Type             |                          Modifiers                           | Storage  | Stats target | Description 
---------------+-----------------------------+--------------------------------------------------------------+----------+--------------+-------------
id            | integer                     | not null default nextval('foo_id_seq'::regclass)             | plain    |              | 
locale        | character varying           | not null                                                     | extended |              | 
name          | character varying           | not null                                                     | extended |              | 
configuration | jsonb                       | not null default '{}'::jsonb                                 | extended |              | 

"index_foo_on_configuration" gin (configuration)
"index_foo_on_name_de_gin" gin (f_unaccent(name::text) gin_trgm_ops) WHERE locale::text = 'de'::text

Without the foo.configuration filter the query is really fast (1.021 ms). But I need this filter. Here is the query without the filter:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de'
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
LIMIT 100;

Results after the changes

Updated the f_unaccent function.
Added a btree index: CREATE INDEX index_foo_on_name_de ON foo (f_unaccent(name) text_pattern_ops) WHERE locale = 'de';
Added a gin index on configuration: CREATE INDEX index_foo_on_configuration ON foo USING gin(configuration jsonb_path_ops);
Dropped the old indexes.

A) Query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%')
AND foo.configuration->'bar' @> '{"0":["s"]}'
LIMIT 100;

A) Query plan:

Limit  (cost=0.00..121248.83 rows=100 width=301) (actual time=16319.267..16319.267 rows=0 loops=1)
  Buffers: shared hit=262079 read=1449294
  ->  Seq Scan on foo  (cost=0.00..12842675.96 rows=10592 width=301) (actual time=16319.261..16319.261 rows=0 loops=1)
        Filter: (((locale)::text = 'de'::text) AND ((configuration -> 'bar'::text) @> '{"is": ["a"]}'::jsonb) AND (f_unaccent((name)::text) ~~* 'v%'::text))
        Rows Removed by Filter: 41227048
        Buffers: shared hit=262079 read=1449294
Planning time: 0.765 ms
Execution time: 16319.313 ms and more!!!

B) Query without configuration:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%') LIMIT 100;

B) Query plan:

Limit  (cost=0.00..119.31 rows=100 width=301) (actual time=0.227..2.912 rows=100 loops=1)
  Buffers: shared read=31
  ->  Seq Scan on foo  (cost=0.00..12636540.72 rows=10591544 width=301) (actual time=0.221..2.864 rows=100 loops=1)
        Filter: (((locale)::text = 'de'::text) AND (f_unaccent((name)::text) ~~* 'v%'::text))
        Rows Removed by Filter: 691
        Buffers: shared read=31
Planning time: 0.501 ms
Execution time: 2.985 ms

C) Query without configuration and without LIMIT:

EXPLAIN (ANALYZE, BUFFERS)
SELECT  "foo".* FROM "foo" WHERE "foo"."locale" = 'de' 
AND f_unaccent(foo.name) ILIKE f_unaccent('v%');

C) Query plan:

Bitmap Heap Scan on foo  (cost=346203.46..4864616.26 rows=10591544 width=301) (actual time=23526.443..30050.008 rows=2196 loops=1)
  Recheck Cond: ((locale)::text = 'de'::text)
  Rows Removed by Index Recheck: 14094842
  Filter: (f_unaccent((name)::text) ~~* 'v%'::text)
  Rows Removed by Filter: 10781095
  Heap Blocks: exact=572873 lossy=847868
  Buffers: shared read=1494015
  ->  Bitmap Index Scan on index_foo_on_name_de  (cost=0.00..343555.58 rows=10592603 width=0) (actual time=1788.454..1788.454 rows=10783291 loops=1)
        Buffers: shared read=73274
Planning time: 0.528 ms
Execution time: 30050.168 ms

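1. f_unaccent()
It looks like you are using my function as defined here:
Does PostgreSQL support "accent insensitive" collations?
Note the update I just made there. This version is better: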
CREATE OR REPLACE FUNCTION f_unaccent(text)
 RETURNS text AS
$func$
SELECT public.unaccent('public.unaccent', $1)  -- schema-qualify function and dictionary
$func$  LANGUAGE sql IMMUTABLE;
The linked answer has a detailed explanation.
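A quick sanity check of the setup might look like the sketch below. It assumes the unaccent extension is installed in schema public, which is what the schema-qualified function body relies on; the sample string is made up for illustration.

```sql
-- Assumption: the extension lives in schema public, as the function body expects.
CREATE EXTENSION IF NOT EXISTS unaccent SCHEMA public;

-- Illustrative call on a made-up value; the result should come back without diacritics.
SELECT f_unaccent('Vögel');

-- An index expression may only use IMMUTABLE functions.
-- provolatile = 'i' means the wrapper is declared immutable.
SELECT proname, provolatile
FROM   pg_proc
WHERE  proname = 'f_unaccent';
```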
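2. Recheck
Why is there a recheck?
The line "Recheck Cond:" is always in the EXPLAIN output of a bitmap index scan. Nothing to worry about. Detailed explanation:
"Recheck Cond:" line in query plans with a bitmap index scan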
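The recheck does cost extra work when the bitmap turns lossy, meaning it no longer fits into work_mem and only remembers pages instead of individual rows. Plan C in the question shows this ("Heap Blocks: exact=572873 lossy=847868"). A minimal way to test the effect, as a sketch: the value 256MB is an arbitrary illustration, not a recommendation.

```sql
-- Assumption: 256MB is an arbitrary illustrative value for this session only.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT foo.* FROM foo
WHERE  foo.locale = 'de'
AND    f_unaccent(foo.name) ILIKE f_unaccent('v%');
-- Compare the "Heap Blocks: exact=... lossy=..." line with the earlier run.

-- Return to the configured default.
RESET work_mem;
```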
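3. Index and query plan
Why is the index ignored?
That is a misunderstanding. Your index is obviously not ignored. If Postgres expects to find so many rows that some data pages of the main relation have to be visited more than once (obviously the case with rows=10591544), it switches from an index scan to a bitmap index scan, followed by a "Bitmap Heap Scan" to fetch the actual tuples.
What makes this query really expensive is a combination of several unfortunate factors:
1. Neither the index (Buffers: shared hit=1 read=804) nor the table (Buffers: shared hit=1 read=749976) is cached. If you repeat the query immediately, it will be much faster, because by then all of it is cached. This is the worst possible case.
2. The search pattern f_unaccent('v%'), or just 'v%', is a very bad case for a trigram index. It is not very selective, but still selective enough for Postgres to use the index instead of an actual sequential scan. A text_pattern_ops index would be much faster for this. See below. A more selective pattern (a longer string) would be faster, too.
3. You have LIMIT 100, so Postgres starts out optimistically, hoping to find 100 rows quickly. But the query returns 0 rows (rows=0). That means Postgres has to go through all candidate rows without success. Another worst case. Your second predicate is to blame here:
AND foo.configuration->'bar' @> '{"is":["a"]}'
Postgres only has very limited statistics for jsonb columns. It does not know how selective this condition will be. If you have many queries on configuration->'bar', you can improve the situation dramatically with another expression index:
Index for finding an element in a JSON array
It might even be a multicolumn index.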
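The expression index mentioned above could look roughly like this. Both index names are hypothetical, and the partial-index condition simply mirrors the existing indexes in the question; treat it as a sketch to test against your own data, not a definitive design.

```sql
-- Hypothetical index names; adjust to your naming scheme.

-- Expression index matching the predicate configuration->'bar' @> '...'.
-- The extra parentheses around the expression are required.
-- jsonb_path_ops supports the containment operator @>, which this query uses.
CREATE INDEX index_foo_on_configuration_bar_de ON foo
USING gin ((configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';

-- Multicolumn variant: one GIN index covering both predicates.
-- Requires the pg_trgm extension for gin_trgm_ops.
CREATE INDEX index_foo_on_name_and_bar_de ON foo
USING gin (f_unaccent(name) gin_trgm_ops, (configuration->'bar') jsonb_path_ops)
WHERE locale = 'de';
```

With either index in place, rerun the original query under EXPLAIN (ANALYZE, BUFFERS) to see whether the planner picks it up.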
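4. text_pattern_ops
For left-anchored patterns ("right-ended wildcard") you can do without a trigram index. But if your database uses any locale other than the "C" locale (which is effectively "no locale"), a plain btree index cannot be used for this. You need a special operator class that ignores the locale. Like: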
CREATE INDEX index_foo_name_pattern_ops_de ON foo (f_unaccent(name) text_pattern_ops)
WHERE locale = 'de';
Details:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
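One caveat: a btree index can serve LIKE, but it can serve ILIKE only when the pattern starts with non-alphabetic characters. Plan C in the question reflects this, since the ILIKE condition appears as a Filter rather than an index condition. A possible workaround, which is not part of the answer above, is to index the lower-cased expression and query with LIKE. The index name here is hypothetical.

```sql
-- Hypothetical index name. lower() makes the comparison case-insensitive,
-- so plain LIKE can replace ILIKE and use the btree index.
CREATE INDEX index_foo_lower_name_pattern_ops_de ON foo
(lower(f_unaccent(name)) text_pattern_ops)
WHERE locale = 'de';

-- The query must repeat the indexed expression exactly.
SELECT foo.* FROM foo
WHERE  foo.locale = 'de'
AND    lower(f_unaccent(foo.name)) LIKE lower(f_unaccent('v%'))
LIMIT  100;
```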

Source: https://dba.stackexchange.com/questions/129202
