Trigram index for ILIKE patterns not working as expected

January 21, 2019

I have a simple but slow query:

SELECT DISTINCT title  
FROM ja_jobs
WHERE title ILIKE '%RYAN WER%'
AND clientid = 31239
AND time_job > 1457826264
ORDER BY title
LIMIT 10;

EXPLAIN ANALYZE:

Limit  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.759..2746.772 rows=1 loops=1)
 ->  Unique  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.753..2746.763 rows=1 loops=1)
       ->  Sort  (cost=5946.40..5946.41 rows=1 width=19) (actual time=2746.750..2746.754 rows=4 loops=1)
             Sort Key: "title"
             Sort Method: quicksort  Memory: 25kB
             ->  Bitmap Heap Scan on "ja_jobs"  (cost=49.02..5946.39 rows=1 width=19) (actual time=576.275..2746.609 rows=4 loops=1)
                   Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                   Filter: (("title")::"text" ~~* '%RYAN WER%'::"text")
                   Rows Removed by Filter: 791
                   ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..49.02 rows=1546 width=0) (actual time=100.870..100.870 rows=795 loops=1)
                         Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
Total runtime: 2746.879 ms

Then I created a trigram index:

CREATE INDEX ix_ja_jobs_trgm_gin ON public.ja_jobs USING gin (title gin_trgm_ops);

**EXPLAIN ANALYZE after adding the index:** (yes, I ran ANALYZE)

Limit  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.511..3720.511 rows=0 loops=1)
 ->  Unique  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.507..3720.507 rows=0 loops=1)
       ->  Sort  (cost=389.91..389.91 rows=1 width=20) (actual time=3720.505..3720.505 rows=0 loops=1)
             Sort Key: "title"
             Sort Method: quicksort  Memory: 25kB
             ->  Bitmap Heap Scan on "ja_jobs"  (cost=385.88..389.90 rows=1 width=20) (actual time=3720.497..3720.497 rows=0 loops=1)
                   Recheck Cond: (("clientid" = 31239) AND ("time_job" > 1457826264) AND (("title")::"text" ~~ '%RYAN WER%'::"text"))
                   Rows Removed by Index Recheck: 4
                   ->  BitmapAnd  (cost=385.88..385.88 rows=1 width=0) (actual time=3720.469..3720.469 rows=0 loops=1)
                         ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..50.00 rows=1644 width=0) (actual time=0.142..0.142 rows=795 loops=1)
                               Index Cond: (("clientid" = 31239) AND ("time_job" > 1457826264))
                         ->  Bitmap Index Scan on "ix_ja_jobs_trgm_gin"  (cost=0.00..335.63 rows=484 width=0) (actual time=3720.213..3720.213 rows=32 loops=1)
                               Index Cond: (("title")::"text" ~~ '%RYAN WER%'::"text")
Total runtime: 3720.653 ms

As you can see, the index did not help.

Table public.ja_jobs:

CREATE TABLE public.ja_jobs (
 id bigint NOT NULL DEFAULT "nextval"('"ja_jobs_id_seq"'::"regclass"),
 refnum character varying(100) NOT NULL DEFAULT ''::character varying,
 clientid bigint NOT NULL DEFAULT 0,
 customerid bigint,
 time_job bigint,
 priority smallint NOT NULL DEFAULT 0,
 status character varying(255) NOT NULL DEFAULT 'active'::"bpchar",
 title character varying(100) NOT NULL DEFAULT ''::character varying,

 -- some other irrelevant columns
)

Indexes on public.ja_jobs:

Indexes:
   "ja_jobs_pkey" PRIMARY KEY, "btree" ("id")
   "ix_bill_customer_jobs" "btree" ("customerid", "bill_customer")
   "ix_clientid_jobs" "btree" ("clientid")
   "ix_customerid_job" "btree" ("customerid")
   "ix_ja_jobs_clientid_modified_date_created_date" "btree" ("clientid", "modified_date", "created_date")
   "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text"))
   "ix_ja_jobs_trgm_gin" "gin" ("title" "gin_trgm_ops")
   "ix_job_customer_recent_jobs_lookaside_bill_customer" "btree" ("bill_customer", "modified_date")
   "ix_job_customer_recent_jobs_lookaside_clientid" "btree" ("clientid", "modified_date")
   "ix_job_customer_recent_jobs_lookaside_customer" "btree" ("customerid", "modified_date")
   "ix_jobs_charges_and_parts_sort" "btree" (("charges_count" + "parts_count"))
   "ix_jobs_client_times" "btree" ("clientid", "time_job", "time_arrival")
   "ix_jobs_fts_description_en" "gin" ("full_text_universal_cast"("description"))
   "ix_jobs_fts_full_address_en" "gin" ((((("full_text_universal_cast"("address"::"text") || "full_text_universal_cast"("suburb"::"text")) || "full_text_universal_cast"("city"::"text")) || "full_text_universal_cast"("stpr"::"text")) || "full_text_universal_cast"("postcode"::"text")))
   "ix_jobs_fts_job_number_en" "gin" ("full_text_universal_cast"("job_number"::"text"))
   "ix_jobs_fts_refnum_en" "gin" ("full_text_universal_cast"("refnum"::"text"))
   "ix_jobs_fts_title_en" "gin" ("full_text_universal_cast"("title"::"text"))
   "ix_jobs_full_address_street_first" "btree" (((((COALESCE("address"::character varying, ''::character varying)::"text" || COALESCE(' '::"text" || "suburb"::"text", ''::"text")) || COALESCE(' '::"text" || "city"::"text", ''::"text")) || COALESCE(' '::"text" || "postcode"::"text", ''::"text")) || COALESCE(' '::"text" || "stpr"::"text", ''::"text")))
   "ix_jobs_paying_customers" "btree" ((COALESCE("bill_customer", "customerid")))
   "ix_jobs_status_label_ids" "btree" ("status_label_id")
   "ix_jobs_top_by_client" "btree" ("id", "clientid")
   "ix_mobiuser_jobs" "btree" ("accepted_mobile_user")
   "ix_recurrenceid_jobs" "btree" ("recurrenceid")
   "ix_timejob_jobs" "btree" ("time_job")
   "ja_jobs_client_type" "btree" ("clientid", "jobtype")
   "ja_jobs_the_geom_idx" "gist" ("the_geom")

Question:

What can I do to improve the query? Why is the trigram index not working as expected?

**UPDATE:** Re-ran EXPLAIN (ANALYZE, BUFFERS):

Limit  (cost=199669.37..199669.39 rows=10 width=20) (actual time=31523.690..31523.691 rows=1 loops=1)
 Buffers: shared hit=26947 read=101574 dirtied=438
 ->  Sort  (cost=199669.37..199669.40 rows=11 width=20) (actual time=31523.686..31523.686 rows=1 loops=1)
       Sort Key: "title"
       Sort Method: quicksort  Memory: 25kB
       Buffers: shared hit=26947 read=101574 dirtied=438
       ->  Bitmap Heap Scan on "ja_jobs"  (cost=4850.60..199669.18 rows=11 width=20) (actual time=11714.504..31523.640 rows=1 loops=1)
             Recheck Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))
             Filter: (("title")::"text" ~~* '%Hislop%'::"text")
             Rows Removed by Filter: 207654
             Buffers: shared hit=26942 read=101574 dirtied=438
             ->  Bitmap Index Scan on "ix_jobs_client_times"  (cost=0.00..4850.60 rows=155054 width=0) (actual time=11670.956..11670.956 rows=215142 loops=1)
                   Index Cond: (("clientid" = 2565) AND ("time_job" > 1382496599))
                   Buffers: shared hit=121 read=5772
Total runtime: 31524.874 ms

After removing DISTINCT and the leading %:

explain (analyze, buffers)
SELECT title  
FROM ja_jobs
WHERE title ILIKE 'Hislop 13035%'
AND clientid = 2565
AND time_job > 1382496599
ORDER BY title
LIMIT 10;


Limit  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.479..3492.483 rows=1 loops=1)
 Buffers: shared hit=4940 read=448
 I/O Timings: read=83.285
 ->  Sort  (cost=2275.53..2275.55 rows=9 width=20) (actual time=3492.475..3492.477 rows=1 loops=1)
       Sort Key: "title"
       Sort Method: quicksort  Memory: 25kB
       Buffers: shared hit=4940 read=448
       I/O Timings: read=83.285
       ->  Bitmap Heap Scan on "ja_jobs"  (cost=391.62..2275.38 rows=9 width=20) (actual time=3492.460..3492.462 rows=1 loops=1)
             Recheck Cond: (("title")::"text" ~~* 'Hislop Street Clinic 2513035%'::"text")
             Filter: (("time_job" > 1382496599) AND ("clientid" = 2565))
             Buffers: shared hit=4940 read=448
             I/O Timings: read=83.285
             ->  Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)
                   Index Cond: (("title")::"text" ~~* 'Hislop 13035%'::"text")
                   Buffers: shared hit=4939 read=448
                   I/O Timings: read=83.285
Total runtime: 3492.531 ms

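As you can see, the query now uses the new index, but it is slower.

Then I removed the ORDER BY, but the query is still slow.

I also tried LIKE (which is much faster), but LIKE is case-sensitive, so I got no rows back. I can't use it.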

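Answer:

You have a lot of indexes. I doubt you need all of them. Check whether they are all in use. Instructions are in the manual, in the chapter "Examining Index Usage".
If your system is configured to collect statistics, this is particularly instructive to study: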
SELECT * FROM pg_stat_user_indexes
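These statistics are also displayed in pgAdmin.
Some indexes look particularly odd, for example: "ix_ja_jobs_gsdi_pk" "btree" (("id"::"text")) - why would anyone index the bigint id cast to text?
Useless indexes do not hurt read performance (much), but they are a burden on write performance and general maintenance.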
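To see which indexes on ja_jobs are actually being used, a query along these lines may help. It is only a sketch: it assumes statistics collection is enabled and the counters have not been reset recently. A low idx_scan alone does not prove an index is safe to drop, since it may back a constraint or serve a rare but important query.

```sql
-- Scan counts and on-disk size for every index on public.ja_jobs,
-- least-used first. idx_scan counts index scans since the last stats reset.
SELECT indexrelname                                 AS index_name
     , idx_scan
     , idx_tup_read
     , pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  schemaname = 'public'
AND    relname    = 'ja_jobs'
ORDER  BY idx_scan, pg_relation_size(indexrelid) DESC;
```

Large indexes with idx_scan = 0 over a representative period are the first candidates to review.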
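The main difficulty for the query is estimating the selectivity of the various predicates. This is relatively simple for the bigint columns clientid and time_job, but hard for pattern matching (title ILIKE 'Hislop 13035%').
In your case (update 3), Postgres estimates it will find 482 rows matching the pattern, but the result is just one row:
Bitmap Index Scan on "ix_jobs_trgm_gin"  (cost=0.00..391.62 rows=482 width=0) (actual time=3492.427..3492.427 rows=1 loops=1)
Tuning the query depends on the complete picture: cardinalities, data distribution, hardware, load, concurrency, query frequency, priorities ...
It might help to increase the statistics target for the columns involved: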
ALTER TABLE ja_jobs
  ALTER clientid SET STATISTICS 1000
, ALTER time_job SET STATISTICS 1000
, ALTER title    SET STATISTICS 1000;
Then: ANALYZE ja_jobs; But don't expect too much. Details:
Check statistics targets in PostgreSQL
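Before and after changing the targets, the current settings and the gathered statistics can be inspected in the system catalogs. A minimal sketch, assuming the table lives in schema public; how the catalog represents "use the default" differs between versions, as noted in the comment.

```sql
-- Per-column statistics target. -1 (NULL in recent releases) means
-- "fall back to default_statistics_target".
SELECT attname, attstattarget
FROM   pg_attribute
WHERE  attrelid = 'public.ja_jobs'::regclass
AND    attname IN ('clientid', 'time_job', 'title');

SHOW default_statistics_target;

-- What the planner currently knows about these columns (refreshed by ANALYZE).
SELECT attname, null_frac, n_distinct, correlation
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'ja_jobs'
AND    attname IN ('clientid', 'time_job', 'title');
```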
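Estimating selectivity for free-floating LIKE patterns is hard. Left-anchored patterns are easier. Note that you freely mix these two quite different cases: ILIKE '%RYAN WER%' (with a leading wildcard) vs. ILIKE 'Hislop 13035%'. Overview:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
There have been improvements in recent versions, but the biggest improvement comes with Postgres 9.6 (in beta at the time of writing) and its new version of the pg_trgm module. See the release notes for details. Related:
Trigram search gets much slower as search string gets longer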
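To get a feeling for what the trigram index has to work with, pg_trgm can show the trigrams it extracts from a string. This is a rough illustration only, assuming the pg_trgm extension is installed in the current database: show_trgm() treats its argument as plain text, whereas for a LIKE pattern the set of usable trigrams also depends on where the wildcards sit.

```sql
-- Trigrams extracted from the two search terms used in the question.
SELECT show_trgm('RYAN WER')     AS leading_wildcard_term
     , show_trgm('Hislop 13035') AS left_anchored_term;
```

Each trigram corresponds to an entry that may have to be looked up in the GIN index, which is one reason longer search strings can get slower on older versions.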
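There are various other ways to improve performance, depending on all the information that is not in your question. Possibly a multicolumn index, a partial index, or a CTE in the query.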
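As one possible reading of "multicolumn index" for the leading-wildcard case: the additional module btree_gin lets a GIN index include the bigint column clientid next to the trigram column, so a single index scan can apply both predicates instead of combining two bitmaps. This is only a sketch. The index name is made up, and whether it beats the existing plan depends on the data distribution, so compare EXPLAIN (ANALYZE, BUFFERS) output before and after.

```sql
-- Assumes both contrib modules are available on the server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gin;

-- Hypothetical index name. clientid uses the btree_gin operator class
-- for bigint, title uses the trigram operator class.
CREATE INDEX ix_ja_jobs_clientid_title_trgm ON public.ja_jobs
USING gin (clientid, title gin_trgm_ops);
```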
All the general advice on performance tuning applies as well:
Configuring PostgreSQL for read performance
http://wiki.postgresql.org/wiki/Performance_Optimization
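You already removed the expensive DISTINCT. If you don't need ORDER BY title, removing it as well may change the query plan completely: without both, Postgres is free to pick the first 10 matches and ignore the rest. Otherwise, all matches have to be found and considered, which may be much more expensive. Try: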
SELECT title
FROM   ja_jobs
WHERE  title ILIKE 'Hislop 13035%'
AND    clientid = 2565
AND    time_job > 1382496599
LIMIT  10;  -- no ORDER BY
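If you are, in fact, only dealing with **left-anchored LIKE** patterns (a trailing wildcard as in 'Hislop 13035%', but not '%RYAN WER%'), then you can use a very fast varchar_pattern_ops index. Detailed explanation:
Operator "~<~" uses varchar_pattern_ops index while normal ORDER BY clause doesn't?
So: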
CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, title varchar_pattern_ops, time_job);
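Index the columns in this order. Equality first, range later. Explanation:
Multicolumn index and performance
You can extend this solution to cover ILIKE with a functional index: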
CREATE INDEX ix_ja_jobs_special_idx ON public.ja_jobs
(clientid, lower(title) varchar_pattern_ops, time_job);
And adapt your query:
SELECT title
FROM   ja_jobs
WHERE  lower(title) LIKE lower('Hislop 13035%')
AND    clientid = 2565
AND    time_job > 1382496599
LIMIT  10;
Index on column with data type citext not used
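To confirm that the expression index is picked up, the plan can be checked the same way as in the question. A sketch, assuming the index on (clientid, lower(title) varchar_pattern_ops, time_job) has been created and the table analyzed afterwards:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT title
FROM   ja_jobs
WHERE  lower(title) LIKE lower('Hislop 13035%')  -- must match the indexed expression
AND    clientid = 2565
AND    time_job > 1382496599
LIMIT  10;
```

What to look for is an index scan (or bitmap index scan) on ix_ja_jobs_special_idx whose Index Cond includes range conditions on lower(title). If the search string comes from user input, any literal % or _ characters in it need to be escaped before the trailing wildcard is appended.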

Quoted from: https://dba.stackexchange.com/questions/138288
