PostgreSQL

Is there a way to speed up JOIN ON LIKE?

  • September 7, 2018

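I have millions of things in one table. I want to assign groups to the things based on their prefixes.

I tried adding every index I could think of, but the planner still performs a nested loop left join.

A demo case in SQL: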

create table things as
select 
   i as id, 
   left(md5(random()::text), 8) as name
from generate_series(1, 100000) as i;

create table match_group_rules as
select
   i as id,
   trunc(random() * 5 + 1) as group_id,
   left(md5(random()::text), 2) as rule
from generate_series(1, 100) as i;

create extension if not exists pg_trgm;
create index match_group_rules_rule on match_group_rules (rule);
create index match_group_rules_rule_pattern on match_group_rules (rule text_pattern_ops);
create index things_name_idx on things (name);
create index things_name_pattern_idx on things (name text_pattern_ops);
create index things_name_gin_trgm_idx on things using gin (name gin_trgm_ops);
create index things_name_gist_trgm_idx on things using gist (name gist_trgm_ops);

explain 
select *
from things t
left join match_group_rules r 
   on t.name like r.rule || '%';

Planner output for the demo case:

Nested Loop Left Join  (cost=0.00..176543.25 rows=100000 width=57)
 Join Filter: (t.name ~~ (r.rule || '%'::text))
 ->  Seq Scan on things t  (cost=0.00..1541.00 rows=100000 width=13)
 ->  Materialize  (cost=0.00..2.50 rows=100 width=44)
       ->  Seq Scan on match_group_rules r  (cost=0.00..2.00 rows=100 width=44)

Questions:

  1. What makes the planner ignore the indexes?
  2. Is there a fast way to compute the groups for the things?

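The initial problem is the left join. It forces the planner to use things as the driving table, and once it does that it cannot use an index, because pg_trgm and text_pattern_ops provide ways to index text so that it can be matched against a given pattern, not ways to index patterns so that they can be matched against a given text.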
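
To make the direction of the match concrete, here is a minimal sketch that reuses the demo tables above. The literal prefix 'ab' is an arbitrary value chosen for illustration. In the first query the pattern is known and the text column is what gets searched, which is the case the indexes on things(name) are built for. In the second query the text is known and the pattern column is what would have to be searched, and none of the indexes on match_group_rules(rule) is designed for that. Which plan the planner actually picks depends on the PostgreSQL version, statistics and settings, so no output is shown.

```sql
-- Pattern is a constant, text comes from the indexed column:
-- the planner may use one of the indexes on things(name).
explain
select *
from things
where name like 'ab%';

-- Text is a constant, pattern comes from the table:
-- the indexes on match_group_rules(rule) do not support
-- "find the patterns that match this text".
explain
select *
from match_group_rules
where 'ab123456' like rule || '%';
```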

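To use an index on things with the left join, the executor would first have to run a nested loop with match_group_rules as the driving table, remember every row of things that was retrieved, and then go back and make up null-extended rows for the rows of things that were not found. I can see no inherent reason why PostgreSQL could not perform this complex operation; it simply is not implemented. You can do it yourself with a CTE and UNION ALL: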

with foobar as 
(select t.id as t_id, name, r.id, r.group_id, r.rule
  from things t
  join match_group_rules r 
    on t.name like r.rule || '%'
)
select * from foobar 
   union all
select id, name, NULL, NULL, NULL from things t2 where not exists 
(select 1 from foobar where t2.id=foobar.t_id);
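
As a rough way to check the effect of the rewrite, one could run EXPLAIN on just the inner join used inside the CTE, after refreshing statistics on the demo tables. This is a sketch, not a guaranteed result: the plan depends on the PostgreSQL version, the statistics and the cost settings. What to look for is a nested loop driven by match_group_rules with an index or bitmap scan on things underneath it, instead of the join filter over a sequential scan shown in the question.

```sql
-- Refresh planner statistics for the freshly created demo tables.
analyze things;
analyze match_group_rules;

-- Inspect the plan of the inner join on its own. With a plain
-- inner join the planner is free to pick match_group_rules as
-- the driving table.
explain
select t.id as t_id, t.name, r.id, r.group_id, r.rule
from things t
join match_group_rules r
  on t.name like r.rule || '%';
```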

Quoted from: https://dba.stackexchange.com/questions/217056