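PostgreSQL
Is there any way to speed up JOIN ON LIKE?
I have millions of things in a table. I want to assign groups to the things based on their prefixes.
I tried adding every kind of index, but the planner still performs a Nested Loop Left Join.
A showcase SQL: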
create table things as
  select i as id, left(md5(random()::text), 8) as name
  from generate_series(1, 100000) as i;

create table match_group_rules as
  select i as id, trunc(random() * 5 + 1) as group_id, left(md5(random()::text), 2) as rule
  from generate_series(1, 100) as i;

create extension if not exists pg_trgm;

create index match_group_rules_rule on match_group_rules (rule);
create index match_group_rules_rule_pattern on match_group_rules (rule text_pattern_ops);
create index things_name_idx on things (name);
create index things_name_pattern_idx on things (name text_pattern_ops);
create index things_name_gin_trgm_idx on things using gin (name gin_trgm_ops);
create index things_name_gist_trgm_idx on things using gist (name gist_trgm_ops);

explain
select *
from things t
left join match_group_rules r on t.name like r.rule || '%';
Showcase planner output:
Nested Loop Left Join  (cost=0.00..176543.25 rows=100000 width=57)
  Join Filter: (t.name ~~ (r.rule || '%'::text))
  ->  Seq Scan on things t  (cost=0.00..1541.00 rows=100000 width=13)
  ->  Materialize  (cost=0.00..2.50 rows=100 width=44)
        ->  Seq Scan on match_group_rules r  (cost=0.00..2.00 rows=100 width=44)
Questions:
- What makes the planner ignore the indexes?
- Is there a fast way to compute the groups of the things?
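The initial problem is the left join. It forces things to be used as the driving table, but once that is done there is no way to use an index, because pg_trgm and text_pattern_ops provide ways to index text that matches a given pattern, not patterns that match a given text. To use an index on the left join, you would first have to do a nested loop with match_group_rules driving, remember all of the rows of things that were retrieved, then go back and make up the NULL-extended rows for the rows of things that were not found.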
There is no inherent reason I can see why PostgreSQL could not carry out this complex operation; it just isn't implemented. You can do it yourself with a CTE and a UNION ALL:

with foobar as (
  select t.id as t_id, name, r.id, r.group_id, r.rule
  from things t
  join match_group_rules r on t.name like r.rule || '%'
)
select * from foobar
union all
select id, name, NULL, NULL, NULL
from things t2
where not exists (select 1 from foobar where t2.id = foobar.t_id);
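To check whether the inner-join half of this rewrite actually reaches one of the indexes on things (name), one could look at its plan on its own. The following is a minimal sketch against the showcase tables from the question. The analyze calls and the enable_seqscan toggle are my additions for experimentation and are not part of the original answer. Which plan comes out depends on the PostgreSQL version, the table statistics, and the cost settings, so no particular output is assumed here.

```sql
-- Refresh statistics so the planner has row estimates for the freshly created tables.
analyze things;
analyze match_group_rules;

-- Plan for the inner join alone: without the left join, match_group_rules is
-- free to drive the nested loop, so an index on things (name) that supports
-- LIKE becomes a candidate.
explain
select t.id, t.name, r.group_id, r.rule
from match_group_rules r
join things t on t.name like r.rule || '%';

-- For experimentation only: discourage sequential scans in this session to see
-- whether an index-based plan is available at all, then restore the default.
set enable_seqscan = off;

explain
select t.id, t.name, r.group_id, r.rule
from match_group_rules r
join things t on t.name like r.rule || '%';

reset enable_seqscan;
```

If the second plan shows an index or bitmap scan on things while the first does not, the planner considers the index usable but estimates the sequential scan to be cheaper for this data; that would be a costing question rather than a missing capability.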