PostgreSQL
PostgreSQL bad plan with a multicolumn GiST index when searching for an empty result
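I'm having a problem with a multicolumn index (integer, geom), where geom is a PostGIS point and the integer is a point aggregator with business meaning. I search by a box on the geom column and join on the integer with other tables. When the box covers an empty area, the planner performs an index scan that uses only the secondary column, which is very inefficient. I spent some time trying to build a scenario that reproduces the problem with less data, and I was able to produce what looks like the same problem using only integers. This setup needs about 2 GB of disk.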
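For readers who want to picture the original PostGIS case, here is a minimal sketch of what it might look like. Everything in it is an assumption made for illustration: the table names `points` and `groups`, the column `group_id`, the index name, SRID 4326, and the box coordinates are all hypothetical and do not come from the real setup.

```sql
-- Hypothetical schema: every name below is illustrative, not from the real system.
create extension if not exists postgis;
create extension if not exists btree_gist;  -- lets an integer column take part in a GiST index

create table points (
    group_id integer,                -- the "point aggregator" with business meaning
    geom     geometry(Point, 4326)   -- SRID 4326 is an assumption
);
create table groups as select generate_series(1, 5) as group_id;

create index ind_points on points using gist (group_id, geom);

-- Box search on geom, joined to another table on the integer column.
select *
from points
join groups using (group_id)
where geom && ST_MakeEnvelope(10.0, 45.0, 10.1, 45.1, 4326);
```

The integer-only reproduction that follows replaces `geom` with a second integer column so that PostGIS is not needed to observe the behaviour.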
/* 1<=n1<=10000 */
/* 0<=n2<=1000 || 3000<=n2<=4000 */
/* s1, r1, r2 to increase I/O needs and avoid index only scans */
create extension if not exists btree_gist; -- required for a GiST index on integer columns

create table test as
select n as n1,
       (1000*random())::int as n2,
       generate_series(1,1000) s1,
       random() as r1,
       random() as r2
from generate_series(1,10000) as n;

insert into test
select n as n1,
       ((1000*random())::int) + 3000 as n2,
       generate_series(1,1000) s1,
       random() as r1,
       random() as r2
from generate_series(1,10000) as n;

create index ind_test on test using gist (n1,n2);

create table ids as select generate_series(1,5) as n1; -- same problem with just one row on this table
analyze ids;

alter table test alter column n1 set statistics 10000; --excluding poor stats
alter table test alter column n2 set statistics 10000; --excluding poor stats
analyze test;

explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 999; --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 1000; --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
explain analyze select * from test join ids using (n1) where n2 = 1001; --Q4 Outside n2 range, Index cond n2 > 100 ms
explain analyze select * from test join ids using (n1) where n2 = 1002; --Q5 Outside n2 range, Index cond n2 > 100 ms
Below are the results of the EXPLAINs above:
➤ psql://postgres@[local]:5432/postgres
# explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.51 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1)
         Index Cond: ((n1 = 1) AND (n2 = 1001))
   ->  Seq Scan on ids  (cost=0.00..1.06 rows=1 width=4) (never executed)
         Filter: (n1 = 1)
 Planning time: 0.404 ms
 Execution time: 0.096 ms
(7 rows)

Time: 0.826 ms

➤ psql://postgres@[local]:5432/postgres
# explain analyze select * from test join ids using (n1) where n2 = 999; --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..43.29 rows=5 width=28) (actual time=0.098..0.367 rows=7 loops=1)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual time=0.012..0.014 rows=5 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.064..0.068 rows=1 loops=5)
         Index Cond: ((n1 = ids.n1) AND (n2 = 999))
 Planning time: 0.994 ms
 Execution time: 0.407 ms
(6 rows)

Time: 1.713 ms

➤ psql://postgres@[local]:5432/postgres
# explain analyze select * from test join ids using (n1) where n2 = 1000; --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..43.29 rows=3 width=28) (actual time=0.157..0.248 rows=3 loops=1)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual time=0.010..0.011 rows=5 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.044..0.046 rows=1 loops=5)
         Index Cond: ((n1 = ids.n1) AND (n2 = 1000))
 Planning time: 0.877 ms
 Execution time: 0.277 ms
(6 rows)

Time: 1.440 ms

➤ psql://postgres@[local]:5432/postgres
# explain analyze select * from test join ids using (n1) where n2 = 1001; --Q4 Outside n2 range, Index cond n2 > 100 ms
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=93.247..93.247 rows=0 loops=1)
   Join Filter: (test.n1 = ids.n1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=93.246..93.246 rows=0 loops=1)
         Index Cond: (n2 = 1001)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
 Planning time: 0.716 ms
 Execution time: 93.280 ms
(7 rows)

Time: 94.242 ms

➤ psql://postgres@[local]:5432/postgres
# explain analyze select * from test join ids using (n1) where n2 = 1002; --Q5 Outside n2 range, Index cond n2 > 100 ms
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=86.857..86.857 rows=0 loops=1)
   Join Filter: (test.n1 = ids.n1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=86.856..86.856 rows=0 loops=1)
         Index Cond: (n2 = 1002)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
 Planning time: 0.750 ms
 Execution time: 86.885 ms
(7 rows)

Time: 87.955 ms

# select version();
                                                                   version
---------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 2.372 ms
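As the EXPLAIN results show, when I search for an n2 value that does not exist, the plan switches to a different index condition (n2 alone instead of n1 AND n2), which gives a bad plan. If the index is a btree, these plan differences do not occur. This seems to be related to GiST, and I need GiST because of the PostGIS features.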
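The workaround I'm using is to create an additional index for the problematic queries, but with the huge data volume these indexes bring I/O and storage costs. I also tried two single-column indexes, but the performance loss compared with the multicolumn index was too high.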
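In the integer-only reproduction, that kind of extra index could look like the sketch below. The index name `ind_test_btree` is hypothetical, and a plain btree is only an option here because both columns are integers; it is not a description of the index used in the real PostGIS setup.

```sql
-- Hypothetical extra index on the reproduction table (name is illustrative).
create index ind_test_btree on test (n1, n2);
analyze test;

-- Re-run one of the slow queries to see which index the planner now picks.
explain analyze select * from test join ids using (n1) where n2 = 1001;

-- Measure the storage cost of carrying both indexes.
select pg_size_pretty(pg_relation_size('ind_test'))       as gist_size,
       pg_size_pretty(pg_relation_size('ind_test_btree')) as btree_size;
```

The size query makes the storage trade-off mentioned above concrete for the test data.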
Any hints?
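This sounds like a problem in the GiST cost estimator: it appears to assume that a constraint on a lower-order column is just as good as a constraint on the first column.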
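One way to check this against the reproduction is to compare the planner's estimates for the two index conditions in isolation, as sketched below. In the plans posted in the question, the Index Scan node is costed at 0.42..8.44 both when the condition covers n1 and n2 (Q1) and when it covers n2 alone (Q4), even though the actual times differ by roughly three orders of magnitude. The PostgreSQL documentation on multicolumn indexes also notes that, for GiST, the condition on the first column is the most important one for determining how much of the index has to be scanned.

```sql
-- Plain EXPLAIN (no ANALYZE) shows only the estimates, so it is cheap to run.
explain select * from test where n1 = 1 and n2 = 1001;  -- condition on both index columns
explain select * from test where n2 = 1001;             -- condition on the second column only

-- The same two queries with ANALYZE, to compare estimated cost against actual time.
explain analyze select * from test where n1 = 1 and n2 = 1001;
explain analyze select * from test where n2 = 1001;
```

If the estimated costs come out the same while the actual times differ widely, that would be consistent with the estimator treating both columns alike.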