Postgresql

PostgreSQL bad planning with multicolumn GiST and searches with empty results

  • October 2, 2018

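I have a problem with a multicolumn GiST index on (integer, geom), where geom is a PostGIS point and the integer is a key with business meaning that groups the points. I search by a box on the geom column and join the integer column to other tables. When the box covers an empty area, the planner chooses an index scan whose index condition uses only the second column, which is very inefficient. I spent some time building a scenario that reproduces the problem with less data, and I managed to reproduce what looks like the same problem using only integer columns. This setup needs about 2GB of disk space.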
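Before the integer-only reproduction, here is a minimal sketch of the production pattern described above. The poster does not show the real schema, so the table names `grp` and `point_data`, the columns `grp_id` and `label`, the SRID 4326 and the envelope coordinates are all hypothetical; only the shape (a GiST index on an integer plus a PostGIS point, a box search on the point, a join on the integer) comes from the question. It assumes the PostGIS and btree_gist extensions are available on the server.

```sql
-- Hypothetical schema mirroring the question; all names are assumptions.
create extension if not exists postgis;
-- btree_gist provides GiST operator classes for plain integers,
-- which an (integer, geometry) GiST index needs.
create extension if not exists btree_gist;

create table grp (
    grp_id integer primary key,
    label  text
);

create table point_data (
    grp_id integer not null,              -- business key that groups points
    geom   geometry(Point, 4326) not null -- the PostGIS point
);

-- Multicolumn GiST index: integer first, geometry second.
create index point_data_grp_geom_idx on point_data using gist (grp_id, geom);

-- Box search on the geometry, joined to the other table on the integer.
-- Per the question, when the box covers an empty area this shape gets an
-- index scan whose condition uses only the geometry column.
select g.label, p.geom
from point_data p
join grp g using (grp_id)
where p.geom && ST_MakeEnvelope(-10, -10, 10, 10, 4326);
```

The poster's integer-only reproduction script stands in for this schema without needing PostGIS.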

/*   1<=n1<=10000 */
/*   0<=n2<=1000 || 3000<=n2<=4000 */
/* s1, r1, r2 to increase I/O needs and avoid index only scans */
create table test as select n as n1, (1000*random())::int as n2, generate_series(1,1000) s1 , random() as r1, random() as r2 from generate_series(1,10000) as n;
insert into test select n as n1, ((1000*random())::int) + 3000 as n2, generate_series(1,1000) s1 , random() as r1, random() as r2 from generate_series(1,10000) as n;
create extension if not exists btree_gist;  -- provides GiST operator classes for integer columns
create index ind_test on test using gist (n1,n2);
create table ids as select generate_series(1,5) as n1; -- same problem with just one row on this table
analyze ids;
alter table test alter column n1 set statistics 10000;  -- rule out poor statistics
alter table test alter column n2 set statistics 10000;  -- rule out poor statistics
analyze test;
explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 999;  --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 1000;  --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
explain analyze select * from test join ids using (n1) where n2 = 1001;  --Q4 Outside n2 range, Index cond n2 > 100 ms
explain analyze select * from test join ids using (n1) where n2 = 1002;  --Q5 Outside n2 range, Index cond n2 > 100 ms

Below are the results of the EXPLAIN ANALYZE statements above:

➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
                                                     QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.42..9.51 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1)
  ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1)
        Index Cond: ((n1 = 1) AND (n2 = 1001))
  ->  Seq Scan on ids  (cost=0.00..1.06 rows=1 width=4) (never executed)
        Filter: (n1 = 1)
Planning time: 0.404 ms
Execution time: 0.096 ms
(7 rows)

Time: 0.826 ms

➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 999;  --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
                                                     QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.42..43.29 rows=5 width=28) (actual time=0.098..0.367 rows=7 loops=1)
  ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual time=0.012..0.014 rows=5 loops=1)
  ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.064..0.068 rows=1 loops=5)
        Index Cond: ((n1 = ids.n1) AND (n2 = 999))
Planning time: 0.994 ms
Execution time: 0.407 ms
(6 rows)

Time: 1.713 ms

➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 1000;  --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
                                                     QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.42..43.29 rows=3 width=28) (actual time=0.157..0.248 rows=3 loops=1)
  ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual time=0.010..0.011 rows=5 loops=1)
  ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=0.044..0.046 rows=1 loops=5)
        Index Cond: ((n1 = ids.n1) AND (n2 = 1000))
Planning time: 0.877 ms
Execution time: 0.277 ms
(6 rows)

Time: 1.440 ms

➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 1001;  --Q4 Outside n2 range, Index cond n2 > 100 ms
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=93.247..93.247 rows=0 loops=1)
  Join Filter: (test.n1 = ids.n1)
  ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=93.246..93.246 rows=0 loops=1)
        Index Cond: (n2 = 1001)
  ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
Planning time: 0.716 ms
Execution time: 93.280 ms
(7 rows)

Time: 94.242 ms

➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 1002;  --Q5 Outside n2 range, Index cond n2 > 100 ms
                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=86.857..86.857 rows=0 loops=1)
  Join Filter: (test.n1 = ids.n1)
  ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28) (actual time=86.856..86.856 rows=0 loops=1)
        Index Cond: (n2 = 1002)
  ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
Planning time: 0.750 ms
Execution time: 86.885 ms
(7 rows)

Time: 87.955 ms

# select version();
                                                                  version                                                                   
---------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 2.372 ms

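As the EXPLAIN output shows, when I search through the join for an n2 value that does not exist in the table (Q4, Q5), the plan changes: the planner scans test first, with an Index Cond on n2 alone, and the query takes about 90 ms instead of under 1 ms. These plan differences do not occur when the index is a btree. The problem seems specific to GiST, and I need GiST for the PostGIS functionality.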
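The btree comparison mentioned above can be checked with something like the following sketch. It assumes the `test` and `ids` tables from the reproduction script; the index name `ind_test_btree` is hypothetical. Running it inside a transaction lets `rollback` restore the GiST index, but `drop index` holds an exclusive lock on `test` until the transaction ends, and building a btree over the 20 million rows takes time and disk, so this is only for a test machine.

```sql
begin;

-- Hide the GiST index for the duration of this transaction only.
drop index ind_test;

-- Same columns in the same order, but a btree (hypothetical index name).
create index ind_test_btree on test using btree (n1, n2);

-- Q4 again: n2 = 1001 does not occur in the data.
explain analyze select * from test join ids using (n1) where n2 = 1001;

-- Discard the btree index and bring back ind_test.
rollback;
```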

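The workaround I am using is to create an extra index for the problematic queries, but given the volume of data, these indexes add I/O and storage costs. I also tried two single-column indexes, but the performance loss compared to the multicolumn index was too high.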
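The poster does not show the definitions of these extra indexes. As a sketch in terms of the reproduction tables, assuming the extra index simply leads with the column the slow queries filter on (here `n2`), and assuming the single-column attempt used one GiST index per column, the two options might look like this. All index names are hypothetical, and in practice you would create only one of the two options.

```sql
-- Option A (assumed shape of the extra index): lead with n2, so a
-- condition on n2 alone is a condition on the first index column.
create index ind_test_n2_n1 on test using gist (n2, n1);

-- Option B (the single-column attempt, index type assumed): one index per
-- column; the planner can combine them, for example with a BitmapAnd.
create index ind_test_n1 on test using gist (n1);
create index ind_test_n2 on test using gist (n2);
```

Either option means another large index to store and update on every write, which is the I/O and storage cost the poster is trying to avoid.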

Any ideas?

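This sounds like a problem in the GiST cost estimator: it appears to treat a constraint on a lower-order column the same as a constraint on the first column.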

https://www.postgresql-archive.org/BUG-15408-Postgresql-bad-planning-with-multicolumn-gist-and-search-with-empty-results-td6047700.html
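This diagnosis matches how the PostgreSQL documentation describes multicolumn GiST indexes: a condition on any column can be used, but the condition on the first column is what mainly limits how much of the index is scanned. In the poster's plans, the index scan in Q1 (conditions on `n1` and `n2`) and in Q4 (`n2` only) carry the same estimated cost, 0.42..8.44, although Q4 takes about 90 ms. A sketch for seeing the gap between estimate and actual work on the reproduction table could look like this; it assumes the `test` table and `ind_test` index from the script, and the planner is not guaranteed to pick the GiST index for these standalone queries.

```sql
-- BUFFERS reports the shared buffers each plan node actually touched.

-- Both columns constrained: the leading column n1 limits which
-- subtrees of the GiST index are descended.
explain (analyze, buffers)
select * from test where n1 = 1 and n2 = 1001;

-- Only the second column constrained: n1 no longer limits the descent,
-- so far more index pages can be read even if the estimate looks similar.
explain (analyze, buffers)
select * from test where n2 = 1001;
```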

Source: https://dba.stackexchange.com/questions/218519