Postgresql

Postgres not using expression index for group by

  • December 10, 2019

I'm doing a simple group by on a single column that has a matching index, and it works fine:

create index on contacts (home_city);
select home_city, count(*) 
from contacts 
group by home_city 
limit 50;

This takes 150 ms on a table with 8 million rows, which is fast enough. The EXPLAIN ANALYZE output:

"Limit  (cost=1000.46..9256.24 rows=50 width=17) (actual time=14.201..76.630 rows=50 loops=1)"
"  ->  Finalize GroupAggregate  (cost=1000.46..207395.17 rows=1250 width=17) (actual time=14.200..62.196 rows=50 loops=1)"
"        Group Key: home_city"
"        ->  Gather Merge  (cost=1000.46..207370.17 rows=2500 width=17) (actual time=14.100..76.557 rows=97 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              ->  Partial GroupAggregate  (cost=0.43..206081.59 rows=1250 width=17) (actual time=0.897..56.260 rows=41 loops=3)"
"                    Group Key: home_city"
"                    ->  Parallel Index Only Scan using contacts_6_home_city_idx on contacts_6  (cost=0.43..188620.82 rows=3489654 width=9) (actual time=0.026..28.866 rows=105725 loops=3)"
"                          Heap Fetches: 0"
"Planning Time: 0.099 ms"
"Execution Time: 76.680 ms"

Now I want to do the same thing with a case-insensitive index:

create index on contacts (lower(home_city));
select lower(home_city), count(*) 
from contacts 
group by lower(home_city) 
limit 50;

This takes 3-4 seconds. It seems to be using the wrong index:

"Limit  (cost=215869.40..215882.20 rows=50 width=40) (actual time=4253.233..4253.846 rows=50 loops=1)"
"  ->  Finalize GroupAggregate  (cost=215869.40..216175.66 rows=1197 width=40) (actual time=4253.232..4253.335 rows=50 loops=1)"
"        Group Key: (lower((home_city)::text))"
"        ->  Gather Merge  (cost=215869.40..216148.72 rows=2394 width=40) (actual time=4253.219..4253.796 rows=102 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              ->  Sort  (cost=214869.38..214872.37 rows=1197 width=40) (actual time=4244.566..4244.630 rows=710 loops=3)"
"                    Sort Key: (lower((home_city)::text))"
"                    Sort Method: quicksort  Memory: 88kB"
"                    Worker 0:  Sort Method: quicksort  Memory: 89kB"
"                    Worker 1:  Sort Method: quicksort  Memory: 120kB"
"                    ->  Partial HashAggregate  (cost=214793.22..214808.18 rows=1197 width=40) (actual time=4241.397..4241.658 rows=1021 loops=3)"
"                          Group Key: lower((home_city)::text)"
"                          ->  Parallel Index Only Scan using contacts_6_home_city_idx on contacts_6  (cost=0.43..197344.95 rows=3489654 width=32) (actual time=0.059..2918.355 rows=2778720 loops=3)"
"                                Heap Fetches: 0"
"Planning Time: 0.118 ms"
"Execution Time: 4253.906 ms"

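When I drop the index on plain home_city, it does a full table scan instead, which takes 6 seconds.

How do I get it to use the correct index? What am I missing here?

(I'm using Postgres 11.5.)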

There is this code in check_index_only in src/backend/optimizer/path/indxpath.c:

/*
 * check_index_only
 *      Determine whether an index-only scan is possible for this index.
 */
static bool
check_index_only(RelOptInfo *rel, IndexOptInfo *index)
{

[...]

   /*
    * Construct a bitmapset of columns that the index can return back in an
    * index-only scan.  If there are multiple index columns containing the
    * same attribute, all of them must be capable of returning the value,
    * since we might recheck operators on any of them.  (Potentially we could
    * be smarter about that, but it's such a weird situation that it doesn't
    * seem worth spending a lot of sweat on.)
    */
   for (i = 0; i < index->ncolumns; i++)
   {
       int         attno = index->indexkeys[i];

       /*
        * For the moment, we just ignore index expressions.  It might be nice
        * to do something with them, later.
        */
       if (attno == 0)
           continue;

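So, if I read that right, index expressions are ignored when an index-only scan is considered.

I don't think this is a matter of principle; it just hasn't been implemented.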
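The PostgreSQL documentation on index-only scans describes a workaround for this planner limitation: add the underlying column to the expression index, so that every column the query references is available from the index. A minimal sketch against the table from the question, assuming version 11 or later (needed for INCLUDE). The index name is made up for illustration, and whether the planner actually picks this index for the query should be checked with EXPLAIN:

```sql
-- Hypothetical index name; INCLUDE requires PostgreSQL 11 or later.
CREATE INDEX contacts_lower_home_city_idx
    ON contacts (lower(home_city)) INCLUDE (home_city);

-- Index-only scans rely on an up-to-date visibility map.
VACUUM (ANALYZE) contacts;

-- Check which index the planner chooses for the case-insensitive grouping.
EXPLAIN (ANALYZE)
SELECT lower(home_city), count(*)
FROM contacts
GROUP BY lower(home_city)
LIMIT 50;
```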

From PostgreSQL v12 on, you can use a case-insensitive ICU collation to get what you want:

CREATE COLLATION und_ci (
  LOCALE = 'und-u-ks-level2',
  PROVIDER = icu,
  DETERMINISTIC = FALSE
);

Then you have to change the column to use that collation:

ALTER TABLE contacts ALTER home_city TYPE text COLLATE und_ci;

The plain index on home_city can then be used for case-insensitive searches, and it can also be used for index-only scans.
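With the column switched to the nondeterministic collation, the grouping can be written on the bare column, without lower(). A sketch, assuming the und_ci collation and the ALTER TABLE above have been applied; which spelling of a city is returned for each group is not defined, and the search value is a made-up sample:

```sql
-- Groups 'Berlin', 'berlin' and 'BERLIN' together under und_ci.
SELECT home_city, count(*)
FROM contacts
GROUP BY home_city
LIMIT 50;

-- Equality comparisons also ignore case; 'berlin' is a hypothetical sample value.
SELECT count(*)
FROM contacts
WHERE home_city = 'berlin';
```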

Source: https://dba.stackexchange.com/questions/255353