PostgreSQL

Why are some count queries so slow?

  • April 17, 2019

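I have a users table with 2.6 million rows, and SELECT COUNT(*) FROM users takes 2 seconds to complete, but the same query against the stories table, which has 5.2 million rows, takes more than an hour.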

The EXPLAIN output is very similar:

explain select count(*) from users;

Finalize Aggregate  (cost=89417.87..89417.88 rows=1 width=8)
  ->  Gather  (cost=89417.65..89417.86 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=88417.65..88417.66 rows=1 width=8)
              ->  Parallel Seq Scan on users  (cost=0.00..85702.72 rows=1085972 width=0)

explain select count(*) from stories;

Finalize Aggregate  (cost=428235.66..428235.67 rows=1 width=8)
  ->  Gather  (cost=428235.45..428235.66 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=427235.45..427235.46 rows=1 width=8)
              ->  Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories  (cost=0.43..421752.81 rows=2193057 width=0)

Postgres version:

                                                           version                                                            
PostgreSQL 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit

Table definition of stories:

        Column          |  Type   | Collation | Nullable |                      Default                      | Storage  | Stats target | Description 
-------------------------+---------+-----------+----------+---------------------------------------------------+----------+--------------+-------------
id                      | bigint  |           | not null | nextval('stories_id_seq'::regclass) | plain    |              | 
rating                  | integer |           | not null |                                                   | plain    |              | 
number_of_pluses        | integer |           | not null |                                                   | plain    |              | 
number_of_minuses       | integer |           | not null |                                                   | plain    |              | 
title                   | text    |           | not null |                                                   | extended |              | 
content_blocks          | jsonb   |           | not null |                                                   | extended |              | 
created_at_timestamp    | bigint  |           | not null |                                                   | plain    |              | 
story_url               | text    |           | not null |                                                   | extended |              | 
tags                    | jsonb   |           | not null |                                                   | extended |              | 
number_of_comments      | integer |           | not null |                                                   | plain    |              | 
is_deleted              | boolean |           | not null |                                                   | plain    |              | 
is_rating_hidden        | boolean |           | not null |                                                   | plain    |              | 
has_mine_tag            | boolean |           | not null |                                                   | plain    |              | 
has_adult_tag           | boolean |           | not null |                                                   | plain    |              | 
is_longpost             | boolean |           | not null |                                                   | plain    |              | 
author_id               | bigint  |           | not null |                                                   | plain    |              | 
author_username         | text    |           | not null |                                                   | extended |              | 
author_profile_url      | text    |           | not null |                                                   | extended |              | 
author_avatar_url       | text    |           | not null |                                                   | extended |              | 
community_link          | text    |           | not null |                                                   | extended |              | 
community_name          | text    |           | not null |                                                   | extended |              | 
comments_are_hot        | boolean |           | not null |                                                   | plain    |              | 
added_timestamp         | bigint  |           | not null |                                                   | plain    |              | 
last_update_timestamp   | bigint  |           | not null |                                                   | plain    |              | 
next_update_timestamp   | bigint  |           | not null |                                                   | plain    |              | 
task_taken_at_timestamp | bigint  |           | not null |                                                   | plain    |              | 
is_permanently_deleted  | boolean |           | not null | false                                             | plain    |              | 
Indexes:
   "stories_pkey" PRIMARY KEY, btree (id)
   "stories__added_timestamp__idx" btree (added_timestamp)
   "stories__is_permanently_deleted__idx" btree (is_permanently_deleted)
   "stories__last_update_timestamp__idx" btree (last_update_timestamp)
   "stories__next_update_timestamp__idx" btree (next_update_timestamp)
   "stories__task_taken_at_timestamp__idx" btree (task_taken_at_timestamp)

Index definition:

    Index "public.stories__is_permanently_deleted__idx"
        Column         |  Type   |       Definition       | Storage 
------------------------+---------+------------------------+---------
is_permanently_deleted | boolean | is_permanently_deleted | plain
btree, for table "public.stories"

After reindexing (as suggested):

EXPLAIN (ANALYZE, BUFFERS) select count(*) from stories;

Finalize Aggregate  (cost=356218.22..356218.23 rows=1 width=8) (actual time=273577.971..273577.971 rows=1 loops=1)
  Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
  ->  Gather  (cost=356218.01..356218.22 rows=2 width=8) (actual time=272647.858..273602.243 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
        ->  Partial Aggregate  (cost=355218.01..355218.02 rows=1 width=8) (actual time=272938.947..272938.948 rows=1 loops=3)
              Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
              ->  Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories  (cost=0.43..349741.33 rows=2190671 width=0) (actual time=0.386..271148.590 rows=1752497 loops=3)
                    Heap Fetches: 654818
                    Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
Planning time: 0.726 ms
Execution time: 273602.447 ms

While you get a Parallel Seq Scan for users, you get a Parallel Index Only Scan for stories, which is typically faster than a sequential scan of the table.
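
One way to check whether the index-only scan really is the slower path on this table is to run the same count with index scans disabled for a single transaction and compare the two timings. This is only a sketch: it assumes a session where changing planner settings is acceptable, and EXPLAIN ANALYZE actually executes the query, so it can take as long as the original count.

```sql
BEGIN;

-- Planner settings changed with SET LOCAL revert at the end of the transaction.
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;

-- With the index paths disabled, the planner should fall back to a sequential scan.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM stories;

ROLLBACK;
```

If the sequential scan finishes much faster than the index-only scan, that points at the index rather than the table.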

If it is that much slower, the obvious suspect is index bloat (or worse, index corruption).

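Recreate the index and test again to see whether that was the cause. If it was, investigate what is bloating (or corrupting) your index. Corruption should be an extremely rare exception, unless you are running on faulty RAM or storage.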

REINDEX INDEX stories__is_permanently_deleted__idx;
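
To judge whether bloat was involved, it can help to record the on-disk size of the index before and after the REINDEX. The following is a minimal sketch using the built-in size functions; the object names are the ones from the question, and a large drop in index size after rebuilding would be consistent with bloat.

```sql
-- On-disk size of the index; run once before and once after REINDEX.
SELECT pg_size_pretty(pg_relation_size('stories__is_permanently_deleted__idx')) AS index_size;

-- On-disk size of the table's main data file, for comparison.
SELECT pg_size_pretty(pg_relation_size('stories')) AS table_size;
```

Note that a plain REINDEX blocks writes to the table while it runs, so it is best done during a quiet period.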

If you do not need an exact count, there are much faster alternatives.
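
One such alternative, sketched here as an assumption about what would fit this case, is to read the planner's row estimate from the pg_class system catalog instead of scanning anything. The value is only as current as the last VACUUM or ANALYZE (including autovacuum runs) on the table, so treat it as approximate.

```sql
-- Estimated row count maintained by VACUUM / ANALYZE; returns instantly.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.stories'::regclass;
```

Running ANALYZE stories; beforehand refreshes the estimate if it looks stale.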


**Aside:** Reordering the columns of stories like this would save about 20 bytes per row:

        Column          |  Type
-------------------------+---------
id                      | bigint
created_at_timestamp    | bigint
added_timestamp         | bigint
last_update_timestamp   | bigint
next_update_timestamp   | bigint
task_taken_at_timestamp | bigint
author_id               | bigint
rating                  | integer
number_of_pluses        | integer
number_of_minuses       | integer
number_of_comments      | integer
is_deleted              | boolean
is_rating_hidden        | boolean
has_mine_tag            | boolean
has_adult_tag           | boolean
is_longpost             | boolean
comments_are_hot        | boolean
is_permanently_deleted  | boolean
author_username         | text
author_profile_url      | text
author_avatar_url       | text
community_link          | text
community_name          | text
title                   | text
story_url               | text
content_blocks          | jsonb
tags                    | jsonb
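
The saving comes from alignment padding: a bigint has to start on an 8-byte boundary, so a 1-byte boolean placed in front of it is followed by padding bytes. The effect can be illustrated with two throwaway row values that hold the same data in a different order. This is a standalone sketch and does not touch the stories table.

```sql
-- Same four values, different column order.
-- In "interleaved", each boolean is followed by padding so the next bigint is aligned.
-- In "grouped", the bigints come first and the booleans pack together at the end.
SELECT pg_column_size(ROW(true, 1::bigint, true, 1::bigint)) AS interleaved,
       pg_column_size(ROW(1::bigint, 1::bigint, true, true)) AS grouped;
```

The grouped row should report the smaller size. Applying a new column order to an existing table means rewriting it, for example by creating a new table and copying the data.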

Source: https://dba.stackexchange.com/questions/235094