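PostgreSQL
Why are some count queries so slow?
I have a users table with 2.6 million rows, and SELECT COUNT(*) FROM users takes 2 seconds to complete. The same query against the stories table, which has 5.2 million rows, takes more than an hour. The explain plans are pretty similar: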
    explain select count(*) from users;

    Finalize Aggregate  (cost=89417.87..89417.88 rows=1 width=8)
      ->  Gather  (cost=89417.65..89417.86 rows=2 width=8)
            Workers Planned: 2
            ->  Partial Aggregate  (cost=88417.65..88417.66 rows=1 width=8)
                  ->  Parallel Seq Scan on users  (cost=0.00..85702.72 rows=1085972 width=0)

    explain select count(*) from stories;

    Finalize Aggregate  (cost=428235.66..428235.67 rows=1 width=8)
      ->  Gather  (cost=428235.45..428235.66 rows=2 width=8)
            Workers Planned: 2
            ->  Partial Aggregate  (cost=427235.45..427235.46 rows=1 width=8)
                  ->  Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories  (cost=0.43..421752.81 rows=2193057 width=0)
Postgres version:

    PostgreSQL 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit
Table definition of stories:

             Column          |  Type   | Collation | Nullable |               Default               | Storage  | Stats target | Description
    -------------------------+---------+-----------+----------+-------------------------------------+----------+--------------+-------------
     id                      | bigint  |           | not null | nextval('stories_id_seq'::regclass) | plain    |              |
     rating                  | integer |           | not null |                                     | plain    |              |
     number_of_pluses        | integer |           | not null |                                     | plain    |              |
     number_of_minuses       | integer |           | not null |                                     | plain    |              |
     title                   | text    |           | not null |                                     | extended |              |
     content_blocks          | jsonb   |           | not null |                                     | extended |              |
     created_at_timestamp    | bigint  |           | not null |                                     | plain    |              |
     story_url               | text    |           | not null |                                     | extended |              |
     tags                    | jsonb   |           | not null |                                     | extended |              |
     number_of_comments      | integer |           | not null |                                     | plain    |              |
     is_deleted              | boolean |           | not null |                                     | plain    |              |
     is_rating_hidden        | boolean |           | not null |                                     | plain    |              |
     has_mine_tag            | boolean |           | not null |                                     | plain    |              |
     has_adult_tag           | boolean |           | not null |                                     | plain    |              |
     is_longpost             | boolean |           | not null |                                     | plain    |              |
     author_id               | bigint  |           | not null |                                     | plain    |              |
     author_username         | text    |           | not null |                                     | extended |              |
     author_profile_url      | text    |           | not null |                                     | extended |              |
     author_avatar_url       | text    |           | not null |                                     | extended |              |
     community_link          | text    |           | not null |                                     | extended |              |
     community_name          | text    |           | not null |                                     | extended |              |
     comments_are_hot        | boolean |           | not null |                                     | plain    |              |
     added_timestamp         | bigint  |           | not null |                                     | plain    |              |
     last_update_timestamp   | bigint  |           | not null |                                     | plain    |              |
     next_update_timestamp   | bigint  |           | not null |                                     | plain    |              |
     task_taken_at_timestamp | bigint  |           | not null |                                     | plain    |              |
     is_permanently_deleted  | boolean |           | not null | false                               | plain    |              |
    Indexes:
        "stories_pkey" PRIMARY KEY, btree (id)
        "stories__added_timestamp__idx" btree (added_timestamp)
        "stories__is_permanently_deleted__idx" btree (is_permanently_deleted)
        "stories__last_update_timestamp__idx" btree (last_update_timestamp)
        "stories__next_update_timestamp__idx" btree (next_update_timestamp)
        "stories__task_taken_at_timestamp__idx" btree (task_taken_at_timestamp)
Index definition:

    Index "public.stories__is_permanently_deleted__idx"
             Column         |  Type   |       Definition       | Storage
    ------------------------+---------+------------------------+---------
     is_permanently_deleted | boolean | is_permanently_deleted | plain
    btree, for table "public.stories"
After reindexing (as suggested), EXPLAIN (ANALYZE, BUFFERS) select count(*) from stories gives:

    Finalize Aggregate  (cost=356218.22..356218.23 rows=1 width=8) (actual time=273577.971..273577.971 rows=1 loops=1)
      Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
      ->  Gather  (cost=356218.01..356218.22 rows=2 width=8) (actual time=272647.858..273602.243 rows=3 loops=1)
            Workers Planned: 2
            Workers Launched: 2
            Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
            ->  Partial Aggregate  (cost=355218.01..355218.02 rows=1 width=8) (actual time=272938.947..272938.948 rows=1 loops=3)
                  Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
                  ->  Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories  (cost=0.43..349741.33 rows=2190671 width=0) (actual time=0.386..271148.590 rows=1752497 loops=3)
                        Heap Fetches: 654818
                        Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
    Planning time: 0.726 ms
    Execution time: 273602.447 ms
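While you get a Parallel Seq Scan for users, you get a Parallel Index Only Scan for stories, which should typically be faster than a sequential scan on the table. If it is that much slower, the obvious suspect is index bloat (or, worse, index corruption).
Recreate the index and test again to see whether that is the cause. If it is, investigate what bloated (or corrupted) your index. Corruption should be an extremely rare exception, unless you are running on faulty RAM or storage.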
REINDEX INDEX stories__is_permanently_deleted__idx;
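To judge whether bloat is plausible, it can help to compare the on-disk size of the index with the size of the table, before and after the REINDEX. The following is a minimal sketch, not part of the original answer. It assumes the table is visible as stories on the current search_path, and it uses only the built-in pg_relation_size and pg_size_pretty functions and the pg_class and pg_index catalogs.

```sql
-- List the stories table and all of its indexes with their on-disk sizes.
-- Run once before and once after REINDEX and compare the index sizes.
SELECT c.relname,
       c.relkind,                                   -- 'r' = table, 'i' = index
       pg_size_pretty(pg_relation_size(c.oid::regclass)) AS size
FROM   pg_class c
WHERE  c.oid = 'stories'::regclass
   OR  c.oid IN (SELECT indexrelid
                 FROM   pg_index
                 WHERE  indrelid = 'stories'::regclass)
ORDER  BY pg_relation_size(c.oid::regclass) DESC;
```

An index on a single boolean column that is far larger than a comparable index on the same table, or that shrinks dramatically after REINDEX, would be consistent with bloat. The query by itself does not prove it.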
If you do not need an exact count, there are much faster alternatives.
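One such alternative, sketched here as an illustration rather than as the specific method the answer had in mind, is to read the planner's row estimate from the system catalog instead of scanning the table. The schema name public is taken from the index definition shown in the question.

```sql
-- Estimated row count from the planner statistics; returns almost instantly.
-- reltuples is refreshed by VACUUM, ANALYZE and autovacuum, so the value is
-- approximate and can lag behind recent inserts and deletes.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.stories'::regclass;
```

How close the estimate is depends on how recently the table was vacuumed or analyzed, so it suits dashboards and rough sizing more than anything that needs an exact number.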
**Aside:** reorder the columns of stories like this to save about 20 bytes per row:

             Column          |  Type
    -------------------------+---------
     id                      | bigint
     created_at_timestamp    | bigint
     added_timestamp         | bigint
     last_update_timestamp   | bigint
     next_update_timestamp   | bigint
     task_taken_at_timestamp | bigint
     author_id               | bigint
     rating                  | integer
     number_of_pluses        | integer
     number_of_minuses       | integer
     number_of_comments      | integer
     is_deleted              | boolean
     is_rating_hidden        | boolean
     has_mine_tag            | boolean
     has_adult_tag           | boolean
     is_longpost             | boolean
     comments_are_hot        | boolean
     is_permanently_deleted  | boolean
     author_username         | text
     author_profile_url      | text
     author_avatar_url       | text
     community_link          | text
     community_name          | text
     title                   | text
     story_url               | text
     content_blocks          | jsonb
     tags                    | jsonb
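The saving comes from alignment padding: a bigint must start on an 8-byte boundary, so a 1-byte boolean placed directly in front of it is followed by padding bytes. The effect can be observed with the built-in pg_column_size function on two ad-hoc row values. This is a small illustration with made-up values, not a measurement of the stories table.

```sql
-- The same two values in a different order. With the boolean first, padding
-- is inserted so the bigint starts on an 8-byte boundary, so boolean_first
-- is expected to be the larger of the two results.
SELECT pg_column_size(ROW(true, 1::bigint)) AS boolean_first,
       pg_column_size(ROW(1::bigint, true)) AS bigint_first;
```

Grouping the fixed-width columns from widest to narrowest, and putting the variable-length text and jsonb columns last, is what removes most of that padding in the ordering above.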