Postgresql

如何針對非常高頻的更新優化表?

  • January 7, 2019

我有一個表,其中包含需要定期執行的任務列表:

applaudience=> \d+ maintenance_task
                                                                      Table "public.maintenance_task"
              Column               |           Type           | Collation | Nullable |                   Default                    | Storage  | Stats target | Description
------------------------------------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
id                                 | integer                  |           | not null | nextval('maintenance_task_id_seq'::regclass) | plain    |              |
nid                                | citext                   |           | not null |                                              | extended |              |
execution_interval                 | interval                 |           | not null |                                              | plain    |              |
last_attempted_at                  | timestamp with time zone |           |          | now()                                        | plain    |              |
last_maintenance_task_execution_id | integer                  |           |          |                                              | plain    |              |
disabled_at                        | timestamp with time zone |           |          |                                              | plain    |              |
maximum_execution_duration         | interval                 |           | not null | '00:05:00'::interval                         | plain    |              |
maximum_concurrent_execution_count | integer                  |           | not null | 0                                            | plain    |              |
last_exhausted_at                  | timestamp with time zone |           | not null | now()                                        | plain    |              |
Indexes:
   "maintenance_task_pkey" PRIMARY KEY, btree (id)
   "maintenance_task_name_idx" UNIQUE, btree (nid)
Foreign-key constraints:
   "maintenance_task_last_maintenance_task_execution_id_fkey" FOREIGN KEY (last_maintenance_task_execution_id) REFERENCES maintenance_task_execution(id) ON DELETE SET NULL
Referenced by:
   TABLE "maintenance_task_execution" CONSTRAINT "maintenance_task_execution_maintenance_task_id_fkey" FOREIGN KEY (maintenance_task_id) REFERENCES maintenance_task(id) ON DELETE CASCADE
Options: autovacuum_vacuum_threshold=0, autovacuum_analyze_threshold=0, fillfactor=50

每次選擇要執行的任務時,我們都會更新 的值last_attempted_at。以下查詢用於安排新任務:

CREATE OR REPLACE FUNCTION schedule_maintenance_task()
RETURNS table(maintenance_task_id int)
AS $$
BEGIN
 RETURN QUERY
 EXECUTE $q$
   UPDATE maintenance_task
   SET last_attempted_at = now()
   WHERE
     id = (
       WITH
         active_maintenance_task_execution_count AS (
           SELECT DISTINCT ON (maintenance_task_id)
             maintenance_task_id,
             execution_count
           FROM (
             SELECT
               id maintenance_task_id,
               0 execution_count
             FROM maintenance_task
             UNION
             SELECT
               mte1.maintenance_task_id,
               count(*) execution_count
             FROM maintenance_task_execution mte1
             WHERE
               mte1.ended_at IS NULL
             GROUP BY mte1.maintenance_task_id
           ) AS t
           ORDER BY
             maintenance_task_id,
             execution_count DESC
         )
       SELECT mt1.id
       FROM maintenance_task mt1
       INNER JOIN active_maintenance_task_execution_count amtec1 ON amtec1.maintenance_task_id = mt1.id
       WHERE
         mt1.disabled_at IS NULL AND
         mt1.maximum_concurrent_execution_count >= amtec1.execution_count AND
         (
           mt1.last_attempted_at < now() - mt1.execution_interval OR
           mt1.last_exhausted_at < now() - mt1.execution_interval
         )
       ORDER BY
         mt1.last_attempted_at ASC
       LIMIT 1
       FOR UPDATE OF mt1 SKIP LOCKED
     )
   RETURNING id
 $q$;
END
$$
LANGUAGE plpgsql
SET work_mem='50MB';

schedule_maintenance_task查詢正在以大約 600/分鐘的速度執行。

大約 24 小時後問題開始出現:

applaudience=> EXPLAIN (analyze, buffers)
applaudience-> SELECT id
applaudience-> FROM maintenance_task;
                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on maintenance_task  (cost=0.00..7715.86 rows=286886 width=4) (actual time=3.675..385.042 rows=31 loops=1)
  Buffers: shared hit=9455
Planning time: 0.236 ms
Execution time: 385.060 ms
(4 rows)

applaudience=> SELECT *
applaudience-> FROM pg_stat_all_tables
applaudience-> WHERE schemaname = 'public' AND relname = 'maintenance_task';
 relid   | schemaname |     relname      | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
----------+------------+------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
22903432 | public     | maintenance_task |   163230 |      5060130 |  5571795 |       7988441 |         0 |    185359 |         0 |        172989 |     148568 |     138285 |                9733 |             | 2018-12-09 11:00:33.978177+00 |              | 2018-12-09 10:01:07.945327+00 |            0 |             6922 |             0 |              1416
(1 row)

死元組的數量增長到 100k+。一個簡單的 seq 掃描需要讀取 9k+ 緩衝區來獲取 31 行。

這是一個VACUUM VERBOSE maintenance_task日誌:

INFO:  vacuuming "public.maintenance_task"
INFO:  index "maintenance_task_pkey" now contains 9555 row versions in 331 pages
DETAIL:  0 index row versions were removed.
282 index pages have been deleted, 282 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "maintenance_task_name_idx" now contains 9555 row versions in 787 pages
DETAIL:  0 index row versions were removed.
690 index pages have been deleted, 690 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "maintenance_task": found 0 removable, 145247 nonremovable row versions in 2459 out of 4847 pages
DETAIL:  145217 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 180 unused item pointers.
Skipped 1 page due to buffer pins, 2387 frozen pages.
0 pages are entirely empty.
CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.34 s.
INFO:  vacuuming "pg_toast.pg_toast_22903432"
INFO:  index "pg_toast_22903432_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_22903432": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

可以做些什麼來防止死元組數量的增長/調度查詢的速度變慢?

old_snapshot_thresholdPostgres 9.6 中針對此類情況添加了該設置。不允許長時間打開的查詢無限期地阻止真空。如果查詢從不需要可能已被清理掉的數據,它將正常完成。如果發現它確實需要這樣的數據,它會拋出一個錯誤。如果它是被遺忘的連接的一部分,它會無限期地掛起,但不會導致表膨脹。

但是請注意,此設置可防止autovacuum在關係結束時將釋放的空間返回給作業系統,因為這是檢測錯誤條件所必需的。只有手冊VACUUM FULL仍然會強制它。因此,雖然它可以幫助對抗一種類型的表格膨脹(正是您的問題),但它可能導致另一種(通常不太重要)。

或者考慮idle_in_transaction_session_timeout,它會終止空閒時間過長的會話。明智地選擇您的設置。

我最初的評估是錯誤的。

感謝我在 Freenode 上獲得的幫助,我能夠了解根本原因並解決了不斷膨脹的表格問題。

我需要糾正自己的第一件事是對VACUUM工作原理的主要理解。VACUUM無法從不在與表關聯的磁碟關係文件末尾的緩衝區中回收磁碟空間。但是,VACUUM可以在與表關聯的磁碟關係文件的末尾重新組織緩衝區,即如果有許多更新並且VACUUM在創建新緩衝區之前正在執行,那麼新行將就地儲存在同一個緩衝區中刪除的行數和新緩衝區的數量不會增加。

為了VACUUM能夠將空間返回給作業系統,需要滿足以下條件:

  • 表格末尾的一個或多個頁面完全空閒
  • 可以輕鬆獲得排他表鎖
  • 死行不再與任何現有事務相關

我的緩衝區不斷增長的事實表明其中一個條件沒有得到滿足。

因此,首先要檢查的是最古老的實時交易:

applaudience=> SELECT age(backend_xmin), (now() - xact_start), query
applaudience-> FROM pg_stat_activity
applaudience-> WHERE backend_xmin IS NOT NULL
applaudience-> ORDER BY age(backend_xmin) DESC
applaudience-> LIMIT 1;
  age   |    ?column?     |                                                                                      query
---------+-----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4644352 | 12:31:38.895198 | -- Metabase                                                                                                                                                                    +
        |                 | SELECT "public"."http_response"."body" AS "body" FROM "public"."http_response" GROUP BY "public"."http_response"."body" ORDER BY "public"."http_response"."body" ASC LIMIT 5000
(1 row)

事實證明,在我的情況下,有一個執行時間長的查詢阻止 VACUUM 從緩衝區中刪除死行。這也可以通過查看vacuum verbose日誌來推測:

DETAIL:  145217 dead row versions cannot be removed yet, oldest xmin: 928967630

日誌中的此條目指向頻率極高的更新或阻止清理的長時間執行的事務。

為了解決這個問題,我不得不:

  1. 終止長時間執行的事務。
  2. 執行VACUUM FULL maintenance_task一次。
  3. 確保沒有長時間執行的事務阻塞 VACUUM 程序。

老答案:

似乎沒有辦法在不引起膨脹的情況下更新元組。

VACUUM FULL根據我讀過的所有內容,執行常式或等效的表重寫變體似乎是不可避免的。

VACUUM當由於大量更新或刪除活動而導致表包含大量死行版本時,純文字可能無法令人滿意。如果您有這樣的表,並且需要回收它佔用的多餘磁碟空間VACUUM FULL,則需要使用. 這些命令重寫表的全新副本並為其建構新索引。所有這些選項都需要排他鎖。請注意,它們還臨時使用大約等於表大小的額外磁碟空間,因為表和索引的舊副本在新副本完成之前無法釋放。CLUSTER``ALTER TABLE

https://www.postgresql.org/docs/current/routine-vacuuming.html

假設我對上述內容是正確的,那麼解決方案是盡量減少全表重寫的影響。在這個程度上,我發現了pg_repackpg_repack– 用最少的鎖重新組織 PostgreSQL 數據庫中的表。

由於maintenance_task表很小(少於 50 行),我應該能夠pg_repack每小時執行一次,而對調度工作人員的影響最小。

不幸的是,當表包含大量定期更新的行時,此解決方案效果不佳。

引用自:https://dba.stackexchange.com/questions/224495