Locking

Does ON CONFLICT DO UPDATE cause deadlocks?

  • April 8, 2021

I have a project in which I am trying to use PostgreSQL's ON CONFLICT DO UPDATE clause, and I am running into a large number of deadlocks.

My schema is as follows:

webarchive=# \d web_pages
                                              Table "public.web_pages"
     Column       |            Type             |                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
id                | integer                     | not null default nextval('web_pages_id_seq'::regclass)
state             | dlstate_enum                | not null
errno             | integer                     |
url               | text                        | not null
starturl          | text                        | not null
netloc            | text                        | not null
file              | integer                     |
priority          | integer                     | not null
distance          | integer                     | not null
is_text           | boolean                     |
limit_netloc      | boolean                     |
title             | citext                      |
mimetype          | text                        |
type              | itemtype_enum               |
content           | text                        |
fetchtime         | timestamp without time zone |
addtime           | timestamp without time zone |
tsv_content       | tsvector                    |
normal_fetch_mode | boolean                     | default true
ignoreuntiltime   | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone
Indexes:
   "web_pages_pkey" PRIMARY KEY, btree (id)
   "ix_web_pages_url" UNIQUE, btree (url)
   "idx_web_pages_title" gin (to_tsvector('english'::regconfig, title::text))
   "ix_web_pages_distance" btree (distance)
   "ix_web_pages_distance_filtered" btree (priority) WHERE state = 'new'::dlstate_enum AND distance < 1000000 AND normal_fetch_mode = true
   "ix_web_pages_id" btree (id)
   "ix_web_pages_netloc" btree (netloc)
   "ix_web_pages_priority" btree (priority)
   "ix_web_pages_state" btree (state)
   "ix_web_pages_url_ops" btree (url text_pattern_ops)
   "web_pages_state_netloc_idx" btree (state, netloc)
Foreign-key constraints:
   "web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Triggers:
   update_row_count_trigger BEFORE INSERT OR UPDATE ON web_pages FOR EACH ROW EXECUTE PROCEDURE web_pages_content_update_func()

My update command is as follows:

INSERT INTO
   web_pages
   (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)
VALUES
   (:url, :starturl, :netloc, :distance, :is_text, :priority, :type, :fetchtime, :state)
ON CONFLICT (url) DO
   UPDATE
       SET
           state     = EXCLUDED.state,
           starturl  = EXCLUDED.starturl,
           netloc    = EXCLUDED.netloc,
           is_text   = EXCLUDED.is_text,
           distance  = EXCLUDED.distance,
           priority  = EXCLUDED.priority,
           fetchtime = EXCLUDED.fetchtime
       WHERE
           web_pages.fetchtime < :threshtime
       AND
           web_pages.url = EXCLUDED.url
   ;

(Note: the parameters are escaped through SQLAlchemy's parameterized-query style.)

I am seeing dozens of deadlock errors, even under relatively light concurrency (6 workers):

Main.SiteArchiver.Process-5.MainThread - WARNING - SQLAlchemy OperationalError - Retrying.
Traceback (most recent call last):
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
   context)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
   cursor.execute(statement, parameters)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL:  Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (743427,2) in relation "web_pages"


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Engine.py", line 558, in upsertResponseLinks
   self.db_sess.execute(cmd, params=new)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 1034, in execute
   bind, close_with_result=True).execute(clause, params or {})
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 914, in execute
   return meth(self, multiparams, params)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
   return connection._execute_clauseelement(self, multiparams, params)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
   compiled_sql, distilled_params
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
   context)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
   exc_info
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
   reraise(type(exception), exception, tb=exc_tb, cause=cause)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 183, in reraise
   raise value.with_traceback(tb)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
   context)
 File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
   cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL:  Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT:  See server log for query details.
CONTEXT:  while inserting index tuple (743427,2) in relation "web_pages"
[SQL: '         INSERT INTO          web_pages          (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)         VALUES          (%(url)s, %(starturl)s, %(netloc)s, %(distance)s, %(is_text)s, %(priority)s, %(type)s, %(fetchtime)s, %(state)s)         ON CONFLICT (url) DO          UPDATE           SET            state     = EXCLUDED.state,            starturl  = EXCLUDED.starturl,            netloc    = EXCLUDED.netloc,            is_text   = EXCLUDED.is_text,            distance  = EXCLUDED.distance,            priority  = EXCLUDED.priority,            fetchtime = EXCLUDED.fetchtime           WHERE            web_pages.fetchtime < %(threshtime)s          ;         '] [parameters: {'url': 'xxxxxx', 'is_text': True, 'netloc': 'xxxxxx', 'distance': 1000000, 'priority': 10000, 'threshtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778866), 'state': 'new', 'starturl': 'xxxxxxx', 'type': 'unknown', 'fetchtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778934)}]

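My transaction isolation level is REPEATABLE READ, so my understanding of how the database should work is that I would see a lot of serialization errors, but that deadlocks should not occur, because if two transactions change the same row, the later transaction should simply fail.

My guess is that the UPDATE is somehow locking against the INSERT query (or something similar), and that I need to put a synchronization point (?) somewhere, but I do not understand the scope of the various query components well enough to troubleshoot beyond randomly changing things and seeing what effect that has. I have done some reading, but the PostgreSQL documentation is very abstract, and the ON CONFLICT xxx terminology does not seem to be widely used yet, so there are not many resources for practical troubleshooting, particularly for non-SQL experts.

How can I go about trying to solve this? I have also tried other isolation levels (READ COMMITTED, SERIALIZABLE), to no avail.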

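Deadlocks are not caused by a particular statement. They are caused by concurrency problems. So basically, you should start by observing how one session of your application interacts with the other sessions that are working at the same time.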
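To make this concrete: the DETAIL lines in the error above describe a cycle, in which process 11391 waits for 11389 while 11389 waits for 11391. The following is a minimal, database-free sketch of that idea. The function `find_wait_cycle` is a hypothetical helper written for illustration, not part of PostgreSQL or any library, and it assumes each blocked process waits on exactly one other process.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the process ids that form a wait cycle, or None if there is none.

    `waits_for` maps a blocked process id to the id of the process blocking it.
    """
    for start in waits_for:
        path = [start]
        current = start
        # Follow the chain of "is blocked by" edges from this process.
        while current in waits_for:
            current = waits_for[current]
            if current in path:
                # We came back to a process already on the path: a deadlock.
                return path[path.index(current):]
            path.append(current)
    return None


# The two processes from the error message block each other.
print(find_wait_cycle({11391: 11389, 11389: 11391}))  # [11391, 11389]
# A plain chain of waits is only blocking, not a deadlock.
print(find_wait_cycle({1: 2, 2: 3}))  # None
```

Neither statement in the cycle is wrong on its own; the deadlock exists only because the two sessions ran at the same time and each held something the other needed.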

Here are some general guidelines for avoiding deadlocks:

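  1. Always maintain a primary key on your tables. The primary key should be the way to identify a specific record in the table. This keeps too many rows from falling within the scope of a lock.
  2. Maintain a consistent order in all transactions. For example, if one piece of your application logic inserts/updates data in table A and then in table B, there should not be another piece of logic that inserts/updates data in table B and then in table A.
  3. Monitor and catch the culprit. PostgreSQL provides views such as pg_stat_activity and pg_stat_statements for monitoring sessions and queries. See https://wiki.postgresql.org/wiki/Lock_Monitoring for sample queries you can use to monitor blocking/deadlocks. You may also need to tune the log_lock_waits and deadlock_timeout parameters.
  4. Acquire the most restrictive lock first in a transaction, so that lesser locks do not get in the way.
  5. Last but not least, reduce the size of your transactions and commit more frequently. Long-running transactions are more likely to end up in deadlocks. Also, because of the way MVCC is implemented in Postgres, long transactions in Postgres hold on to a larger number of live tuples.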
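Guidelines 2 and 5 can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the asker's code: `ordered_batches` and `run_with_deadlock_retry` are hypothetical helpers, `is_deadlock` is a caller-supplied predicate, and the only thing assumed about the schema is that `url` is the unique key targeted by ON CONFLICT. Sorting each batch by `url` makes every worker touch rows in the same order, small batches keep each transaction short, and the retry wrapper re-runs a unit of work if the database still reports a deadlock.

```python
import time
from typing import Any, Callable, Dict, Iterable, Iterator, List


def ordered_batches(rows: Iterable[Dict[str, Any]], key: str = "url",
                    size: int = 50) -> Iterator[List[Dict[str, Any]]]:
    """Deduplicate rows on `key`, sort them by it, and yield small batches."""
    unique = {row[key]: row for row in rows}  # the last duplicate wins
    ordered = [unique[k] for k in sorted(unique)]
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def run_with_deadlock_retry(work: Callable[[], Any],
                            is_deadlock: Callable[[Exception], bool],
                            attempts: int = 3,
                            base_delay: float = 0.05) -> Any:
    """Call `work`, retrying only when `is_deadlock` says the error is a deadlock."""
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except Exception as exc:
            # Re-raise anything that is not a deadlock, or the final failure.
            if not is_deadlock(exc) or attempt == attempts:
                raise
            time.sleep(base_delay * attempt)  # brief, growing pause before retrying


links = [{"url": "http://b"}, {"url": "http://a"}, {"url": "http://c"}]
for batch in ordered_batches(links, size=2):
    print([row["url"] for row in batch])
```

In a SQLAlchemy/psycopg2 setup like the one in the question, `work` would presumably execute the upsert for one batch, commit on success, and roll back the session before a retry. One way to write `is_deadlock` would be to check whether the wrapped driver exception (`exc.orig`) has a `pgcode` of '40P01', the SQLSTATE for deadlock_detected. Treat that wiring as an assumption to verify against your own versions.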

Source: https://dba.stackexchange.com/questions/136355