Sql-Server

為什麼這兩個 INSERT 會死鎖?是觸發器嗎?這個死鎖跟踪 1222 日誌告訴我什麼?

  • September 9, 2013

當接收到多個並發 API 請求時,我們看到生產中出現間歇性死鎖。每個請求基本上都會在同一張表中的 INSERT 語句中達到高潮,這就是我們看到死鎖的地方。我編寫了一個雙執行緒控制台應用程序,它可以通過簡單地同時執行兩個 API 請求來可靠地重現問題,但僅限於生產環境,而不是暫存環境。(這讓我相信,我們的暫存數據庫——可能是數據量、SQL Server 2012 與 2005 或索引調整——與生產不同,從而避免了死鎖。程式碼是相同,我相信是架構。)

由於我現在可以重現死鎖,我能夠說服我的老闆暫時啟用跟踪標誌 1222,並擷取以下日誌:

Date,Source,Severity,Message
09/05/2013 16:32:19,spid71,Unknown,DBCC TRACEOFF 1222<c/> server process ID (SPID) 71. This is an informational message only; no user action is required.
09/05/2013 16:30:55,spid17s,Unknown,waiter id=processf34868 mode=X requestType=wait
09/05/2013 16:30:55,spid17s,Unknown,waiter-list
09/05/2013 16:30:55,spid17s,Unknown,owner id=processf35c18 mode=X
09/05/2013 16:30:55,spid17s,Unknown,owner-list
09/05/2013 16:30:55,spid17s,Unknown,objectlock lockPartition=0 objid=428945000 subresource=FULL dbid=8 objectname=MyDB.DomainTransferRAR id=lock120a72c80 mode=X associatedObjectId=428945000
09/05/2013 16:30:55,spid17s,Unknown,waiter id=processf35c18 mode=X requestType=convert
09/05/2013 16:30:55,spid17s,Unknown,waiter-list
09/05/2013 16:30:55,spid17s,Unknown,owner id=processf34868 mode=IX
09/05/2013 16:30:55,spid17s,Unknown,owner-list
09/05/2013 16:30:55,spid17s,Unknown,objectlock lockPartition=0 objid=2096426938 subresource=FULL dbid=8 objectname=MyDB.DomainTransferRANT id=lock11de95480 mode=IX associatedObjectId=2096426938
09/05/2013 16:30:55,spid17s,Unknown,resource-list
09/05/2013 16:30:55,spid17s,Unknown,VALUES (@p0<c/>
09/05/2013 16:30:55,spid17s,Unknown,(@p0 uniqueidentifier<c/>@p1 int<c/>@p2 int<c/>@p3 varchar(8000)<c/>@p4 char(5)<c/>@p5 int<c/>@p6 datetime<c/>@p7 datetime<c/>@p8 varchar(8000)<c/>@p9 char(5)<c/>@p10 int<c/>@p11 datetime<c/>@p12 datetime<c/>@p13 varchar(8000)<c/>@p14 int<c/>@p15 xml<c/>@p16 datetime<c/>@p17 datetime<c/>@p18 varchar(8000)<c/>@p19 datetime<c/>@p20 datetime<c/>@p21 varchar(8000)<c/>@p22 bit<c/>@p23 varchar(8000)<c/>@p24 varchar(8000)<c/>@p25 uniqueidentifier)INSERT INTO [dbo].[DomainTransferRANT]([DomainTransferRANTGUID]<c/> [PrebookedBillPackageId]<c/> [domainID]<c/> [DomainName]<c/> [GainingWNAccountID]<c/> [GainingRegistrantID]<c/> [dtGainingRequestSent]<c/> [dtGainingResponseReceived]<c/> [DomainTransferGainingPartyRespCode]<c/> [LosingWNAccountID]<c/> [LosingRegistrantID]<c/> [dtLosingRequestSent]<c/> [dtLosingResponseReceived]<c/> [DomainTransferLosingPartyRespCode]<c/> [SubmittedBillPackageId]<c/> [ExtraInfo]<c/> [dtRequestSentToRegistry]<c/> [dtResponseFromRegistry]<c/> [DomainTransferRegistryRespCode]<c/> [dtDeleted]<c/> [dtLastChecked]<c/> [DomainTransferStatusCode]<c/> [SendConfirmationForEachDomain]<c/> [SummaryStatus]<c/> [WHOISBeforeTransfer])
09/05/2013 16:30:55,spid17s,Unknown,inputbuf
09/05/2013 16:30:55,spid17s,Unknown,unknown
09/05/2013 16:30:55,spid17s,Unknown,frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
09/05/2013 16:30:55,spid17s,Unknown,VALUES (@p0<c/> @p1<c/> @p2<c/> @p3<c/> @p4<c/> @p5<c/> @p6<c/> @p7<c/> @p8<c/> @p9<c/> @p10<c/> @p11<c/> @p12<c/> @p13<c/> @p14<c/> @p15<c/> @p16<c/> @p17<c/> @p18<c/> @p19<c/> @p20<c/> @p21<c/> @p22<c/> @p23<c/> @p24)
09/05/2013 16:30:55,spid17s,Unknown,INSERT INTO [dbo].[DomainTransferRANT]([DomainTransferRANTGUID]<c/> [PrebookedBillPackageId]<c/> [domainID]<c/> [DomainName]<c/> [GainingWNAccountID]<c/> [GainingRegistrantID]<c/> [dtGainingRequestSent]<c/> [dtGainingResponseReceived]<c/> [DomainTransferGainingPartyRespCode]<c/> [LosingWNAccountID]<c/> [LosingRegistrantID]<c/> [dtLosingRequestSent]<c/> [dtLosingResponseReceived]<c/> [DomainTransferLosingPartyRespCode]<c/> [SubmittedBillPackageId]<c/> [ExtraInfo]<c/> [dtRequestSentToRegistry]<c/> [dtResponseFromRegistry]<c/> [DomainTransferRegistryRespCode]<c/> [dtDeleted]<c/> [dtLastChecked]<c/> [DomainTransferStatusCode]<c/> [SendConfirmationForEachDomain]<c/> [SummaryStatus]<c/> [WHOISBeforeTransfer])
09/05/2013 16:30:55,spid17s,Unknown,frame procname=adhoc line=1 stmtstart=738 stmtend=2322 sqlhandle=0x0200000038d7940c60b76abf51c3cf1bc774fe27ba136260
09/05/2013 16:30:55,spid17s,Unknown,select @iLockDummy = 1 from DomainTransferRANT with (TABLOCKX<c/> HOLDLOCK) OPTION(EXPAND VIEWS)
09/05/2013 16:30:55,spid17s,Unknown,frame procname=MyDB.tri_DomainTransferRANT_InsUpd line=68 stmtstart=10336 stmtend=10544 sqlhandle=0x03000800c8c31e75b259fa002fa200000000000000000000
09/05/2013 16:30:55,spid17s,Unknown,executionStack
09/05/2013 16:30:55,spid17s,Unknown,process id=processf35c18 taskpriority=0 logused=2992 waitresource=OBJECT: 8:2096426938:0  waittime=171 ownerId=26880196297 transactionname=user_transaction lasttranstarted=2013-09-05T16:30:55.710 XDES=0x1c7b18b60 lockMode=X schedulerid=2 kpid=34404 status=suspended spid=69 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2013-09-05T16:30:55.763 lastbatchcompleted=2013-09-05T16:30:55.710 clientapp=.Net SqlClient Data Provider hostname=WEB1 hostpid=13216 loginname=client isolationlevel=serializable (4) xactid=26880196297 currentdb=8 lockTimeout=4294967295 clientoption1=673316896 clientoption2=128056
09/05/2013 16:30:55,spid17s,Unknown,VALUES (@p0<c/>
09/05/2013 16:30:55,spid17s,Unknown,(@p0 uniqueidentifier<c/>@p1 int<c/>@p2 int<c/>@p3 varchar(8000)<c/>@p4 char(5)<c/>@p5 int<c/>@p6 datetime<c/>@p7 datetime<c/>@p8 varchar(8000)<c/>@p9 char(5)<c/>@p10 int<c/>@p11 datetime<c/>@p12 datetime<c/>@p13 varchar(8000)<c/>@p14 int<c/>@p15 xml<c/>@p16 datetime<c/>@p17 datetime<c/>@p18 varchar(8000)<c/>@p19 datetime<c/>@p20 datetime<c/>@p21 varchar(8000)<c/>@p22 bit<c/>@p23 varchar(8000)<c/>@p24 varchar(8000)<c/>@p25 uniqueidentifier)INSERT INTO [dbo].[DomainTransferRANT]([DomainTransferRANTGUID]<c/> [PrebookedBillPackageId]<c/> [domainID]<c/> [DomainName]<c/> [GainingWNAccountID]<c/> [GainingRegistrantID]<c/> [dtGainingRequestSent]<c/> [dtGainingResponseReceived]<c/> [DomainTransferGainingPartyRespCode]<c/> [LosingWNAccountID]<c/> [LosingRegistrantID]<c/> [dtLosingRequestSent]<c/> [dtLosingResponseReceived]<c/> [DomainTransferLosingPartyRespCode]<c/> [SubmittedBillPackageId]<c/> [ExtraInfo]<c/> [dtRequestSentToRegistry]<c/> [dtResponseFromRegistry]<c/> [DomainTransferRegistryRespCode]<c/> [dtDeleted]<c/> [dtLastChecked]<c/> [DomainTransferStatusCode]<c/> [SendConfirmationForEachDomain]<c/> [SummaryStatus]<c/> [WHOISBeforeTransfer])
09/05/2013 16:30:55,spid17s,Unknown,inputbuf
09/05/2013 16:30:55,spid17s,Unknown,unknown
09/05/2013 16:30:55,spid17s,Unknown,frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
09/05/2013 16:30:55,spid17s,Unknown,VALUES (@p0<c/> @p1<c/> @p2<c/> @p3<c/> @p4<c/> @p5<c/> @p6<c/> @p7<c/> @p8<c/> @p9<c/> @p10<c/> @p11<c/> @p12<c/> @p13<c/> @p14<c/> @p15<c/> @p16<c/> @p17<c/> @p18<c/> @p19<c/> @p20<c/> @p21<c/> @p22<c/> @p23<c/> @p24)
09/05/2013 16:30:55,spid17s,Unknown,INSERT INTO [dbo].[DomainTransferRANT]([DomainTransferRANTGUID]<c/> [PrebookedBillPackageId]<c/> [domainID]<c/> [DomainName]<c/> [GainingWNAccountID]<c/> [GainingRegistrantID]<c/> [dtGainingRequestSent]<c/> [dtGainingResponseReceived]<c/> [DomainTransferGainingPartyRespCode]<c/> [LosingWNAccountID]<c/> [LosingRegistrantID]<c/> [dtLosingRequestSent]<c/> [dtLosingResponseReceived]<c/> [DomainTransferLosingPartyRespCode]<c/> [SubmittedBillPackageId]<c/> [ExtraInfo]<c/> [dtRequestSentToRegistry]<c/> [dtResponseFromRegistry]<c/> [DomainTransferRegistryRespCode]<c/> [dtDeleted]<c/> [dtLastChecked]<c/> [DomainTransferStatusCode]<c/> [SendConfirmationForEachDomain]<c/> [SummaryStatus]<c/> [WHOISBeforeTransfer])
09/05/2013 16:30:55,spid17s,Unknown,frame procname=adhoc line=1 stmtstart=738 stmtend=2322 sqlhandle=0x0200000038d7940c60b76abf51c3cf1bc774fe27ba136260
09/05/2013 16:30:55,spid17s,Unknown,select @iLockDummy = 1 from DomainTransferRAR with (TABLOCKX<c/> HOLDLOCK) OPTION(EXPAND VIEWS)
09/05/2013 16:30:55,spid17s,Unknown,frame procname=MyDB.tri_DomainTransferRANT_InsUpd line=67 stmtstart=10140 stmtend=10334 sqlhandle=0x03000800c8c31e75b259fa002fa200000000000000000000
09/05/2013 16:30:55,spid17s,Unknown,executionStack
09/05/2013 16:30:55,spid17s,Unknown,process id=processf34868 taskpriority=0 logused=3000 waitresource=OBJECT: 8:428945000:0  waittime=171 ownerId=26880196295 transactionname=user_transaction lasttranstarted=2013-09-05T16:30:55.710 XDES=0x1c7b18370 lockMode=X schedulerid=2 kpid=13932 status=suspended spid=93 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2013-09-05T16:30:55.770 lastbatchcompleted=2013-09-05T16:30:55.710 clientapp=.Net SqlClient Data Provider hostname=MYDB hostpid=13216 loginname=client isolationlevel=serializable (4) xactid=26880196295 currentdb=8 lockTimeout=4294967295 clientoption1=673316896 clientoption2=128056
09/05/2013 16:30:55,spid17s,Unknown,process-list
09/05/2013 16:30:55,spid17s,Unknown,deadlock victim=processf35c18
09/05/2013 16:30:55,spid17s,Unknown,deadlock-list
09/05/2013 16:30:08,spid71,Unknown,DBCC TRACEON 1222<c/> server process ID (SPID) 71. This is an informational message only; no user action is required.

需要注意的一件事是在插入相關表時有一個觸發器。觸發器是確定整個記錄的狀態程式碼所必需的,這可能取決於同一表中的兄弟記錄。長期以來,我們認為觸發器是死鎖的原因,因此我們向觸發器添加了越來越激進的鎖定提示,最終在目前設置中達到高潮,我們在關鍵部分之前對相關表執行 TABLOCKX、HOLDLOCK。我們認為這將通過有效地序列化所有插入來完全防止死鎖,但會犧牲一些性能。但似乎並非如此。據我了解,在我們的獨占表鎖之前的其他東西必須已經持有共享或更新鎖。但是什麼?

其他可能對您有幫助的資訊:DomainTransferRANT 表被大量索引。它的主鍵是一個非集群 GUID。另一個重要的 INT 列上有一個聚集索引。還有其他 7 個非聚集索引。最後,還有幾個外鍵約束。

我沒有機會重新執行跟踪並擷取格式更好的 XDL 文件,但與此同時,我能夠通過從觸發器中刪除所有隔離級別和鎖定提示並修改應用程式碼來解決問題來自可序列化事務的 INSERT 本身

該應用程序使用 LINQ to SQL,因此 INSERT 語句是動態生成的,並在預設的 READ COMMITTED 隔離級別下執行。從死鎖跟踪中,我可以看出 INSERT 佔用的鎖與觸發器佔用的鎖相衝突。即使從觸發器中刪除了所有隔離和鎖定提示,死鎖仍然發生,所以我推測觸發器的 UPDATE 語句導致的鎖定比 INSERT 語句更強。當競爭的同時事務試圖做同樣的事情時,就會發生死鎖。

我們之前嘗試的解決方案絕對是在正確的軌道上:在觸發器中使用更高的隔離級別和更強的鎖定。問題是在觸發器被呼叫之前,由 LINQ to SQL 生成的 INSERT 語句已經佔用了較弱的鎖。解決方案是使用 SEALIZABLE 隔離級別在我們的 .NET 程式碼中顯式啟動一個事務,在該事務中執行插入,讓觸發器發生在同一個事務中(不改變隔離級別或鎖定提示),然後送出並處置該事務. 當我這樣做時,我不再能夠重現該問題(我什至將其放回舊方式並驗證問題是否返回。)這是新程式碼:

           ' Do the insert in a serializable transaction to prevent simultaneous inserts from deadlocking
           ' due to the trigger, which reaches out to other rows.
           Me.packageContext.Connection.Open()
           Try
               Using oTransaction = Me.packageContext.Connection.BeginTransaction(IsolationLevel.Serializable)
                   Me.packageContext.Transaction = oTransaction
                   Try
                       Me.packageContext.SubmitChanges()
                       oTransaction.Commit()
                   Catch ex As Exception
                       oTransaction.Rollback()
                       Throw
                   End Try
               End Using
           Finally
               ' We need to close the connection and null out the transaction to allow
               ' subsequent uses of the same DataContext to work properly outside this transaction.
               Me.packageContext.Connection.Close()
               Me.packageContext.Transaction = Nothing
           End Try

通常我們會避免從客戶端程式碼中打開事務,因為如果程式碼執行遠端 API 呼叫,它們可能會執行很長時間,從而導致過度阻塞。但是在這種情況下,我們隔離了一小段程式碼,並且我們圍繞它進行了強大的錯誤/回滾處理。

另一種解決方案是實現在顯式 SERIALIZABLE 事務中執行 INSERT 的儲存過程,然後更改 DataContext 以呼叫儲存過程。但是,該解決方案的可維護性較差,因為它需要在每次架構更改時更新儲存過程和 DataContext 定義。

最後,另一種可能的替代解決方案是詳細檢查死鎖日誌並找到一種方法來調整我們的索引或鍵以避免死鎖。但是,這更加困難,並且在我們的設計中可能無法實現。

我認為上面的解決方案對於未來的開發人員來說是最簡單和最容易理解的。

引用自:https://dba.stackexchange.com/questions/49359