Sql-Server

SQL Server 2012 Always On setup with two nodes

  • October 14, 2015

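I have an AlwaysOn setup with two data nodes and one witness. I ran into a problem where my primary server restarted unexpectedly.

Here are some of the issues I faced during that time.

  1. Because the PRIMARY replica restarted unexpectedly, the DB went into RECOVERY mode.
  2. The database took about 1 hour to recover.
  3. During this recovery phase on the original primary replica, the secondary server (now PRIMARY because of the FAILOVER) was hitting timeouts and was observed to be slow.

Looking at the logs, I can see entries about roll forward and rollback taking place on the database. But first I would like to know why recovering the DB took so long.

Also, I would like some input on whether adding one more secondary node to this setup would effectively help me.

Adding the error log: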

2015-10-12 16:20:26.30 spid31s     The recovery LSN (6821:15912:1) was identified for the database with ID 12. This is an informational message only. No user action is required.
2015-10-12 16:23:56.81 spid44s     Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:23:57.28 spid44s     Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:23:57.28 spid44s     Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:24:17.32 spid44s     Recovery of database 'A' (11) is 31% complete (approximately 44 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:24:50.53 spid6s      SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [Z:\SQLData\A\A.mdf] in database [A] (11).  The OS file handle is 0x0000000000000B9.  The offset of the latest long I/O is: 0x0000041135800
2015-10-12 16:24:55.46 spid44s     Recovery of database 'A' (11) is 53% complete (approximately 51 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:25:15.50 spid44s     Recovery of database 'A' (11) is 85% complete (approximately 13 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:25:49.69 spid44s     3033 transactions rolled forward in database 'A' (11:0). This is an informational message only. No user action is required.
2015-10-12 16:25:50.15 spid44s     Recovery completed for database A (database ID 11) in 290 second(s) (analysis 480 ms, redo 87801 ms, undo 0 ms.) This is an informational message only. No user action is required.
2015-10-12 16:25:52.35 spid44s     CHECKDB for database 'A' finished without errors on 2012-10-27 22:19:16.470 (local time). This is an informational message only; no user action is required.
2015-10-12 16:42:57.67 spid24s     AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:57.67 spid24s     The recovery LSN (216726:2384:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:42:57.91 spid24s     AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:57.91 spid24s     The recovery LSN (216726:2384:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid24s     Error: 35278, Severity: 17, State: 1.
2015-10-12 16:42:58.53 spid24s     Availability database 'A', which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid29s     Nonqualified transactions are being rolled back in database A for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.
2015-10-12 16:42:58.53 spid38s     AlwaysOn Availability Groups connection with primary database terminated for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:42:58.79 spid29s     Starting up database 'A'.
2015-10-12 16:49:45.32 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:49:45.78 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 16:49:45.78 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:05.80 spid29s     Recovery of database 'A' (11) is 35% complete (approximately 37 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:25.82 spid29s     Recovery of database 'A' (11) is 68% complete (approximately 18 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s     AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s     The recovery LSN (216726:2386:1) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:50:44.66 spid29s     Error: 35286, Severity: 16, State: 1.
2015-10-12 16:50:44.66 spid29s     Using the recovery LSN (216726:2384:1) stored in the metadata for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 16:53:12.38 spid29s     Error: 35278, Severity: 17, State: 1.
2015-10-12 16:53:12.38 spid29s     Availability database 'A', which is in the secondary role, is being restarted to resynchronize with the current primary database. This is an informational message only. No user action is required.
2015-10-12 16:53:12.38 spid29s     Nonqualified transactions are being rolled back in database A for an AlwaysOn Availability Groups state change. Estimated rollback completion: 100%. This is an informational message only. No user action is required.
2015-10-12 16:53:12.40 spid31s     AlwaysOn Availability Groups connection with primary database terminated for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 16:53:14.45 spid29s     Starting up database 'A'.
2015-10-12 17:05:44.30 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 1168 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 17:05:44.76 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 1091 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2015-10-12 17:05:44.76 spid29s     Recovery of database 'A' (11) is 0% complete (approximately 891 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:04.88 spid29s     Recovery of database 'A' (11) is 31% complete (approximately 45 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:24.92 spid29s     Recovery of database 'A' (11) is 65% complete (approximately 21 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2015-10-12 17:06:45.55 spid29s     AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 17:06:45.55 spid29s     The recovery LSN (216726:19027:80) was identified for the database with ID 11. This is an informational message only. No user action is required.
2015-10-12 17:06:48.97 spid29s     3034 transactions rolled forward in database 'A' (11:0). This is an informational message only. No user action is required.
2015-10-12 17:06:49.04 spid29s     Recovery completed for database A (database ID 11) in 710 second(s) (analysis 470 ms, redo 60412 ms, undo 0 ms.) This is an informational message only. No user action is required.
2015-10-12 17:06:49.07 spid20s     AlwaysOn Availability Groups connection with primary database established for secondary database 'A' on the availability replica with Replica ID: {}. This is an informational message only. No user action is required.
2015-10-12 17:06:49.08 spid20s     The recovery LSN (216726:19027:80) was identified for the database with ID 11. This is an informational message only. No user action is required.

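The problem may lie in the error you are seeing: Error: 35278. When this occurs, you may observe the database staying in a recovering state for a long time.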

This can have several causes; usually it is a long-running transaction.
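To see how often the secondary was restarted by this error, the error log can be scanned for it. The following is only a minimal sketch in Python: the helper name `find_errors` and the regular expression are my own assumptions about the log layout (timestamp, spid, message), and the sample lines are copied from the log in the question.

```python
import re
from datetime import datetime

# Four lines copied from the error log in the question.
LOG = """\
2015-10-12 16:42:58.53 spid24s     Error: 35278, Severity: 17, State: 1.
2015-10-12 16:42:58.79 spid29s     Starting up database 'A'.
2015-10-12 16:53:12.38 spid29s     Error: 35278, Severity: 17, State: 1.
2015-10-12 16:53:14.45 spid29s     Starting up database 'A'.
"""

# Assumed layout of one log line: "<timestamp> <spid>   <message>".
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2}) (\S+)\s+(.*)$")


def find_errors(log_text, error_number):
    """Return the timestamps of every 'Error: <error_number>,' line (hypothetical helper)."""
    hits = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        timestamp, _spid, message = match.groups()
        if message.startswith(f"Error: {error_number},"):
            hits.append(datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S.%f"))
    return hits


restarts = find_errors(LOG, 35278)
for when in restarts:
    print("35278 raised at", when)
if len(restarts) > 1:
    print("time between first and last restart:", restarts[-1] - restarts[0])
```

Each hit marks a point where the secondary database was restarted to resynchronize, so several hits close together mean recovery started over more than once.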

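The timeouts you experienced may have been caused by the traffic sent between the replicas to recover the database and bring it back. However, are you sure the timeouts were not caused by some other issue?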

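I would be interested to know what your backup strategy is on this database, and when the last full backup was taken before this failover. I recently ran into this in a test environment where no backups were being taken. With a full backup and subsequent log backups in place, failover happened without problems, the error did not appear, and recovery times were very quick.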
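If you want to compare recovery times before and after changing the backup schedule, the "Recovery completed" lines in the error log already carry the numbers. Below is a rough Python sketch, not a definitive tool: the function name `recovery_breakdown` and the field names are hypothetical, and the two sample lines are copied from the log in the question. The log does not say what the total includes beyond the three phases, so the difference is only labelled "unattributed".

```python
import re

# The two "Recovery completed" lines copied from the error log in the question.
LOG = """\
2015-10-12 16:25:50.15 spid44s     Recovery completed for database A (database ID 11) in 290 second(s) (analysis 480 ms, redo 87801 ms, undo 0 ms.) This is an informational message only. No user action is required.
2015-10-12 17:06:49.04 spid29s     Recovery completed for database A (database ID 11) in 710 second(s) (analysis 470 ms, redo 60412 ms, undo 0 ms.) This is an informational message only. No user action is required.
"""

RECOVERY = re.compile(
    r"Recovery completed for database (\S+) \(database ID (\d+)\) "
    r"in (\d+) second\(s\) "
    r"\(analysis (\d+) ms, redo (\d+) ms, undo (\d+) ms\.\)"
)


def recovery_breakdown(log_text):
    """Extract total and per-phase recovery times, in milliseconds (hypothetical helper)."""
    rows = []
    for match in RECOVERY.finditer(log_text):
        name, db_id, total_s, analysis, redo, undo = match.groups()
        total_ms = int(total_s) * 1000
        phases_ms = int(analysis) + int(redo) + int(undo)
        rows.append({
            "database": name,
            "database_id": int(db_id),
            "total_ms": total_ms,
            "analysis_ms": int(analysis),
            "redo_ms": int(redo),
            "undo_ms": int(undo),
            # Time in the reported total that the three phases do not account for.
            "unattributed_ms": total_ms - phases_ms,
        })
    return rows


ROWS = recovery_breakdown(LOG)
for row in ROWS:
    print(row)
```

Running the same function over a log captured after regular full and log backups are in place would give a like-for-like comparison of the totals.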

Source: https://dba.stackexchange.com/questions/117719