SQL Server 2016 和 SQL Server 2019 之間的分佈式可用性組不播種
很長的第一篇文章的道歉:-)
我計劃在現有 SQL2016 AG 和新 SQL Server 2019 AG 之間實施分佈式可用性組,並在建構之前在 LAB 中驗證設計。據我所知,應該支持這個 Dist AG,但我無法讓它工作。
實驗室包括:
- LAB1SQL01 - Windows Server Core 2016(評估許可證),執行 SQL Server 2016 (13.0.5026.0)
- LAB1SQL02 - Windows Server Core 2016(評估許可證),執行 SQL Server 2016 (13.0.5026.0)
- LAB1SQL03 - Windows Server 2019(評估許可證),執行 SQL Server 2019 (15.0.4023.6)
- LAB1SQL04 - Windows Server 2019(評估許可證),執行 SQL Server 2019 (15.0.4023.6)
我在所有伺服器上使用相同的組託管服務帳戶
我已經部署了以下 AlwaysOn 可用性組:
- LAB1SQLAG1 - 伺服器:LAB1SQL01 & LAB1SQL02,監聽器:LAB1SQLAG1L1
- LAB1SQLAG3 - 伺服器:LAB1SQL03 & LAB1SQL04,監聽器:LAB1SQLAG3L1
-- Create endpoints -- Run on LAB1SQL01 USE [master] GO CREATE ENDPOINT [Endpoint_AG] STATE=STARTED AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL) FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE , ENCRYPTION = REQUIRED ALGORITHM AES) GO -- Run onLAB1SQL02 USE [master] GO CREATE ENDPOINT [Endpoint_AG] STATE=STARTED AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL) FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE , ENCRYPTION = REQUIRED ALGORITHM AES) GO -- Create login and grant the SQL Server service account CONNECT permissions to the endpoint -- Run on LAB1SQL01 USE master GO CREATE LOGIN [LAB1\svc.sql01.lab1$] FROM WINDOWS; GO GRANT CONNECT ON ENDPOINT::Endpoint_AG TO [LAB1\svc.sql01.lab1$]; GO -- Run on LAB1SQL02 USE master GO CREATE LOGIN [LAB1\svc.sql01.lab1$] FROM WINDOWS; GO GRANT CONNECT ON ENDPOINT::Endpoint_AG TO [LAB1\svc.sql01.lab1$]; GO -- Create an Availability Group LAB1SQLAG1 -- Run on LAB1SQL01 CREATE AVAILABILITY GROUP [LAB1SQLAG1] FOR REPLICA ON N'LAB1SQL01' WITH ( ENDPOINT_URL = N'TCP://LAB1SQL01.lab1.local:5022', FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC ), N'LAB1SQL02' WITH ( ENDPOINT_URL = N'TCP://LAB1SQL02.lab1.local:5022', FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC ) LISTENER 'LAB1SQLAG1L1' ( WITH IP ( ('10.16.16.34','255.255.255.0') ) , PORT = 1433 ); GO -- Run on LAB1SQL02 ALTER AVAILABILITY GROUP [LAB1SQLAG1] JOIN --Allow the Availability Group to create databases on behalf of the primary replica ALTER AVAILABILITY GROUP [LAB1SQLAG1] GRANT CREATE ANY DATABASE GO --------------------------------------------------------------------------- -- Create endpoints -- Run on LAB1SQL03 USE [master] GO CREATE ENDPOINT [Endpoint_AG] STATE=STARTED AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL) FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE , ENCRYPTION = REQUIRED ALGORITHM AES) GO -- Run on LAB1SQL04 USE [master] GO CREATE ENDPOINT [Endpoint_AG] STATE=STARTED AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL) FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE , ENCRYPTION = REQUIRED ALGORITHM AES) GO -- Create login and grant the SQL Server service account CONNECT permissions to the endpoint -- Run on LAB1SQL03 USE master GO CREATE LOGIN [LAB1\svc.sql01.lab1$] FROM WINDOWS; GO GRANT CONNECT ON ENDPOINT::Endpoint_AG TO [LAB1\svc.sql01.lab1$]; GO -- Run on LAB1SQL04 USE master GO CREATE LOGIN [LAB1\svc.sql01.lab1$] FROM WINDOWS; GO GRANT CONNECT ON ENDPOINT::Endpoint_AG TO [LAB1\svc.sql01.lab1$]; GO -- Create an Availability Group LAB1SQLAG3 -- Run on LAB1SQL03 CREATE AVAILABILITY GROUP [LAB1SQLAG3] FOR REPLICA ON N'LAB1SQL03' WITH ( ENDPOINT_URL = N'TCP://LAB1SQL03.lab1.local:5022', FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC ), N'LAB1SQL04' WITH ( ENDPOINT_URL = N'TCP://LAB1SQL04.lab1.local:5022', FAILOVER_MODE = MANUAL, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC ) LISTENER 'LAB1SQLAG3L1' ( WITH IP ( ('10.16.16.54','255.255.255.0') ) , PORT = 1433 ); GO -- Run on LAB1SQL04 ALTER AVAILABILITY GROUP [LAB1SQLAG3] JOIN --Allow the Availability Group to create databases on behalf of the primary replica ALTER AVAILABILITY GROUP [LAB1SQLAG3] GRANT CREATE ANY DATABASE GO
然後,我通過在 LAB1SQL01(用於 AG LAB1SQLAG1)和 LAB1SQL03(用於 LAB1SQLAG3)上創建測試數據庫來測試 2 個 AG,並確認兩個數據庫都同步到各自的副本。然後我刪除了數據庫。
然後創建了以下分佈式可用性組:
- DistAG_AG1_AG3 - 從 LAB1SQLAG1 到 LAB1SQLAG3 的 Dist AG
--Run on LAB1SQL01 CREATE AVAILABILITY GROUP [DistAG_AG1_AG3] WITH (DISTRIBUTED) AVAILABILITY GROUP ON 'LAB1SQLAG1' WITH ( LISTENER_URL = 'TCP://LAB1SQLAG1L1.LAB1.LOCAL:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ), 'LAB1SQLAG3' WITH ( LISTENER_URL = 'TCP://LAB1SQLAG3L1.LAB1.LOCAL:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ); GO --Run on LAB1SQL03 ALTER AVAILABILITY GROUP [DistAG_AG1_AG3] JOIN AVAILABILITY GROUP ON 'LAB1SQLAG1' WITH ( LISTENER_URL = 'TCP://LAB1SQLAG1L1.LAB1.LOCAL:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ), 'LAB1SQLAG3' WITH ( LISTENER_URL = 'TCP://LAB1SQLAG3L1.LAB1.LOCAL:5022', AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT, FAILOVER_MODE = MANUAL, SEEDING_MODE = AUTOMATIC ); GO
但是這次我在LAB1SQL01上創建了一個測試數據庫,然後添加到本地AG LAB1SQLAG1,輔助AG發生了奇怪的事情。
上的數據庫出現 LAB1SQL03 但仍處於“同步/恢復中”狀態,並且它不會作為數據庫出現在 LAB1SQL04 上(但確實出現在可用性數據庫下,旁邊有一個星號。
當我查看 LAB1SQL04 上的日誌時,絕對沒有任何內容。當我查看 LAB1SQL03 上的日誌時,我可以看到數據庫正在恢復,然後出現以下錯誤:
Always On: DebugTraceVarArgs AR '[HADR] [Primary] operation on replicas [9DAB7230-0E72-450E-ADFC-ECCDAE1349E7]->[0B7ABD01-D534-41AB-AFEB-91F6A3D89D46], database [SQL01TestDB1], remote endpoint [TCP://LAB1SQL04.lab1.local:5022], source operation [497E843D-F993-49CD-9816-C3800DCF18B4]: Seeding encountered a transient failure, state '20', retrying...' Always On: DebugTraceVarArgs AR '[HADR] [Primary] operation on replicas [9DAB7230-0E72-450E-ADFC-ECCDAE1349E7]->[0B7ABD01-D534-41AB-AFEB-91F6A3D89D46], database [SQL01TestDB1], remote endpoint [TCP://LAB1SQL04.lab1.local:5022], source operation [7781AE86-A91F-4523-9F3C-32D5A5201AE8]: Transitioning from [PENDING] to [CHECK_IF_SEEDING_NEEDED].' Always On: DebugTraceVarArgs AR '[HADR] [Primary] operation on replicas [9DAB7230-0E72-450E-ADFC-ECCDAE1349E7]->[0B7ABD01-D534-41AB-AFEB-91F6A3D89D46], database [SQL01TestDB1], remote endpoint [TCP://LAB1SQL04.lab1.local:5022], source operation [7781AE86-A91F-4523-9F3C-32D5A5201AE8]: Seeding task failed with result 0x80040e19.' Always On: DebugTraceVarArgs AR 'Seeding is canceled with cancelReason = 20' Always On: DebugTraceVarArgs AR '[HADR] [Primary] operation on replicas [9DAB7230-0E72-450E-ADFC-ECCDAE1349E7]->[0B7ABD01-D534-41AB-AFEB-91F6A3D89D46], database [SQL01TestDB1], remote endpoint [TCP://LAB1SQL04.lab1.local:5022], source operation [7781AE86-A91F-4523-9F3C-32D5A5201AE8]: Transitioning from [CHECK_IF_SEEDING_NEEDED] to [FAILED].' Always On: DebugTraceVarArgs AR '[HADR] [Primary] operation on replicas [9DAB7230-0E72-450E-ADFC-ECCDAE1349E7]->[0B7ABD01-D534-41AB-AFEB-91F6A3D89D46], database [SQL01TestDB1], remote endpoint [TCP://LAB1SQL04.lab1.local:5022], source operation [7781AE86-A91F-4523-9F3C-32D5A5201AE8]: Seeding encountered a transient failure, state '20', maximum retries exceeded'
我已經測試了上述配置 SQL 2016 AG 到 SQL 2016 AG,它執行良好。所以我不認為這是我的配置。所有伺服器都使用相同的 configfile.ini 建構並具有相同的磁碟佈局。我還嘗試在所有伺服器上禁用防火牆。
作業系統都是評估許可證。我從評估 SQL 許可證開始,認為這可能會導致問題,因此從工作中獲取企業副本進行測試,但它沒有幫助。
關於我需要做什麼才能使其正常工作的任何建議?我還沒有找到其他人使用我遇到的錯誤程式碼。
提前感謝所有閱讀到底部的人:-)
上的數據庫出現 LAB1SQL03 但仍處於“同步/恢復中”狀態,並且它不會作為數據庫出現在 LAB1SQL04 上(但確實出現在可用性數據庫下,旁邊有一個星號。
這基本上是設計使然,儘管感覺並非如此。讓我解釋。2016 上的數據庫是 2016 格式,它是線上可用的,它可以播種到普通 AG 中的所有輔助副本,也可以播種到轉發器,因為基礎數據庫來自主副本。當涉及到分佈式 AG 的轉發器端時,數據庫不可用,它正在等待數據庫升級到 2019,它不可讀或無法訪問,並且轉發端的輔助副本從轉發器獲取其數據庫副本,而不是global primary,不可用,因為需要先升級。
您的選擇是暫時在轉發器端手動播種輔助副本。
我做了從 SQL2016 AG 到 SQL2019 AG 的相同分佈式 AG 配置。我第一次使用 Active Directory 連接,我很不走運,雖然 sql 語句沒有錯誤,但沒有建立到輔助的連接。然後我用證書的方式連接了primary和forwarder,這次就好了。我可以看到連接已建立。我還注意到,從輔助分佈式副本中,數據庫狀態為 Synchronized/In Recovery。並且無法訪問數據庫。我認為這是從 sql2016 到 Sql2019 實現 DAG 時的正常行為。然後我做了failover測試,DAG成功failover到forwarder,從sql2019的secondary在failover後自動同步。所以我看到可以配置從 SQL2016 到 SQL 2019 的 DAG。