Sql-Server

SQL Server 檢測到基於邏輯一致性的 I/O 錯誤:RESTORE 上的 pageid 不正確

  • August 31, 2020

將我們的數據庫恢復到新的伺服器硬體失敗並出現頁面錯誤。

Message
SQL Server detected a logical consistency-based I/O error: 
incorrect pageid (expected 49:8125916; actual 49:29097436).
It occurred during a read of page (49:8125916) in database ID 7 at
offset 0x00000f7fbb8000 in file x:\Databases\Data07\DataWarehouse_OperationalData_15.ndf'.
Additional messages in the SQL Server error log or system event
log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately.
Complete a full database consistency check (DBCC CHECKDB). 
This error can be caused by many factors; 
for more information, see SQL Server Books Online.

在這個 70 TB 數據庫和 6 年曆史的伺服器上,DBCC CheckDB 將花費我們大約一周的時間。

有沒有可能是我的備份損壞了?

還是說新伺服器有問題?

還是問題出在舊伺服器上的生產數據庫?

這是 SQL Server 2016 SP1 CU1,page_verify_option_desc 是 CHECKSUM。

這是我從新伺服器上的錯誤日誌中可以看到的:

Starting up database 'DataWarehouse'.
The database 'DataWarehouse' is marked RESTORING and is in a state that does not allow recovery to be run.
Error: 824, Severity: 16, State: 2.
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 49:8125916; actual 49:29097436). It occurred during a read of page (49:8125916) in database ID 7 at offset 0x00000f7fbb8000 in file 'S:\MSSQL\DSA\Databases\DataWarehouse\Data07\DataWarehouse_OperationalData_15.ndf'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
Error: 18456, Severity: 14, State: 38.

當我在舊伺服器上執行它時:

DBCC TRACEON (3604);
DBCC PAGE ('datawarehouse', 49, 8125916, 0);

我明白了

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

PAGE: (49:8125916)


BUFFER:


BUF @0x000000802C5E9300

bpage = 0x0000004DF03E8000          bhash = 0x0000000000000000          bpageno = (49:8125916)
bdbid = 5                           breferences = 1                     bcputicks = 0
bsampleCount = 0                    bUse1 = 5844                        bstat = 0x9
blog = 0x15ab215a                   bnext = 0x0000000000000000          bDirtyContext = 0x0000000000000000
bstat2 = 0x0                        

PAGE HEADER:


Page @0x0000004DF03E8000

m_pageId = (49:8125916)             m_headerVersion = 1                 m_type = 3
m_typeFlagBits = 0x0                m_level = 0                         m_flagBits = 0xa200
m_objId (AllocUnitId.idObj) = 10814647                                   m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72058302786633728                                
Metadata: PartitionId = 72058370492596224                                Metadata: IndexId = 1
Metadata: ObjectId = 1916450641     m_prevPage = (0:0)                  m_nextPage = (0:0)
pminlen = 0                         m_slotCnt = 2                       m_freeCnt = 2452
m_freeData = 5764                   m_reservedCnt = 0                   m_lsn = (2455108:11140830:20)
m_xactReserved = 0                  m_xdesId = (28:447547798)           m_ghostRecCnt = 0
m_tornBits = 31453216               DB Frag ID = 1                      

Allocation Status

GAM (49:7668480) = ALLOCATED        SGAM (49:7668481) = NOT ALLOCATED   
PFS (49:8120352) = 0x42 ALLOCATED  80_PCT_FULL                           DIFF (49:7668486) = CHANGED
ML (49:7668487) = NOT MIN_LOGGED    


DBCC execution completed. If DBCC printed error messages, contact your system administrator.

DBCC CHECKFILEGROUP 目前正在執行。完成沒有錯誤。

SeanGallardy 評論了二進制數。我花了很長時間才明白他的意思,但我終於在程序員模式的 Windows 計算器中輸入了這兩個數字,它顯示的內容如下: 紅色顯示差異

更新:我們進行了另一次備份並從舊伺服器恢復到新伺服器。這次我們收到了這樣的資訊:

Could not redo log record (2456609:4261461:64), for transaction ID (28:972770238), on page (48:211577379), allocation unit 72058359886184448, database 'DataWarehouse' (database ID 7). 
Page: LSN = (2456609:3279166:236), allocation unit = 72058351460417536, type = 20. Log: OpCode = 2, context 5, PrevPageLSN: (2456609:4250688:2). 
Restore from a backup of the database, or repair the database.

我試圖用Google搜尋這條消息,但我能找到的唯一建議是恢復數據庫(這是我想要做的)。

真正的問題是;接下來我該怎麼辦?

在進行還原之前,您沒有備份

真正的問題是;接下來我該怎麼辦?

這可能聽起來不像是 DBA.SE 問題的“答案”,但坦率地說,您在這裡唯一合理的選擇是向 Microsoft 提出支持案例並尋求幫助,或者聘請有信譽的專業人士(顧問或公司)修復 SQL Server 損壞問題的可靠記錄。如果您買不起這些選項,則意味著數據不值得。

引用自:https://dba.stackexchange.com/questions/164417