Sql-Server

TempDB 中的損壞分區如何導致 DBCC CHECKDB 報告沒有問題?

  • April 12, 2018

我們的一個 SQL Server 最近報告了以下錯誤:

DATE/TIME:  2/25/2013 9:15:14 PM

DESCRIPTION:    No catalog entry found for partition ID 9079262474267394048
    in database 2. The metadata is inconsistent. Run DBCC CHECKDB to check for
    a metadata corruption.

不到 15 分鐘後,我連接到伺服器並執行:

SELECT name
FROM sys.databases
WHERE database_id = 2;

哪個返回’tempdb’。然後我跑了:

DBCC CHECKDB ('tempdb') WITH NO_INFOMSGS, TABLERESULTS;

沒有返回任何結果,表明受影響的數據庫沒有問題。

數據庫中的損壞如何導致上述錯誤消息DBCC CHECKDB但未報告問題?我假設如果頁面校驗和計算失敗,導致頁面被標記為懷疑引用該頁面的任何對像都無法刪除,但我一定是錯的。

一旦頁面被標記為“可疑”,如何將其標記為不可疑、已修復或重複使用,或者任何DBCC CHECKDB不會報告相關頁面有任何問題的內容?


編輯:2013-02-27 13:24

只是為了好玩,我試圖在 TempDB 中重新創建損壞,假設 #temp 表是罪魁禍首。

但是,由於我無法SINGLE_USER在 TempDB 中設置該選項,我無法使用它DBCC WRITEPAGE來破壞頁面,因此我無法在 TempDB 中強制破壞。

而不是使用DBCC WRITEPAGE一個可以將數據庫設置為離線並使用十六進制編輯器來修改 db 文件中的隨機字節。當然,這在 TempDB 上也不起作用,因為數據庫引擎無法在 TempDB 離線的情況下執行。

如果停止實例,下次啟動時會自動重新創建 TempDB;因此,這也不會成功。

如果有人能想出一種方法來重現這種腐敗,我願意做進一步的研究。

為了測試損壞頁面無法修復的假設,DROP TABLE我創建了一個測試數據庫並使用以下腳本損壞頁面,然後嘗試刪除受影響的表。此處的結果是無法刪除該表;我不得不RESTORE DATABASE Testdb PAGE = ''...為了恢復受影響的頁面。我假設如果我對相關頁面的其他部分進行了更改,也許該頁面可以用DROP TABLE或者TRUNCATE table.

/* ********************************************* */
/* ********************************************* */
/* DO NOT USE THIS CODE ON A PRODUCTION SYSTEM!! */
/* ********************************************* */
/* ********************************************* */
USE Master;
GO
ALTER DATABASE test SET RECOVERY FULL;
BACKUP DATABASE Test 
   TO DISK = 'Test_db.bak'
   WITH FORMAT
       , INIT
       , NAME = 'Test Database backup'
       , SKIP
       , NOREWIND
       , NOUNLOAD
       , COMPRESSION
       , STATS = 1;
BACKUP LOG Test
   TO DISK = 'Test_log.bak'
   WITH FORMAT
       , INIT
       , NAME = 'Test Log backup'
       , SKIP
       , NOREWIND
       , NOUNLOAD
       , COMPRESSION
       , STATS = 1;
GO
ALTER DATABASE test SET SINGLE_USER;
GO
USE Test;
GO
IF EXISTS (SELECT name FROM sys.key_constraints WHERE name = 'PK_temp') 
   ALTER TABLE temp DROP CONSTRAINT PK_temp;
IF EXISTS (SELECT name FROM sys.default_constraints 
   WHERE name = 'DF_temp_testdata') 
   ALTER TABLE temp DROP CONSTRAINT DF_temp_testdata;
IF EXISTS (SELECT name FROM sys.tables WHERE name = 'temp') 
DROP TABLE temp;
GO
CREATE TABLE temp
(
   tempID INT NOT NULL CONSTRAINT PK_temp PRIMARY KEY CLUSTERED IDENTITY(1,1)
   , testdata uniqueidentifier CONSTRAINT DF_temp_testdata DEFAULT (NEWID())
);
GO

/* insert 10 rows into #temp */
INSERT INTO temp default values;
GO 10 

/* get some necessary parameters */
DECLARE @partitionID bigint;
DECLARE @dbid smallint;
DECLARE @tblid int;
DECLARE @indexid int;
DECLARE @pageid bigint;
DECLARE @offset INT;
DECLARE @fileid INT;

SELECT @dbid = db_id('Test')
   , @tblid = t.object_id
   , @partitionID = p.partition_id
   , @indexid = i.index_id
FROM sys.tables t
   INNER JOIN sys.partitions p ON t.object_id = p.object_id
   INNER JOIN sys.indexes i on t.object_id = i.object_id
WHERE t.name = 'temp';

SELECT TOP(1) @fileid = file_id 
FROM sys.database_files;

SELECT TOP(1) @pageid = allocated_page_page_id 
FROM sys.dm_db_database_page_allocations(@dbid, @tblid, null, @partitionID, 'LIMITED')
WHERE allocation_unit_type = 1;

/* get a random offset into the 8KB page */
SET @offset = FLOOR(rand() * 8192);
SELECT @offset;

/* 0x75 below is the letter 't' */
DBCC WRITEPAGE (@dbid, @fileid, @pageid, @offset, 1, 0x74, 1);


SELECT * FROM temp;

Msg 824, Level 24, State 2, Line 36
SQL Server detected a logical consistency-based I/O error: incorrect checksum
(expected: 0x298b2ce9; actual: 0x2ecb2ce9). It occurred during a read of page 
(1:1054) in database ID 7 at offset 0x0000000083c000 in file 'C:\SQLServer
\MSSQL11.MSSQLSERVER\MSSQL\DATA\Test.mdf'.  Additional messages in the SQL 
Server error log or system event log may provide more detail. This is a
severe error condition that threatens database integrity and must be
corrected immediately. Complete a full database consistency check
(DBCC CHECKDB). This error can be caused by many factors; for more
information, see SQL Server Books Online.

此時您與數據庫引擎斷開連接,因此重新連接以繼續。

USE Test;
DBCC CHECKDB WITH NO_INFOMSGS, TABLERESULTS;

這里報告了腐敗。

DROP TABLE temp;

Msg 824, Level 24, State 2, Line 36
SQL Server detected a logical consistency-based I/O error: incorrect checksum
(expected: 0x298b2ce9; actual: 0x2ecb2ce9). It occurred during a read of page 
(1:1054) in database ID 7 at offset 0x0000000083c000 in file 'C:\SQLServer
\MSSQL11.MSSQLSERVER\MSSQL\DATA\Test.mdf'.  Additional messages in the SQL 
Server error log or system event log may provide more detail. This is a
severe error condition that threatens database integrity and must be
corrected immediately. Complete a full database consistency check
(DBCC CHECKDB). This error can be caused by many factors; for more
information, see SQL Server Books Online.

此處報告腐敗,DROP TABLE失敗。

/* assuming ENTERPRISE or DEVELOPER edition of SQL Server,
   I can use PAGE='' to restore a single page from backup */
USE Master;
RESTORE DATABASE Test PAGE = '1:1054' FROM DISK = 'Test_db.bak'; 
BACKUP LOG Test TO DISK = 'Test_log_1.bak';

RESTORE LOG Test FROM DISK = 'Test_log.bak';
RESTORE LOG Test FROM DISK = 'Test_log_1.bak';

編輯#2,添加請求的@@VERSION 資訊。

SELECT @@VERSION;

回報:

Microsoft SQL Server 2012 (SP1) - 11.0.3000.0 (X64) 
   Oct 19 2012 13:38:57 
   Copyright (c) Microsoft Corporation
   Enterprise Evaluation Edition (64-bit) on Windows NT 6.2 <X64> 
       (Build 9200: )

我知道這是評估版,我們有企業版的密鑰,很快就會進行版本升級。

這是一個已知問題的修復:

修復:使用 SQL Server 2012 時出現“未找到數據庫中分區 ID 的目錄條目”錯誤

假設您在 Microsoft SQL Server 2012 中查詢 tempdb.sys.allocation_units 表。當您在查詢中使用 NOLOCK 提示或查詢處於 READ UNCOMMITED 事務隔離級別下時,您會收到以下間歇性 608 錯誤消息:

錯誤:608 嚴重性:16 狀態:1

沒有為數據庫中的分區找到目錄條目。元數據不一致。執行 DBCC CHECKDB 以檢查元數據損壞

注意 DBCC CHECKDB 命令不顯示任何數據庫損壞的跡象。

固定在:

您的版本 (11.0.3000.0) 是 SQL Server 2012 SP1 RTM

引用自:https://dba.stackexchange.com/questions/35595