Sql-Server

審核哪些表最常加入

  • February 22, 2017

我正在嘗試建立一些與我們數據庫中哪些表最常連接在一起的指標。我怎樣才能做到這一點?

編輯:我應該解釋我的推理;我們正在轉向基於微服務的架構,我希望加入數據,以便為領域驅動設計提供最好的資訊。

查看哪些表需要最頻繁地連接將允許我設計某些數據層微服務,以便它們始終可以訪問最頻繁連接的表的簡單連接。

因此,如果 Table1 和 Table2 總是被加入,我可以評估這一點並讓它們由相同的服務控制。如果沒有,我將有不同的服務來控制它們。

我不知道您是否可以區分聯接與直接查詢,但您可以使用索引活動(順便說一下,SQL Server 的版本是什麼?)來了解您的“熱”表。這個查詢應該讓你開始,我用它來辨識嚴重碎片化或大量使用的索引,但它仍然應該讓你到達那裡。我認為它不能很好地與堆一起工作,但是如果沒有充分的理由,你真的不應該擁有很多(或任何)。

我主要是自己寫的,但當然是從別人的作品中抄襲和借來的。我要感謝所有曾經在網際網路上發布程式碼的人。

/**
   Author: Jonathan Fite

   Index Analyzer

   Requires CONTROL permission in the database, VIEW DATABASE STATE and VIEW SERVER STATE

*/

/** Initialize Variables. 

   @DatabaseID - The database we want to examine, must be specified, but defaults to the current database...
   @IndexScanLevel - The level of detail you want to scan to.
       - LIMITED - This is the default or NULL value, only looks at the parent level of the index.  
       - SAMPLED - Detailed look at ~1% of the pages in the index. (Heaps and indexes smaller than 10K Pages use DETAILED)
       - DETAILED - full scan of all pages, heavy IO
   @MinFragmentation - If you want to see only indexes fragmented to a certain level.
   @MinPageCount - To limit to indexes that are over a certain page count.  Small indexes don't typically benefit
       from being defragmented.  A good number to start with is 100

*/

DECLARE @DatabaseID INT;
SET @DatabaseID = (SELECT DB_ID());

DECLARE @IndexScanLevel NVARCHAR(10);
SET @IndexScanLevel = 'SAMPLED';

DECLARE @MinFragmentation DECIMAL(4,2);
SET @MinFragmentation = 0.0;

DECLARE @MinPageCount INT;
SET @MinPageCount = 100;

/** Begin Work on the Query *********************************************************/

--Get Column List Information
DECLARE @ColumnList AS TABLE
   (
   object_id INT NOT NULL
   , index_id  INT NOT NULL
   , IndexedColumns VARCHAR(8000) NULL
   , IncludedColumns VARCHAR(8000) NULL
   );

WITH CTE_ColumnList AS
(
   SELECT IC.object_id
       , IC.index_id, C.name
       , IC.key_ordinal AS Position
       , DisplayName = C.name + CASE WHEN IC.is_descending_key = 1 THEN ' DESC' ELSE '' END
       , IC.is_included_column
   FROM sys.index_columns IC
       INNER JOIN sys.columns C ON C.object_id = IC.object_id AND IC.column_id = C.column_id
)
INSERT INTO @ColumnList
       ( object_id ,
         index_id ,
         IndexedColumns ,
         IncludedColumns
       )
SELECT DISTINCT 
   C.object_id
   , C.index_id
   , IndexedColumns = STUFF((SELECT ', ' + E.DisplayName FROM CTE_ColumnList E WHERE E.object_id = C.object_id AND E.index_id = C.index_id AND E.is_included_column = 0 ORDER BY E.Position FOR XML PATH('')), 1,1,'')
   , IncludedColumns  = STUFF((SELECT ', ' + E.DisplayName FROM CTE_ColumnList E WHERE E.object_id = C.object_id AND E.index_id = C.index_id AND E.is_included_column = 1 ORDER BY E.Position FOR XML PATH('')), 1,1,'')
FROM CTE_ColumnList C;

SELECT DB_NAME(IPS.database_id) AS DatabaseName
   , SchemaName = S.name
   , ObjectName = O.name
   , O.type_desc AS ObjectType
   , I.name AS IndexName
   , IPS.index_type_desc
   , IPS.alloc_unit_type_desc
   , IPS.avg_fragmentation_in_percent
   , IPS.avg_page_space_used_in_percent 
   , IPS.page_count
   , PageSize_MB = (IPS.page_count/128.0)
   , I.is_unique
   , I.fill_factor
   , I.is_disabled
   , I.has_filter
   , I.filter_definition
   , IUS.user_seeks
   , IUS.user_scans
   , IUS.user_lookups
   , IUS.user_updates
   , UsageActivityTotal = IUS.user_seeks + IUS.user_scans + IUS.user_lookups + IUS.user_updates
   , IOS.leaf_insert_count
   , IOS.leaf_update_count
   , IOS.leaf_delete_count
   , OperationalActivityTotal = IOS.leaf_insert_count + IOS.leaf_update_count + IOS.leaf_delete_count
   , LastAccessed = IUS_LastUsed.LastUsed
   , CL.IndexedColumns
   , CL.IncludedColumns
FROM sys.dm_db_index_physical_stats(@DatabaseID, NULL, NULL, NULL, @IndexScanLevel) AS IPS
   INNER JOIN sys.objects O ON O.object_id = IPS.object_id
   INNER JOIN sys.schemas S ON S.schema_id = O.schema_id
   INNER JOIN sys.indexes I ON I.object_id = IPS.object_id AND I.index_id = IPS.index_id
   LEFT OUTER JOIN @ColumnList CL ON CL.object_id = IPS.object_id AND CL.index_id = IPS.index_id
   LEFT OUTER JOIN sys.dm_db_index_usage_stats IUS ON IUS.database_id = IPS.database_id AND IUS.object_id = IPS.object_id AND IUS.index_id = IPS.index_id
   LEFT OUTER JOIN sys.dm_db_index_operational_stats(@DatabaseID, NULL, NULL, NULL) IOS ON IOS.object_id = IPS.object_id AND IOS.index_id = IPS.index_id
   LEFT OUTER JOIN (
                   SELECT S.database_id
                       , S.object_id
                       , S.index_id
                       , MAX(S.lastused) AS LastUsed
                   FROM sys.dm_db_index_usage_stats UNPIVOT (lastused FOR nlastused IN (last_user_seek, last_user_scan, last_user_lookup, last_user_update)) AS S
                   GROUP BY S.database_id
                       , S.object_id
                       , S.index_id
                   ) IUS_LastUsed ON IUS_LastUsed.database_id = IPS.database_id AND IUS_LastUsed.object_id = IPS.object_id AND IUS_LastUsed.index_id = IPS.index_id
WHERE ISNULL(O.is_ms_shipped, 0x0) = 0x0 --Exclude System Generated Objects
   AND ISNULL(I.is_disabled, 0x0) = 0x0 --Exclude Disabled Indexes
   AND ISNULL(IPS.page_count, 0) >= @MinPageCount --Only worry about indexes with worthwhile page counts
   AND ISNULL(IPS.avg_fragmentation_in_percent, 0.0) >= @MinFragmentation --Only display indexes that exceed fragmentation threshold
ORDER BY IPS.avg_fragmentation_in_percent DESC;

這有點像兔子洞,並不是 100% 可靠,但首先遵循我最初的構想:

  • 經常執行的查詢將被 sys.dm_exec_query_stats 擷取(儘管這會根據您的服務重啟頻率和計劃記憶體回收而有所不同)。
  • 當您有查詢文本時,解析它以查找涉及的表是很困難的。詢問我在 Plan Explorer 中編寫標記化/匿名化的同事。:-)
  • 如果 SQL Server 足夠現代,有時 sys.dm_exec_describe_first_result_set 會有所幫助。這完全取決於批處理的文本、所涉及的表的位置以及許多其他變數。
  • 如果可以的話,我將稍微放寬您的要求,以說明您要審核的是涉及多個表的任何查詢(因此,除了連接之外,您還有連接聯合的查詢,使用交叉應用等)。

讓我們從我們都擁有的東西開始:msdb。如果我們執行以下兩個查詢:

SELECT backup_set_id FROM dbo.backupset;

SELECT b.backup_set_id, m.media_set_id
FROM dbo.backupset AS b 
INNER JOIN dbo.backupmediaset AS m
ON b.media_set_id = m.media_set_id;

這是兩個非常簡單的表,一個來自單個查詢,另一個具有連接。如果我們將這些輸入元數據函式:

SELECT * FROM sys.dm_exec_describe_first_result_set
 (N'SELECT backup_set_id FROM dbo.backupset;', N'', 1);

SELECT * FROM sys.dm_exec_describe_first_result_set
 (N'SELECT b.backup_set_id, m.media_set_id
    FROM dbo.backupset AS b 
    INNER JOIN dbo.backupmediaset AS m
    ON b.media_set_id = m.media_set_id;', N'', 1);

我們可以看到,當我們向這個函式提供查詢文本時,我們可以確定至少由查詢輸出的列和表。(您應該開始看到這增加了一些限制 - 如果您連接兩個表並且只從其中一個輸出列,或者使用 EXISTS 之類的東西,該函式只會告訴您其中一個。)

因此,讓我們更進一步,從 query_stats DMV 中為我們的兩個查詢提取查詢文本,而不是對它們進行硬編碼(不過,我們將硬編碼一個過濾器以排除噪音)。

SELECT qs.plan_handle, qs.execution_count, 
 q = SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
       (CASE 
           WHEN qs.statement_end_offset = -1 
            THEN LEN(CONVERT(nvarchar(max), st.text)) * 2
            ELSE qs.statement_end_offset + 2
           END - qs.statement_start_offset) / 2)
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st
WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%'
 AND st.[text] NOT LIKE N'%[_]describe[_]%';

這應該返回兩行三列:plan_handle(一個大的 0x0… 值)、execution_count(可能為 1)和查詢文本。現在,下一步,讓我們將此查詢連接到函式:

;WITH src AS
(
   SELECT qs.plan_handle, qs.execution_count, 
       q =  SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
       (CASE 
           WHEN qs.statement_end_offset = -1 THEN
            LEN(CONVERT(nvarchar(max), st.text)) * 2
            ELSE qs.statement_end_offset + 2
           END - qs.statement_start_offset) / 2)
   FROM sys.dm_exec_query_stats AS qs
   CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st
   WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%'
   AND st.[text] NOT LIKE N'%[_]describe[_]%'
)
SELECT * FROM src 
CROSS APPLY sys.dm_exec_describe_first_result_set(src.q,N'',1) AS f;

現在我們看到三行,一行來自非連接查詢,兩行來自連接查詢。由於我們正在尋找涉及多個表的查詢,並擱置邊緣情況,例如多個模式或數據庫中的相同表名、同義詞等。我們可以使用分組來僅顯示那些具有多個不同表名的查詢函式輸出:

;WITH src AS
(
   SELECT qs.plan_handle, qs.statement_start_offset, qs.execution_count, 
       q =  SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
       (CASE 
           WHEN qs.statement_end_offset = -1 THEN
            LEN(CONVERT(nvarchar(max), st.text)) * 2
            ELSE qs.statement_end_offset + 2
           END - qs.statement_start_offset) / 2)
   FROM sys.dm_exec_query_stats AS qs
   CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st
   WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%'
   AND st.[text] NOT LIKE N'%[_]describe[_]%'
), 
agg AS 
(
   SELECT src.plan_handle, src.statement_start_offset, f.source_schema, f.source_table,
     tablecount = COUNT(*) OVER (PARTITION BY f.source_schema, f.source_table)
   FROM src 
   CROSS APPLY sys.dm_exec_describe_first_result_set(src.q,N'',1) AS f
   GROUP BY src.plan_handle, src.statement_start_offset, f.source_schema, f.source_table
)
SELECT src.q, src.execution_count, agg.source_schema, agg.source_table
FROM src INNER JOIN agg ON src.plan_handle = agg.plan_handle
 AND src.statement_start_offset = agg.statement_start_offset
 WHERE agg.tablecount > 1
 ORDER BY src.execution_count DESC, agg.source_schema, agg.source_table;

您必須經常執行它才能有意義(例如,使用的 DMV 在服務重新啟動時被清除)。

因此,雖然這是一個有趣的練習,但我認為 Jonathan 更正確——您應該擔心哪些表和索引最忙、未使用、寫入多於讀取等。加入本身並不是您需要的東西通常需要審核。恕我直言。

引用自:https://dba.stackexchange.com/questions/165077