審核哪些表最常加入
我正在嘗試建立一些與我們數據庫中哪些表最常連接在一起的指標。我怎樣才能做到這一點?
編輯:我應該解釋我的推理;我們正在轉向基於微服務的架構,我希望加入數據,以便為領域驅動設計提供最好的資訊。
查看哪些表需要最頻繁地連接將允許我設計某些數據層微服務,以便它們始終可以訪問最頻繁連接的表的簡單連接。
因此,如果 Table1 和 Table2 總是被加入,我可以評估這一點並讓它們由相同的服務控制。如果沒有,我將有不同的服務來控制它們。
我不知道您是否可以區分聯接與直接查詢,但您可以使用索引活動(順便說一下,SQL Server 的版本是什麼?)來了解您的“熱”表。這個查詢應該讓你開始,我用它來辨識嚴重碎片化或大量使用的索引,但它仍然應該讓你到達那裡。我認為它不能很好地與堆一起工作,但是如果沒有充分的理由,你真的不應該擁有很多(或任何)。
我主要是自己寫的,但當然是從別人的作品中抄襲和借來的。我要感謝所有曾經在網際網路上發布程式碼的人。
/** Author: Jonathan Fite Index Analyzer Requires CONTROL permission in the database, VIEW DATABASE STATE and VIEW SERVER STATE */ /** Initialize Variables. @DatabaseID - The database we want to examine, must be specified, but defaults to the current database... @IndexScanLevel - The level of detail you want to scan to. - LIMITED - This is the default or NULL value, only looks at the parent level of the index. - SAMPLED - Detailed look at ~1% of the pages in the index. (Heaps and indexes smaller than 10K Pages use DETAILED) - DETAILED - full scan of all pages, heavy IO @MinFragmentation - If you want to see only indexes fragmented to a certain level. @MinPageCount - To limit to indexes that are over a certain page count. Small indexes don't typically benefit from being defragmented. A good number to start with is 100 */ DECLARE @DatabaseID INT; SET @DatabaseID = (SELECT DB_ID()); DECLARE @IndexScanLevel NVARCHAR(10); SET @IndexScanLevel = 'SAMPLED'; DECLARE @MinFragmentation DECIMAL(4,2); SET @MinFragmentation = 0.0; DECLARE @MinPageCount INT; SET @MinPageCount = 100; /** Begin Work on the Query *********************************************************/ --Get Column List Information DECLARE @ColumnList AS TABLE ( object_id INT NOT NULL , index_id INT NOT NULL , IndexedColumns VARCHAR(8000) NULL , IncludedColumns VARCHAR(8000) NULL ); WITH CTE_ColumnList AS ( SELECT IC.object_id , IC.index_id, C.name , IC.key_ordinal AS Position , DisplayName = C.name + CASE WHEN IC.is_descending_key = 1 THEN ' DESC' ELSE '' END , IC.is_included_column FROM sys.index_columns IC INNER JOIN sys.columns C ON C.object_id = IC.object_id AND IC.column_id = C.column_id ) INSERT INTO @ColumnList ( object_id , index_id , IndexedColumns , IncludedColumns ) SELECT DISTINCT C.object_id , C.index_id , IndexedColumns = STUFF((SELECT ', ' + E.DisplayName FROM CTE_ColumnList E WHERE E.object_id = C.object_id AND E.index_id = C.index_id AND E.is_included_column = 0 ORDER BY E.Position FOR XML PATH('')), 1,1,'') , IncludedColumns = STUFF((SELECT ', ' + E.DisplayName FROM CTE_ColumnList E WHERE E.object_id = C.object_id AND E.index_id = C.index_id AND E.is_included_column = 1 ORDER BY E.Position FOR XML PATH('')), 1,1,'') FROM CTE_ColumnList C; SELECT DB_NAME(IPS.database_id) AS DatabaseName , SchemaName = S.name , ObjectName = O.name , O.type_desc AS ObjectType , I.name AS IndexName , IPS.index_type_desc , IPS.alloc_unit_type_desc , IPS.avg_fragmentation_in_percent , IPS.avg_page_space_used_in_percent , IPS.page_count , PageSize_MB = (IPS.page_count/128.0) , I.is_unique , I.fill_factor , I.is_disabled , I.has_filter , I.filter_definition , IUS.user_seeks , IUS.user_scans , IUS.user_lookups , IUS.user_updates , UsageActivityTotal = IUS.user_seeks + IUS.user_scans + IUS.user_lookups + IUS.user_updates , IOS.leaf_insert_count , IOS.leaf_update_count , IOS.leaf_delete_count , OperationalActivityTotal = IOS.leaf_insert_count + IOS.leaf_update_count + IOS.leaf_delete_count , LastAccessed = IUS_LastUsed.LastUsed , CL.IndexedColumns , CL.IncludedColumns FROM sys.dm_db_index_physical_stats(@DatabaseID, NULL, NULL, NULL, @IndexScanLevel) AS IPS INNER JOIN sys.objects O ON O.object_id = IPS.object_id INNER JOIN sys.schemas S ON S.schema_id = O.schema_id INNER JOIN sys.indexes I ON I.object_id = IPS.object_id AND I.index_id = IPS.index_id LEFT OUTER JOIN @ColumnList CL ON CL.object_id = IPS.object_id AND CL.index_id = IPS.index_id LEFT OUTER JOIN sys.dm_db_index_usage_stats IUS ON IUS.database_id = IPS.database_id AND IUS.object_id = IPS.object_id AND IUS.index_id = IPS.index_id LEFT OUTER JOIN sys.dm_db_index_operational_stats(@DatabaseID, NULL, NULL, NULL) IOS ON IOS.object_id = IPS.object_id AND IOS.index_id = IPS.index_id LEFT OUTER JOIN ( SELECT S.database_id , S.object_id , S.index_id , MAX(S.lastused) AS LastUsed FROM sys.dm_db_index_usage_stats UNPIVOT (lastused FOR nlastused IN (last_user_seek, last_user_scan, last_user_lookup, last_user_update)) AS S GROUP BY S.database_id , S.object_id , S.index_id ) IUS_LastUsed ON IUS_LastUsed.database_id = IPS.database_id AND IUS_LastUsed.object_id = IPS.object_id AND IUS_LastUsed.index_id = IPS.index_id WHERE ISNULL(O.is_ms_shipped, 0x0) = 0x0 --Exclude System Generated Objects AND ISNULL(I.is_disabled, 0x0) = 0x0 --Exclude Disabled Indexes AND ISNULL(IPS.page_count, 0) >= @MinPageCount --Only worry about indexes with worthwhile page counts AND ISNULL(IPS.avg_fragmentation_in_percent, 0.0) >= @MinFragmentation --Only display indexes that exceed fragmentation threshold ORDER BY IPS.avg_fragmentation_in_percent DESC;
這有點像兔子洞,並不是 100% 可靠,但首先遵循我最初的構想:
- 經常執行的查詢將被 sys.dm_exec_query_stats 擷取(儘管這會根據您的服務重啟頻率和計劃記憶體回收而有所不同)。
- 當您有查詢文本時,解析它以查找涉及的表是很困難的。詢問我在 Plan Explorer 中編寫標記化/匿名化的同事。:-)
- 如果 SQL Server 足夠現代,有時 sys.dm_exec_describe_first_result_set 會有所幫助。這完全取決於批處理的文本、所涉及的表的位置以及許多其他變數。
- 如果可以的話,我將稍微放寬您的要求,以說明您要審核的是涉及多個表的任何查詢(因此,除了連接之外,您還有連接聯合的查詢,使用交叉應用等)。
讓我們從我們都擁有的東西開始:msdb。如果我們執行以下兩個查詢:
SELECT backup_set_id FROM dbo.backupset; SELECT b.backup_set_id, m.media_set_id FROM dbo.backupset AS b INNER JOIN dbo.backupmediaset AS m ON b.media_set_id = m.media_set_id;
這是兩個非常簡單的表,一個來自單個查詢,另一個具有連接。如果我們將這些輸入元數據函式:
SELECT * FROM sys.dm_exec_describe_first_result_set (N'SELECT backup_set_id FROM dbo.backupset;', N'', 1); SELECT * FROM sys.dm_exec_describe_first_result_set (N'SELECT b.backup_set_id, m.media_set_id FROM dbo.backupset AS b INNER JOIN dbo.backupmediaset AS m ON b.media_set_id = m.media_set_id;', N'', 1);
我們可以看到,當我們向這個函式提供查詢文本時,我們可以確定至少由查詢輸出的列和表。(您應該開始看到這增加了一些限制 - 如果您連接兩個表並且只從其中一個輸出列,或者使用 EXISTS 之類的東西,該函式只會告訴您其中一個。)
因此,讓我們更進一步,從 query_stats DMV 中為我們的兩個查詢提取查詢文本,而不是對它們進行硬編碼(不過,我們將硬編碼一個過濾器以排除噪音)。
SELECT qs.plan_handle, qs.execution_count, q = SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2, (CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.text)) * 2 ELSE qs.statement_end_offset + 2 END - qs.statement_start_offset) / 2) FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%' AND st.[text] NOT LIKE N'%[_]describe[_]%';
這應該返回兩行三列:plan_handle(一個大的 0x0… 值)、execution_count(可能為 1)和查詢文本。現在,下一步,讓我們將此查詢連接到函式:
;WITH src AS ( SELECT qs.plan_handle, qs.execution_count, q = SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2, (CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.text)) * 2 ELSE qs.statement_end_offset + 2 END - qs.statement_start_offset) / 2) FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%' AND st.[text] NOT LIKE N'%[_]describe[_]%' ) SELECT * FROM src CROSS APPLY sys.dm_exec_describe_first_result_set(src.q,N'',1) AS f;
現在我們看到三行,一行來自非連接查詢,兩行來自連接查詢。由於我們正在尋找涉及多個表的查詢,並擱置邊緣情況,例如多個模式或數據庫中的相同表名、同義詞等。我們可以使用分組來僅顯示那些具有多個不同表名的查詢函式輸出:
;WITH src AS ( SELECT qs.plan_handle, qs.statement_start_offset, qs.execution_count, q = SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2, (CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), st.text)) * 2 ELSE qs.statement_end_offset + 2 END - qs.statement_start_offset) / 2) FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st WHERE st.[text] LIKE N'%backup[_]' + N'set[_]id%' AND st.[text] NOT LIKE N'%[_]describe[_]%' ), agg AS ( SELECT src.plan_handle, src.statement_start_offset, f.source_schema, f.source_table, tablecount = COUNT(*) OVER (PARTITION BY f.source_schema, f.source_table) FROM src CROSS APPLY sys.dm_exec_describe_first_result_set(src.q,N'',1) AS f GROUP BY src.plan_handle, src.statement_start_offset, f.source_schema, f.source_table ) SELECT src.q, src.execution_count, agg.source_schema, agg.source_table FROM src INNER JOIN agg ON src.plan_handle = agg.plan_handle AND src.statement_start_offset = agg.statement_start_offset WHERE agg.tablecount > 1 ORDER BY src.execution_count DESC, agg.source_schema, agg.source_table;
您必須經常執行它才能有意義(例如,使用的 DMV 在服務重新啟動時被清除)。
因此,雖然這是一個有趣的練習,但我認為 Jonathan 更正確——您應該擔心哪些表和索引最忙、未使用、寫入多於讀取等。加入本身並不是您需要的東西通常需要審核。恕我直言。