SQL 中的有限狀態機
我想就我遇到的問題提供一些意見。我們有一段程式碼在我們的儲存過程中重複,每次處理都需要相當長的時間,當結合起來,在一組數十萬個項目上的讀取次數達到數億次。基本上我們有物品,物品最多可以有12台機器,每台機器都有自己的狀態。
這些是(簡化的)表結構:
CREATE TABLE dbo.ItemMachineState ( [itemID] [int], [machineID] [int], [stateID] [int] ) CREATE TABLE dbo.Transition ( [machineID] [int] NOT NULL, [eventID] [int] NOT NULL, [stateID] [int] NOT NULL, [nextStateID] [int] NOT NULL )
發生的情況是,在處理過程中,我們創建了一個我們處理的#temp 表,它最終具有每個項目的 eventID。然後將該臨時表連接回 ItemState 和 Transition,如下所示:
UPDATE dbo.ItemState SET stateID = tr.nextStateID FROM #temp t JOIN dbo.ItemMachineState ist ON ist.itemID = t.itemID JOIN Transition tr ON tr.stateID = ist.stateID AND tr.machineID = ist.machineID AND tr.eventID = t.eventID
所以我們計算的 eventID 決定了給定項目的機器會發生什麼,這取決於它們各自處於什麼狀態。問題是一個事件可以在一個動作中操縱零個或多個機器狀態,如果該事件是相關的到狀態和機器的特定組合。
以下是這些狀態轉換之一的範例:
ItemID 3468489 首先在 ItemMachineState 中看起來像這樣…
itemID machineID stateID ----------- ----------- ----------- 3468489 12 4 3468489 14 113 3468489 15 157 3468489 16 165 3468489 18 169 3468489 19 165 3468489 20 157 3468489 21 165 3468489 23 173 3468489 24 173 3468489 26 9 3468489 36 9
我們做了一些工作,最終得到了一個 #temp 表,其中包含一個 ItemID 和一個 EventID…
itemID eventID ----------- ----------- 3468489 64
然後我們將這兩個表加入到 Transition,對於這個特定的 eventID,它看起來像這樣:
machineID eventID stateID nextStateID ----------- ----------- ----------- ----------- 13 64 73 79 13 64 74 79 13 64 75 79 13 64 76 79 13 64 77 79 13 64 78 79 13 64 187 79 13 64 188 79 13 64 189 79 13 64 190 79 13 64 191 79 36 64 9 79 36 64 194 79 36 64 196 79 36 64 208 79 36 64 210 79 36 64 213 79 36 64 218 79 46 64 73 79 47 64 73 79 70 64 73 79 70 64 75 79 70 64 76 79 70 64 77 79 70 64 78 79
把它們放在一起:
SELECT t.itemID, t.eventID, ist.machineID, ist.stateID, tr.nextStateID FROM #temp t JOIN dbo.ItemMachineState ist ON ist.itemID = t.itemID JOIN Transition tr ON tr.stateID = ist.stateID AND tr.machineID = ist.machineID AND tr.eventID = t.eventID itemID eventID machineID stateID nextStateID ----------- ----------- ----------- ----------- ----------- 3468489 64 36 9 79
所以在這個特定的例子中,這個事件只與這個項目的一台機器相關。它的 stateID 在 machineID 36 上從 9 更新到 79,並且該 Item 的其他所有內容都保持不變。
我想就如何以不同的方式處理這個問題提出建議。我們不能離開表結構,但我們可以改變在轉換/事件期間將 stateID 設置為 nextStateID 的方式。正如您在上面看到的那樣,這是通過消除來實現的;我們需要目前狀態和機器來確定下一個狀態是什麼,對於那台機器,對於那個事件。在某些情況下,這不會更新任何東西,有時它會一次更新多台機器,我們喜歡這種能力。我不認為通過簡單地更改索引或添加查詢提示會找到解決此問題的最精簡的方法,我們需要一種新方法來限制讀取次數和處理時間,但提供相同的功能。
我想避免將索引等帶入這個討論,因為我必須使用真實的例子,這污染了我在這裡試圖問的本質,我更改了列和表的名稱以簡化我的問題。無論如何,給你:
查詢計劃 http://pastebin.com/xhPa4t8d,創建和索引腳本 http://pastebin.com/sp70QuEJ
請注意,在查詢計劃中,我們強制執行 INNER LOOP JOIN。如果留給簡單的 JOIN,則查詢的處理時間呈指數增長。
之前使用@wBob UNIQUE CLUSTERED 索引:
使用
OPTION (MERGE JOIN, HASH JOIN)
導致了這個執行計劃和結果:將很快更新其他資訊
我會考慮不要一次更新所有行,而是通過批量機器執行,這樣每次更新的記錄量就會減少。您可以保留相同的程式碼,只需批處理即可。
assetID
通過創建第二個帶有唯一聚集索引的臨時表並eventID
刪除LOOP
提示,我的測試設備的性能提高了大約 50% 。這不應該在語義上改變您的查詢結果。嘗試這個:SELECT DISTINCT assetID, eventID INTO #Event2 FROM #Event CREATE UNIQUE CLUSTERED INDEX PK_temp_Event2 ON #Event2 ( assetID, eventID ) UPDATE ast SET ast.stateID = st.nextStateID FROM #Event2 AS e INNER JOIN EFT.AssetState AS ast ON ast.assetID = e.assetID INNER JOIN dbo.Transition AS st ON st.stateID = ast.stateID AND st.eventID = e.eventID AND st.machineID = ast.machineID
讓我知道你是怎麼辦的。如果可行,請考慮調整您原來的#Event 表——實際上不需要兩個臨時表,這只是為了提高性能。調運動。
如果它不起作用,我們可以考慮改進測試台以更準確地反映您的設置。我做了一些使用較少或沒有非聚集索引的實驗,也得到了一些不錯的結果,儘管顯然其他查詢可能會使用它們。
測試台
-- Secondary DDL provided; USE tempdb GO IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'EFT' ) EXEC ('CREATE SCHEMA EFT') GO IF OBJECT_ID('[dbo].[Transition]') IS NOT NULL DROP TABLE [dbo].[Transition] IF OBJECT_ID('[EFT].[AssetState]') IS NOT NULL DROP TABLE [EFT].[AssetState] IF OBJECT_ID('[dbo].[Event]') IS NOT NULL DROP TABLE [dbo].[Event] IF OBJECT_ID('[dbo].[State]') IS NOT NULL DROP TABLE [dbo].[State] IF OBJECT_ID('[dbo].[Machine]') IS NOT NULL DROP TABLE [dbo].[Machine] IF OBJECT_ID('#Event') IS NOT NULL DROP TABLE #Event GO -- #EFT.AssetState CREATE TABLE [EFT].[AssetState]( [assetID] [int] NOT NULL, [busDate] [datetime] NOT NULL, [machineID] [int] NOT NULL, [stateID] [int] NOT NULL, CONSTRAINT [PK_AssetState] PRIMARY KEY CLUSTERED ( [assetID] ASC, [busDate] ASC, [machineID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [IX_AssetState_assetID] ON [EFT].[AssetState] ( [assetID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [IX_AssetState_assetID_stateID] ON [EFT].[AssetState] ( [assetID] ASC, [stateID] ASC, [machineID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [IX_AssetState_machineID_stateID_assetID] ON [EFT].[AssetState] ( [machineID] ASC, [stateID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO -- dbo.Transition CREATE TABLE [dbo].[Transition]( [transitionID] [int] IDENTITY(1,1) NOT NULL, [machineID] [int] NOT NULL, [category] [varchar](50) NOT NULL, [eventID] [int] NOT NULL, [stateID] [int] NOT NULL, [nextStateID] [int] NOT NULL, CONSTRAINT [PK_Transition] PRIMARY KEY CLUSTERED ( [transitionID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY], CONSTRAINT [UK_Transition_machineID_stateID_eventID] UNIQUE NONCLUSTERED ( [machineID] ASC, [stateID] ASC, [eventID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [UK_Transition_machineID_nextStateID_eventID] ON [dbo].[Transition] ( [machineID] ASC, [eventID] ASC, [stateID] ASC, [nextStateID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO CREATE TABLE [dbo].[State]( [stateID] INT PRIMARY KEY ) GO CREATE TABLE [dbo].[Event]( [eventID] INT PRIMARY KEY ) GO CREATE TABLE [dbo].[Machine]( [machineID] INT PRIMARY KEY ) GO ALTER TABLE [dbo].[Transition] WITH CHECK ADD CONSTRAINT [FK_Transition_NextState] FOREIGN KEY([nextStateID]) REFERENCES [dbo].[State] ([stateID]) GO ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_NextState] GO ALTER TABLE [dbo].[Transition] WITH CHECK ADD CONSTRAINT [FK_Transition_State] FOREIGN KEY([stateID]) REFERENCES [dbo].[State] ([stateID]) GO ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_State] GO ALTER TABLE [dbo].[Transition] WITH CHECK ADD CONSTRAINT [FK_Transition_StateEvent] FOREIGN KEY([eventID]) REFERENCES [dbo].[Event] ([eventID]) GO ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_StateEvent] GO ALTER TABLE [dbo].[Transition] WITH CHECK ADD CONSTRAINT [FK_Transition_StateMachine] FOREIGN KEY([machineID]) REFERENCES [dbo].[Machine] ([machineID]) GO ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_StateMachine] GO -- #Event CREATE TABLE #Event ( assetID INT , busDate DATETIME, eventID INT ) CREATE CLUSTERED INDEX IX_Ev_assetID ON #Event ( assetID ) GO -- Create dummy data -- populate AssetState with 2,658,200 records -- 2,658,200 ;WITH cte AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn FROM master.sys.columns c1 CROSS JOIN master.sys.columns c2 CROSS JOIN master.sys.columns c3 ) INSERT INTO [EFT].[AssetState]( assetID, busDate, machineID, stateID ) SELECT items.rn AS assetID, '1 Jan 2015' AS busDate, machines.rn AS machineID, items.rn % 7 AS stateID FROM ( SELECT TOP 221520 * FROM cte ) items CROSS JOIN ( SELECT TOP (12) * FROM cte ) machines GO -- Get a random selection for temp table INSERT INTO #Event ( assetID, busDate, eventID ) SELECT TOP (2128660) assets.assetID, assets.busDate, assets.assetID % 99 AS eventID FROM ( SELECT DISTINCT assetID, busDate FROM [EFT].[AssetState] ) assets CROSS JOIN ( SELECT TOP (12) * FROM [EFT].[AssetState] ) machines ORDER BY NEWID() GO -- Get selection for Transition table INSERT INTO [dbo].[State] ( stateID ) SELECT assetID FROM ( SELECT DISTINCT TOP 99 assetID FROM [EFT].[AssetState] ) m GO INSERT INTO [dbo].[Event] ( eventID ) SELECT assetID FROM ( SELECT DISTINCT TOP 99 assetID FROM [EFT].[AssetState] ) m GO INSERT INTO [dbo].[Machine] ( machineID ) SELECT machineID FROM ( SELECT DISTINCT machineID FROM [EFT].[AssetState] ) m GO INSERT INTO dbo.Transition ( machineID, category, eventID, stateID, nextStateID ) SELECT TOP (1214) m.machineID, CASE x.rn % 3 WHEN 0 THEN 'X' WHEN 1 THEN 'Y' WHEN 2 THEN 'Z' END category, ( x.rn % 99 ) + 1 eventID, ( x.rn % 7 ) + 1 stateID, ( x.rn % 7 ) + 2 nextStateID FROM ( SELECT DISTINCT machineID FROM [EFT].[AssetState] ) m CROSS JOIN ( SELECT TOP (102) ROW_NUMBER() OVER( ORDER BY ( SELECT NULL ) ) rn, * FROM [EFT].[AssetState] ) x ORDER BY NEWID() GO --:exit -- Original DECLARE @startDate DATETIME = GETDATE() BEGIN TRAN UPDATE EFT.AssetState SET stateID = st.nextStateID FROM #Event AS e INNER LOOP JOIN EFT.AssetState AS ast ON ast.assetID = e.assetID INNER JOIN Transition AS st ON st.stateID = ast.stateID AND st.eventID = e.eventID AND st.machineID = ast.machineID; SELECT @@rowcount r, DATEDIFF( s, @startDate, GETDATE() ) diff1 ROLLBACK TRAN GO -- Revised DECLARE @startDate DATETIME = GETDATE() IF OBJECT_ID('tempdb..#Event2') IS NOT NULL DROP TABLE #Event2 SELECT DISTINCT assetID, eventID INTO #Event2 FROM #Event CREATE UNIQUE CLUSTERED INDEX PK_temp_Event2 ON #Event2 ( assetID, eventID ) BEGIN TRAN UPDATE ast SET ast.stateID = st.nextStateID FROM #Event2 AS e INNER JOIN EFT.AssetState AS ast ON ast.assetID = e.assetID INNER JOIN dbo.Transition AS st ON st.stateID = ast.stateID AND st.eventID = e.eventID AND st.machineID = ast.machineID SELECT @@rowcount r, DATEDIFF( s, @startDate, GETDATE() ) diff2 ROLLBACK TRAN GO
更新 1:您的意思是沒有讀取 2200 萬條記錄嗎?您沒有任何
WHERE
子句,因此您將進行掃描。您可能會尋找嵌套循環連接的外部表,但較小的表將正確地位於頂部。我很想嘗試OPTION ( MERGE JOIN, HASH JOIN )
在這里基本上排除嵌套循環,看看你是怎麼做的。此方法還具有不強制執行連接順序的額外好處。我建議這樣做是為了收集資訊,不一定是生產的修復。關於改進測試台以更準確地反映您的設置的任何建議?這兩個查詢需要多長時間?您可以嘗試通過Plan Explorer(免費版)之類的工具執行它們,因為我認為這會很有啟發性。我注意到我的測試台有一個錯誤,我在第二台台的時間中包含了索引創建時間,所以請排除這種情況。對於我的結果,我得到了 15 秒的原始查詢和 7 秒的修改查詢:
更新 2:與 OP 合作刪除非聚集索引,刪除循環連接提示,並將唯一索引添加到臨時表以獲得 75+% 的改進。@PaulWhite 的出色輸入和功勞。