Sql-Server

SQL 中的有限狀態機

  • February 29, 2016

我想就我遇到的問題提供一些意見。我們有一段程式碼在我們的儲存過程中重複,每次處理都需要相當長的時間,當結合起來,在一組數十萬個項目上的讀取次數達到數億次。基本上我們有物品,物品最多可以有12台機器,每台機器都有自己的狀態。

這些是(簡化的)表結構:

CREATE TABLE dbo.ItemMachineState
(
   [itemID] [int],
   [machineID] [int],
   [stateID] [int]
)

CREATE TABLE dbo.Transition
(
   [machineID] [int] NOT NULL,
   [eventID] [int] NOT NULL,
   [stateID] [int] NOT NULL,
   [nextStateID] [int] NOT NULL
)

發生的情況是,在處理過程中,我們創建了一個我們處理的#temp 表,它最終具有每個項目的 eventID。然後將該臨時表連接回 ItemState 和 Transition,如下所示:

UPDATE  dbo.ItemState
SET     stateID = tr.nextStateID
FROM    #temp t  
JOIN    dbo.ItemMachineState ist ON ist.itemID = t.itemID
JOIN    Transition tr ON tr.stateID = ist.stateID AND
                        tr.machineID = ist.machineID AND
                        tr.eventID = t.eventID

所以我們計算的 eventID 決定了給定項目的機器會發生什麼,這取決於它們各自處於什麼狀態。問題是一個事件可以在一個動作中操縱零個或多個機器狀態,如果該事件是相關的到狀態和機器的特定組合。

以下是這些狀態轉換之一的範例:

ItemID 3468489 首先在 ItemMachineState 中看起來像這樣…

itemID      machineID   stateID
----------- ----------- -----------
3468489     12          4
3468489     14          113
3468489     15          157
3468489     16          165
3468489     18          169
3468489     19          165
3468489     20          157
3468489     21          165
3468489     23          173
3468489     24          173
3468489     26          9
3468489     36          9

我們做了一些工作,最終得到了一個 #temp 表,其中包含一個 ItemID 和一個 EventID…

itemID      eventID
----------- -----------
3468489     64

然後我們將這兩個表加入到 Transition,對於這個特定的 eventID,它看起來像這樣:

machineID   eventID     stateID     nextStateID
----------- ----------- ----------- -----------
13          64          73          79
13          64          74          79
13          64          75          79
13          64          76          79
13          64          77          79
13          64          78          79
13          64          187         79
13          64          188         79
13          64          189         79
13          64          190         79
13          64          191         79
36          64          9           79
36          64          194         79
36          64          196         79
36          64          208         79
36          64          210         79
36          64          213         79
36          64          218         79
46          64          73          79
47          64          73          79
70          64          73          79
70          64          75          79
70          64          76          79
70          64          77          79
70          64          78          79

把它們放在一起:

SELECT  t.itemID, t.eventID, ist.machineID, ist.stateID, tr.nextStateID
FROM    #temp t  
JOIN    dbo.ItemMachineState ist ON ist.itemID = t.itemID
JOIN    Transition tr ON tr.stateID = ist.stateID AND
                        tr.machineID = ist.machineID AND
                        tr.eventID = t.eventID

itemID      eventID     machineID   stateID     nextStateID
----------- ----------- ----------- ----------- -----------
3468489     64          36          9           79

所以在這個特定的例子中,這個事件只與這個項目的一台機器相關。它的 stateID 在 machineID 36 上從 9 更新到 79,並且該 Item 的其他所有內容都保持不變。

我想就如何以不同的方式處理這個問題提出建議。我們不能離開表結構,但我們可以改變在轉換/事件期間將 stateID 設置為 nextStateID 的方式。正如您在上面看到的那樣,這是通過消除來實現的;我們需要目前狀態和機器來確定下一個狀態是什麼,對於那台機器,對於那個事件。在某些情況下,這不會更新任何東西,有時它會一次更新多台機器,我們喜歡這種能力。我不認為通過簡單地更改索引或添加查詢提示會找到解決此問題的最精簡的方法,我們需要一種新方法來限制讀取次數和處理時間,但提供相同的功能。


我想避免將索引等帶入這個討論,因為我必須使用真實的例子,這污染了我在這裡試圖問的本質,我更改了列和表的名稱以簡化我的問題。無論如何,給你:

查詢計劃 http://pastebin.com/xhPa4t8d,創建和索引腳本 http://pastebin.com/sp70QuEJ

請注意,在查詢計劃中,我們強制執行 INNER LOOP JOIN。如果留給簡單的 JOIN,則查詢的處理時間呈指數增長。


之前使用@wBob UNIQUE CLUSTERED 索引: 前

之後: 後

使用OPTION (MERGE JOIN, HASH JOIN)導致了這個執行計劃和結果:

選項後

將很快更新其他資訊

我會考慮不要一次更新所有行,而是通過批量機器執行,這樣每次更新的記錄量就會減少。您可以保留相同的程式碼,只需批處理即可。

assetID通過創建第二個帶有唯一聚集索引的臨時表並eventID刪除LOOP提示,我的測試設備的性能提高了大約 50% 。這不應該在語義上改變您的查詢結果。嘗試這個:

SELECT DISTINCT assetID, eventID
INTO #Event2
FROM #Event

CREATE UNIQUE CLUSTERED INDEX PK_temp_Event2 ON #Event2 ( assetID, eventID )

UPDATE ast
SET ast.stateID = st.nextStateID
FROM #Event2 AS e
   INNER JOIN EFT.AssetState AS ast
       ON ast.assetID = e.assetID
   INNER JOIN dbo.Transition AS st
       ON st.stateID = ast.stateID
           AND st.eventID = e.eventID
           AND st.machineID = ast.machineID

讓我知道你是怎麼辦的。如果可行,請考慮調整您原來的#Event 表——實際上不需要兩個臨時表,這只是為了提高性能。調運動。

如果它不起作用,我們可以考慮改進測試台以更準確地反映您的設置。我做了一些使用較少或沒有非聚集索引的實驗,也得到了一些不錯的結果,儘管顯然其他查詢可能會使用它們。

測試台

-- Secondary DDL provided;
USE tempdb
GO

IF NOT EXISTS ( SELECT * FROM sys.schemas WHERE name = 'EFT' )
   EXEC ('CREATE SCHEMA EFT')
GO

IF OBJECT_ID('[dbo].[Transition]') IS NOT NULL DROP TABLE [dbo].[Transition]
IF OBJECT_ID('[EFT].[AssetState]') IS NOT NULL DROP TABLE [EFT].[AssetState]
IF OBJECT_ID('[dbo].[Event]') IS NOT NULL DROP TABLE [dbo].[Event]
IF OBJECT_ID('[dbo].[State]') IS NOT NULL DROP TABLE [dbo].[State]
IF OBJECT_ID('[dbo].[Machine]') IS NOT NULL DROP TABLE [dbo].[Machine]
IF OBJECT_ID('#Event') IS NOT NULL DROP TABLE #Event
GO



-- #EFT.AssetState
CREATE TABLE [EFT].[AssetState](
   [assetID] [int] NOT NULL,
   [busDate] [datetime] NOT NULL,
   [machineID] [int] NOT NULL,
   [stateID] [int] NOT NULL,
CONSTRAINT [PK_AssetState] PRIMARY KEY CLUSTERED 
(
   [assetID] ASC,
   [busDate] ASC,
   [machineID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AssetState_assetID] ON [EFT].[AssetState]
(
   [assetID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AssetState_assetID_stateID] ON [EFT].[AssetState]
(
   [assetID] ASC,
   [stateID] ASC,
   [machineID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AssetState_machineID_stateID_assetID] ON [EFT].[AssetState]
(
   [machineID] ASC,
   [stateID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


-- dbo.Transition
CREATE TABLE [dbo].[Transition](
   [transitionID] [int] IDENTITY(1,1) NOT NULL,
   [machineID] [int] NOT NULL,
   [category] [varchar](50) NOT NULL,
   [eventID] [int] NOT NULL,
   [stateID] [int] NOT NULL,
   [nextStateID] [int] NOT NULL,
CONSTRAINT [PK_Transition] PRIMARY KEY CLUSTERED 
(
   [transitionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UK_Transition_machineID_stateID_eventID] UNIQUE NONCLUSTERED 
(
   [machineID] ASC,
   [stateID] ASC,
   [eventID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [UK_Transition_machineID_nextStateID_eventID] ON [dbo].[Transition]
(
   [machineID] ASC,
   [eventID] ASC,
   [stateID] ASC,
   [nextStateID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO



CREATE TABLE [dbo].[State](
   [stateID]   INT PRIMARY KEY
   )
GO

CREATE TABLE [dbo].[Event](
   [eventID]   INT PRIMARY KEY
   )
GO

CREATE TABLE [dbo].[Machine](
   [machineID] INT PRIMARY KEY
   )
GO



ALTER TABLE [dbo].[Transition]  WITH CHECK ADD  CONSTRAINT [FK_Transition_NextState] FOREIGN KEY([nextStateID])
REFERENCES [dbo].[State] ([stateID])
GO

ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_NextState]
GO

ALTER TABLE [dbo].[Transition]  WITH CHECK ADD  CONSTRAINT [FK_Transition_State] FOREIGN KEY([stateID])
REFERENCES [dbo].[State] ([stateID])
GO

ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_State]
GO

ALTER TABLE [dbo].[Transition]  WITH CHECK ADD  CONSTRAINT [FK_Transition_StateEvent] FOREIGN KEY([eventID])
REFERENCES [dbo].[Event] ([eventID])
GO

ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_StateEvent]
GO

ALTER TABLE [dbo].[Transition]  WITH CHECK ADD  CONSTRAINT [FK_Transition_StateMachine] FOREIGN KEY([machineID])
REFERENCES [dbo].[Machine] ([machineID])
GO

ALTER TABLE [dbo].[Transition] CHECK CONSTRAINT [FK_Transition_StateMachine]
GO

-- #Event
CREATE TABLE #Event
(
   assetID INT     ,
   busDate DATETIME,
   eventID INT     
)

CREATE CLUSTERED INDEX IX_Ev_assetID ON #Event ( assetID )
GO



-- Create dummy data
-- populate AssetState with 2,658,200 records
--  2,658,200
;WITH cte AS (
SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
   CROSS JOIN master.sys.columns c2
   CROSS JOIN master.sys.columns c3
)
INSERT INTO [EFT].[AssetState]( assetID, busDate, machineID, stateID )
SELECT 
   items.rn AS assetID,
   '1 Jan 2015' AS busDate,
   machines.rn AS machineID,
   items.rn % 7 AS stateID
FROM
   ( SELECT TOP 221520 * FROM cte ) items
   CROSS JOIN
   ( SELECT TOP (12) * FROM cte ) machines
GO


-- Get a random selection for temp table
INSERT INTO #Event ( assetID, busDate, eventID )
SELECT TOP (2128660) assets.assetID, assets.busDate, assets.assetID % 99 AS eventID
FROM ( SELECT DISTINCT assetID, busDate FROM [EFT].[AssetState] ) assets
   CROSS JOIN
   ( SELECT TOP (12) * FROM [EFT].[AssetState] ) machines
ORDER BY NEWID()
GO


-- Get selection for Transition table
INSERT INTO [dbo].[State] ( stateID )
SELECT assetID
FROM ( SELECT DISTINCT TOP 99 assetID FROM [EFT].[AssetState] ) m
GO

INSERT INTO [dbo].[Event] ( eventID )
SELECT assetID
FROM ( SELECT DISTINCT TOP 99 assetID FROM [EFT].[AssetState] ) m
GO

INSERT INTO [dbo].[Machine] ( machineID )
SELECT machineID
FROM ( SELECT DISTINCT machineID FROM [EFT].[AssetState] ) m
GO




INSERT INTO dbo.Transition ( machineID, category, eventID, stateID, nextStateID )
SELECT TOP (1214)
   m.machineID,
   CASE x.rn % 3 WHEN 0 THEN 'X' WHEN 1 THEN 'Y' WHEN 2 THEN 'Z' END category, 
   ( x.rn % 99 ) + 1 eventID,
   ( x.rn % 7 ) + 1 stateID,
   ( x.rn % 7 ) + 2 nextStateID
FROM ( SELECT DISTINCT machineID FROM [EFT].[AssetState] ) m
   CROSS JOIN
   ( SELECT TOP (102) ROW_NUMBER() OVER( ORDER BY ( SELECT NULL ) ) rn, * FROM [EFT].[AssetState] ) x
ORDER BY NEWID()
GO
--:exit



-- Original
DECLARE @startDate DATETIME = GETDATE()
BEGIN TRAN

UPDATE  EFT.AssetState
SET stateID = st.nextStateID
FROM    #Event AS e
   INNER LOOP JOIN
       EFT.AssetState AS ast
   ON ast.assetID = e.assetID
       INNER JOIN
       Transition AS st
   ON st.stateID = ast.stateID
       AND st.eventID = e.eventID
       AND st.machineID = ast.machineID;

SELECT @@rowcount r, DATEDIFF( s, @startDate, GETDATE() ) diff1

ROLLBACK TRAN
GO



-- Revised
DECLARE @startDate DATETIME = GETDATE()

IF OBJECT_ID('tempdb..#Event2') IS NOT NULL DROP TABLE #Event2

SELECT DISTINCT assetID, eventID
INTO #Event2
FROM #Event

CREATE UNIQUE CLUSTERED INDEX PK_temp_Event2 ON #Event2 ( assetID, eventID )

BEGIN TRAN

UPDATE ast
SET ast.stateID = st.nextStateID
FROM #Event2 AS e
   INNER JOIN EFT.AssetState AS ast
       ON ast.assetID = e.assetID
   INNER JOIN dbo.Transition AS st
       ON st.stateID = ast.stateID
           AND st.eventID = e.eventID
           AND st.machineID = ast.machineID

SELECT @@rowcount r, DATEDIFF( s, @startDate, GETDATE() ) diff2

ROLLBACK TRAN
GO

更新 1:您的意思是沒有讀取 2200 萬條記錄嗎?您沒有任何WHERE子句,因此您將進行掃描。您可能會尋找嵌套循環連接的外部表,但較小的表將正確地位於頂部。我很想嘗試OPTION ( MERGE JOIN, HASH JOIN )在這里基本上排除嵌套循環,看看你是怎麼做的。此方法還具有不強制執行連接順序的額外好處。我建議這樣做是為了收集資訊,不一定是生產的修復。關於改進測試台以更準確地反映您的設置的任何建議?

這兩個查詢需要多長時間?您可以嘗試通過Plan Explorer(免費版)之類的工具執行它們,因為我認為這會很有啟發性。我注意到我的測試台有一個錯誤,我在第二台台的時間中包含了索引創建時間,所以請排除這種情況。對於我的結果,我得到了 15 秒的原始查詢和 7 秒的修改查詢:

試驗台結果

更新 2:與 OP 合作刪除非聚集索引,刪除循環連接提示,並將唯一索引添加到臨時表以獲得 75+% 的改進。@PaulWhite 的出色輸入和功勞。

引用自:https://dba.stackexchange.com/questions/130566