Sql-Server
表和數據集之間的合併/交叉 - 如何實現?
考慮下表:
Id Hash ----------- ---------------------------------- 1 0x31F777F0804D301936411E3ECD760859 2 0xD64A593F3E9ACC972158D522A4289EA0 (Id is an identity column)
在該表中,我想合併以下數據集:
Hash ---------------------------------- 0x31F777F0804D301936411E3ECD760859 0x31F777F0804D301936411E3ECD760859 0x0C5A65264F92A543E7AAA06375349C06 (Id is NOT present in the dataset)
合併規則如下:
- 如果表中不存在雜湊,則將其插入表中;
- 如果數據集中不存在雜湊,則將其從表中刪除;
- 如果雜湊確實存在於兩邊,並且表中有 X 個實例,源中有 Y 個實例,則應將 (YX) 個實例插入到表中。
合併的結果應該使表格看起來像這樣:
Id Hash ----------- ---------------------------------- 1 0x31F777F0804D301936411E3ECD760859 3 0x31F777F0804D301936411E3ECD760859 4 0x0C5A65264F92A543E7AAA06375349C06
編寫查詢以實現此操作的最有效方法是什麼?僅供參考,為簡潔起見,省略了其他列。
使用樣本數據:
DECLARE @T table ( Id integer IDENTITY NOT NULL PRIMARY KEY, [Hash] binary(16) NOT NULL INDEX h ); INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859); INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0); DECLARE @S table ( [Hash] binary(16) NOT NULL ); INSERT @S ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859), (0x31F777F0804D301936411E3ECD760859), (0x0C5A65264F92A543E7AAA06375349C06);
你可以把它寫成
MERGE
:WITH T AS ( SELECT T.[Hash], rn = ROW_NUMBER() OVER ( PARTITION BY T.[Hash] ORDER BY T.[Hash], T.Id) FROM @T AS T ), S AS ( SELECT DISTINCT S.[Hash], rn = ROW_NUMBER() OVER ( PARTITION BY S.[Hash] ORDER BY S.[Hash]) FROM @S AS S ) MERGE T USING S ON S.[Hash] = T.[Hash] AND S.rn = T.rn WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash]) WHEN NOT MATCHED BY SOURCE THEN DELETE;
但出於性能原因(以及一些錯誤),我通常會將其編寫為兩個單獨的語句:
WITH ToDelete AS ( SELECT T.* FROM @T AS T WHERE NOT EXISTS ( SELECT S.* FROM @S AS S WHERE S.[Hash] = T.[Hash] ) ) DELETE ToDelete;
WITH ToInsert AS ( SELECT S.[Hash], rn = ROW_NUMBER() OVER ( PARTITION BY S.[Hash] ORDER BY S.[Hash]) FROM @S AS S EXCEPT SELECT T.[Hash], rn = ROW_NUMBER() OVER ( PARTITION BY T.[Hash] ORDER BY T.[Hash], T.Id) FROM @T AS T ) INSERT @T ([Hash]) SELECT ToInsert.[Hash] FROM ToInsert;
您應該在 上的目標上有一個唯一索引
([Hash], [Id])
。您可能已經有了這個,或者等效地有一個索引[Hash]
和一個唯一的(可能是聚集的)索引[Id]
。可能還有其他問題,具體取決於為簡潔起見問題中省略了哪些其他內容。無論如何,它應該為您自己的解決方案提供幾個可能的起點。