Sql-Server

表和數據集之間的合併/交叉 - 如何實現?

  • January 31, 2019

考慮下表:

Id          Hash
----------- ----------------------------------
1           0x31F777F0804D301936411E3ECD760859
2           0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

在該表中,我想合併以下數據集:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

合併規則如下:

  • 如果表中不存在雜湊,則將其插入表中;
  • 如果數據集中不存在雜湊,則將其從表中刪除;
  • 如果雜湊確實存在於兩邊,並且表中有 X 個實例,源中有 Y 個實例,則應將 (YX) 個實例插入到表中。

合併的結果應該使表格看起來像這樣:

Id          Hash
----------- ----------------------------------
1           0x31F777F0804D301936411E3ECD760859
3           0x31F777F0804D301936411E3ECD760859
4           0x0C5A65264F92A543E7AAA06375349C06

編寫查詢以實現此操作的最有效方法是什麼?僅供參考,為簡潔起見,省略了其他列。

使用樣本數據:

DECLARE @T table
(
   Id integer IDENTITY NOT NULL PRIMARY KEY, 
   [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
   [Hash] binary(16) NOT NULL
);

INSERT @S
   ([Hash])
VALUES
   (0x31F777F0804D301936411E3ECD760859),
   (0x31F777F0804D301936411E3ECD760859),
   (0x0C5A65264F92A543E7AAA06375349C06);

你可以把它寫成MERGE

WITH
   T AS
   (
       SELECT
           T.[Hash], 
           rn = ROW_NUMBER() OVER (
               PARTITION BY T.[Hash] 
               ORDER BY T.[Hash], T.Id)
       FROM @T AS T
   ),
   S AS
   (
       SELECT DISTINCT
           S.[Hash],
           rn = ROW_NUMBER() OVER (
               PARTITION BY S.[Hash] 
               ORDER BY S.[Hash])
       FROM @S AS S
   )
MERGE T
USING S
   ON S.[Hash] = T.[Hash]
   AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>小提琴

但出於性能原因(以及一些錯誤),我通常會將其編寫為兩個單獨的語句:

WITH ToDelete AS
(
   SELECT
       T.*
   FROM @T AS T
   WHERE 
       NOT EXISTS 
       (
           SELECT
               S.* 
           FROM @S AS S 
           WHERE 
               S.[Hash] = T.[Hash]
       )
)
DELETE ToDelete;
WITH ToInsert AS
(
   SELECT
       S.[Hash], 
       rn = ROW_NUMBER() OVER (
           PARTITION BY S.[Hash] 
           ORDER BY S.[Hash])
   FROM @S AS S
   EXCEPT
   SELECT
       T.[Hash], 
       rn = ROW_NUMBER() OVER (
           PARTITION BY T.[Hash] 
           ORDER BY T.[Hash], T.Id)
   FROM @T AS T
)
INSERT @T
   ([Hash])
SELECT
   ToInsert.[Hash]
FROM ToInsert;

db<>小提琴

您應該在 上的目標上有一個唯一索引([Hash], [Id])。您可能已經有了這個,或者等效地有一個索引[Hash]和一個唯一的(可能是聚集的)索引[Id]

可能還有其他問題,具體取決於為簡潔起見問題中省略了哪些其他內容。無論如何,它應該為您自己的解決方案提供幾個可能的起點。

引用自:https://dba.stackexchange.com/questions/228529