Sql-Server

What is the fastest way to delete duplicate rows?

  • March 25, 2016

I need to delete duplicate rows from a large table. What is the best way to achieve this?

Currently I use this algorithm:

declare @t table ([key] int  )

insert into @t select 1
insert into @t select 1
insert into @t select 1
insert into @t select 2
insert into @t select 2
insert into @t select 3
insert into @t select 4
insert into @t select 4
insert into @t select 4
insert into @t select 4
insert into @t select 4
insert into @t select 5
insert into @t select 5
insert into @t select 5
insert into @t select 5
insert into @t select 5
insert into @t select 6
insert into @t select 6
insert into @t select 6
insert into @t select 7
insert into @t select 7
insert into @t select 8
insert into @t select 8
insert into @t select 9
insert into @t select 9
insert into @t select 9
insert into @t select 9
insert into @t select 9


select * from @t

; with cte as (
   select *
       , row_number() over (partition by [Key] order by [Key]) as Picker
   from @t
   )
delete cte 
where Picker > 1

select * from @t

When I run it on my system:

;WITH Customer AS
   (
   SELECT *, ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY AccountCode ) AS [Version]
   FROM Stage.Customer
   )
   DELETE
   FROM    Customer
   WHERE   [Version] <> 1


I found that `<> 1` performs better than `> 1`.

I could create this index, which does not currently exist:

USE [BodenDWH]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Stage].[Customer] ([AccountCode])
INCLUDE ([ID])
GO


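Is there any other way to get this done?

In this case the table is not large: about 500,000 records on the live system.

The delete is part of an SSIS package that runs daily and removes roughly 10-15 records each day.

There is a problem with how the data is structured. I need exactly one AccountCode per customer, but duplicates can occur, and if they are not removed they break the package at a later stage.

I did not develop the package, and redesigning anything is outside my scope.

I am just looking for the fastest way to get rid of the duplicates, without resorting to index creation or anything else, just T-SQL code.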

If the table is small and the number of rows to delete is small, use:

;WITH Customer AS
   (
   SELECT *, ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY (select null) ) AS [Version]
   FROM dbo.Customer
   )
   DELETE
   FROM    Customer
   WHERE   [Version] > 1;

Note: the query above uses an arbitrary ordering in the window ORDER BY clause, ORDER BY (select null). This comes from Itzik Ben-Gan's T-SQL Querying book, and @AaronBertrand also refers to it above.
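Because ORDER BY (select null) leaves the choice of the surviving row arbitrary, ordering by a unique column makes the result repeatable when it matters which copy is kept. The following is only a sketch: it assumes the table has a unique ID column (the missing-index suggestion in the question includes one) and uses a hypothetical table variable @Customer with made-up account codes. It keeps the row with the lowest ID for each AccountCode.

```sql
-- Hypothetical demo table; ID and AccountCode mirror column names from the question.
DECLARE @Customer TABLE
(
    ID          int IDENTITY(1,1) PRIMARY KEY,
    AccountCode varchar(20) NOT NULL
);

-- Made-up sample data containing duplicates.
INSERT INTO @Customer (AccountCode)
VALUES ('A001'), ('A001'), ('A002'), ('A003'), ('A003'), ('A003');

-- Number the rows inside each AccountCode group by ID,
-- then delete everything except the first row of each group.
WITH Ranked AS
(
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY ID) AS [Version]
    FROM @Customer
)
DELETE FROM Ranked
WHERE [Version] > 1;

-- One row per AccountCode should remain.
SELECT ID, AccountCode
FROM @Customer
ORDER BY ID;
```

If it does not matter which duplicate survives, the ORDER BY (select null) form above avoids the extra sort key.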

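If the table is large (for example, 5M rows), deleting in small batches or chunks helps avoid bloating the transaction log and prevents lock escalation.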

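Lock escalation is triggered when a single Transact-SQL statement acquires at least 5,000 locks on a single reference of a table. It can also be triggered by lock memory thresholds.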

while 1=1
begin
WITH Customer AS
   (
   SELECT *, ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY (select null) ) AS [Version]
   FROM dbo.Customer
   )
   DELETE top(4000) -- choose a lower batch size than 5000 to prevent lock escalation 
   FROM    Customer
   WHERE   [Version] > 1

   if @@ROWCOUNT < 4000
   BREAK ;

end
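A grouped count is a simple way to check whether any duplicates remain after either form of the delete. This is a minimal sketch against a hypothetical table variable with made-up values; to use it, point the final SELECT at the real table instead.

```sql
-- Hypothetical demo data; AccountCode mirrors the column name from the question.
DECLARE @Customer TABLE (AccountCode varchar(20) NOT NULL);

INSERT INTO @Customer (AccountCode)
VALUES ('A001'), ('A001'), ('A002'), ('A003');

-- Lists every AccountCode that still occurs more than once.
-- An empty result means no duplicates are left.
SELECT AccountCode, COUNT(*) AS Copies
FROM @Customer
GROUP BY AccountCode
HAVING COUNT(*) > 1;
```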

Source: https://dba.stackexchange.com/questions/115627