Sql-Server
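What is the fastest way to delete duplicate rows?
I need to delete duplicate rows from a large table. What is the best way to achieve this?
Currently I use this algorithm: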
    DECLARE @t TABLE ([key] int);

    -- sample data with duplicates
    INSERT INTO @t ([key])
    VALUES (1), (1), (1),
           (2), (2),
           (3),
           (4), (4), (4), (4), (4),
           (5), (5), (5), (5), (5),
           (6), (6), (6),
           (7), (7),
           (8), (8),
           (9), (9), (9), (9), (9);

    SELECT * FROM @t;

    -- number the rows within each key and delete everything after the first
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY [key] ORDER BY [key]) AS Picker
        FROM @t
    )
    DELETE cte
    WHERE Picker > 1;

    SELECT * FROM @t;
On my system I run it as:
    ;WITH Customer AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY AccountCode) AS [Version]
        FROM Stage.Customer
    )
    DELETE FROM Customer
    WHERE [Version] <> 1
I found that <> 1 works better than > 1.
I could create this index, which does not currently exist:
    USE [BodenDWH]
    GO
    CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
    ON [Stage].[Customer] ([AccountCode])
    INCLUDE ([ID])
    GO
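Is there another way to get this done?
In this case the table is not big: about 500,000 records on the live system.
The delete is part of an SSIS package that runs daily and removes about 10-15 records each day.
There is a problem with how the data is structured. I need exactly one AccountCode per customer, but duplicates can occur, and if they are not removed they break the package at a later stage.
I did not develop the package, and redesigning anything is outside my scope.
I am just looking for the fastest way to get rid of the duplicates using T-SQL code alone, without resorting to index creation or anything else.
If the table is small and the number of rows to delete is small, use: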
    ;WITH Customer AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY (select null)) AS [Version]
        FROM dbo.Customer
    )
    DELETE FROM Customer
    WHERE [Version] > 1;
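Note: the query above uses an arbitrary ordering in the window order clause:
    ORDER BY (select null)
(from Itzik Ben-Gan's T-SQL Querying book; @AaronBertrand also referred to it above). If the table is large (for example, 5M records), deleting a small number of rows at a time, in chunks, helps keep the transaction log from bloating and helps avoid lock escalation.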
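Lock escalation is triggered when a single Transact-SQL statement acquires at least 5,000 locks on a single reference of a table. It can also be triggered when the instance's lock memory thresholds are exceeded, so a smaller batch makes escalation unlikely rather than impossible.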
    WHILE 1 = 1
    BEGIN
        WITH Customer AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY (select null)) AS [Version]
            FROM dbo.Customer
        )
        DELETE TOP (4000) -- choose a batch size lower than 5000 to prevent lock escalation
        FROM Customer
        WHERE [Version] > 1;

        -- a short batch means no more duplicates are left
        IF @@ROWCOUNT < 4000 BREAK;
    END
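Both queries above keep an arbitrary row from each group of duplicates, because the window is ordered by (select null). If it matters which row survives, one option is to order the window by a unique column instead. The following is only a minimal sketch: the table variable @Customer and its sample values are made up for illustration, and it assumes ID (the column named in the missing-index suggestion in the question) is unique per row.

```sql
-- Hypothetical sample data; ID and AccountCode mirror the column names in the question.
DECLARE @Customer TABLE (
    ID          int         NOT NULL PRIMARY KEY,
    AccountCode varchar(10) NOT NULL
);

INSERT INTO @Customer (ID, AccountCode)
VALUES (1, 'A1'), (2, 'A1'),
       (3, 'B2'),
       (4, 'C3'), (5, 'C3'), (6, 'C3');

-- Number the rows of each AccountCode by ID, so the lowest ID always gets 1.
WITH Ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY AccountCode ORDER BY ID) AS [Version]
    FROM @Customer
)
DELETE FROM Ranked
WHERE [Version] > 1;

-- One row per AccountCode remains: IDs 1, 3 and 4.
SELECT ID, AccountCode
FROM @Customer
ORDER BY ID;
```

Using ORDER BY ID DESC would keep the highest ID in each group instead. The same ORDER BY change should carry over to the batched WHILE loop unchanged.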