MySQL

Delete all duplicates

  • November 18, 2020

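I am trying to delete all duplicates while keeping a single record for each email (the one with the lowest id). The following query does delete duplicates, but it has to be run many times before all the copies are gone and only the original remains.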

DELETE FROM emailTable WHERE id IN (
SELECT * FROM (
   SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
) AS q
)

It's MySQL.

DDL

CREATE TABLE `emailTable` (
`id` mediumint(9) NOT NULL auto_increment,
`email` varchar(200) NOT NULL default '',
PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1

Try this:

DELETE FROM emailTable WHERE NOT EXISTS (
SELECT * FROM (
   SELECT MIN(id) minID FROM emailTable    
   GROUP BY email HAVING COUNT(*) > 0
 ) AS q
 WHERE minID=id
)

The above worked in my test with 50 emails (5 distinct emails, each repeated 10 times).
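As a quick sanity check after running the DELETE, a query along these lines lists any email that still occurs more than once. This is a minimal sketch against the emailTable definition from the question; the alias copies is only illustrative.

```sql
-- List every email that still appears more than once.
-- An empty result set means each email is now unique.
SELECT email, COUNT(*) AS copies
FROM emailTable
GROUP BY email
HAVING COUNT(*) > 1;
```

If this returns rows, the DELETE did not remove every duplicate and should be re-checked before going further.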

You may need to add an index on the email column:

ALTER TABLE emailTable ADD INDEX ind_email (email);
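To see whether the optimizer actually picks up the new index for the grouping step, it may help to run EXPLAIN on the inner query. This is a sketch only; the plan depends on the MySQL version and the data, so no particular output is assumed here.

```sql
-- Show the execution plan for the grouping query used inside the DELETE.
-- The "key" column of the output names the index chosen, if any
-- (ind_email is the index created by the ALTER TABLE statement above).
EXPLAIN
SELECT MIN(id) AS minID
FROM emailTable
GROUP BY email;
```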

With 250,000 rows this may be a bit slow. On a table with 1.5 million rows (properly indexed) it was slow for me, which is how I came up with this strategy:

/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;

/* INSERT THE MINIMUM ID FOR EACH DISTINCT EMAIL */
INSERT INTO email_min SELECT MIN(id) FROM emailTable
   GROUP BY email;

/* MAKE SURE YOU HAVE RIGHT INFO (THESE ARE THE ROWS THAT WILL BE DELETED) */
SELECT * FROM emailTable
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);

/* DELETE FROM emailTable */
DELETE FROM emailTable
WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id);

/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;

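The benefit of the memory table is that an index is used (the primary key on minID), which speeds up the process compared with a normal temporary table.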

Source: https://dba.stackexchange.com/questions/5859