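MySQL
Delete all duplicates
I'm trying to delete all duplicates while keeping a single record for each email (the one with the smaller ID). The query below does delete duplicates, but it takes many iterations to remove every copy and leave only the original.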
DELETE FROM emailTable WHERE id IN ( SELECT * FROM ( SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 ) ) AS q )
This is MySQL.
DDL
CREATE TABLE `emailTable` ( `id` mediumint(9) NOT NULL auto_increment, `email` varchar(200) NOT NULL default '', PRIMARY KEY (`id`) ) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1
Try this:
DELETE FROM emailTable WHERE NOT EXISTS ( SELECT * FROM ( SELECT MIN(id) minID FROM emailTable GROUP BY email HAVING COUNT(*) > 0 ) AS q WHERE minID=id )
The above worked in my test with 50 emails (5 distinct emails, each repeated 10 times).
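A quick way to confirm that kind of result, sketched here on the assumption that the table is emailTable as in the question, is to list any email that still appears more than once. The copies alias is just a name picked for illustration.

```sql
-- List every email that still has more than one row.
-- After a successful cleanup this should return no rows.
SELECT email, COUNT(*) AS copies
FROM emailTable
GROUP BY email
HAVING COUNT(*) > 1;
```

Running the same query before the DELETE also shows how many duplicates there are to begin with.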
You may need to add an index on the email column:
ALTER TABLE emailTable ADD INDEX ind_email (email);
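To check that the index is in place and whether the optimizer picks it up for the grouping step, something like the following may help. This is only a sketch: ind_email is the index name from the ALTER statement above, and what the plan actually shows will depend on the MySQL version and storage engine.

```sql
-- Show the indexes defined on the table; ind_email should be listed.
SHOW INDEX FROM emailTable;

-- Ask the optimizer how it would run the "minimum id per email" subquery.
-- Look at the key column of the output to see which index, if any, is used.
EXPLAIN SELECT MIN(id) AS minID FROM emailTable GROUP BY email;
```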
With 250,000 rows this may be a bit slow. It was slow for me on a properly indexed table with 1.5 million rows, which is how I came up with this strategy:
/* CREATE MEMORY TABLE TO HOUSE THE MIN ID OF EACH EMAIL */
CREATE TABLE email_min (minID INT, PRIMARY KEY (minID)) ENGINE=MEMORY;
/* INSERT THE MINIMUM ID FOR EACH EMAIL */
INSERT INTO email_min SELECT MIN(id) FROM emailTable GROUP BY email;
/* MAKE SURE YOU HAVE THE RIGHT INFO: THESE ARE THE ROWS THAT WILL BE DELETED */
SELECT * FROM emailTable WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID = emailTable.id);
/* DELETE EVERY ROW WHOSE ID IS NOT A MINIMUM */
DELETE FROM emailTable WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID = emailTable.id);
/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;
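The benefit of the MEMORY table is that an index is used (the primary key on minID), which makes the process faster than with an ordinary temporary table.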
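For what it's worth, MEMORY tables use hash indexes by default, which suits the equality lookup on minID. If the NOT EXISTS form is still slow, the same deletion could also be written as an anti-join using MySQL's multi-table DELETE syntax. This is a different formulation, not the one from the steps above. It assumes the table is emailTable as in the question and that email_min already holds the minimum id for each email. It has to run before email_min is dropped, and the aliases e and m are just illustrative.

```sql
-- Delete every row of emailTable that has no matching entry in email_min,
-- i.e. every row whose id is not the minimum id for its email.
DELETE e
FROM emailTable AS e
LEFT JOIN email_min AS m ON m.minID = e.id
WHERE m.minID IS NULL;
```

Whether this beats the NOT EXISTS version depends on the server version and the data, so it is worth timing both on a copy of the table first.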