跨一對多關係表查找重複記錄

April 17, 2018

假設我有兩個具有一對多關係的表：人員和屬性。我正在嘗試根據 first_name、last_name 查找重複項，並且所有屬性必須完全匹配。

CREATE TABLE People (Id int, 
                first_name varchar(100), 
                last_name varchar(100));

CREATE TABLE Attributes (Id int, 
                    person_id int, 
                    field varchar(100),
                    field_value varchar(100));

INSERT INTO People VALUES (1, 'John', 'Smith');
INSERT INTO People VALUES (2, 'John', 'Smith');
INSERT INTO People VALUES (3, 'John', 'Smith');

INSERT INTO Attributes VALUES (1, 1, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (2, 1, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (3, 2, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (4, 2, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (5, 3, 'HairColor', 'Blonde');

這給了我們：

id | first_name | last_name
----+------------+-----------
 1 | John       | Smith
 2 | John       | Smith
 3 | John       | Smith

id | person_id |   field   | field_value
----+-----------+-----------+-------------
 1 |         1 | HairColor | Brown
 2 |         1 | EyeColor  | Blue
 3 |         2 | HairColor | Brown
 4 |         2 | EyeColor  | Blue
 5 |         3 | HairColor | Blonde

我想要一個從 People 表中返回 ID 1 和 2 的查詢。我可以在一個表中找到重複項：

select first_name,last_name,count(*) from People 
   group by first_name,last_name having ( count(*) &gt; 1 );

但是我無法加入一對多表並檢測兩個表中的重複項。如何檢測具有一對多關係的表中的重複項？

一種方法（檢查SQLfiddle）：

select 
   p1.id as id1, 
   p2.id as id2
from people p1
 join people p2
   on  p1.first_name = p2.first_name
   and p1.last_name = p2.last_name
   and p1.id &lt; p2.id
where not exists
   ( select 1
     from 
     ( select * 
       from attributes a1
       where a1.person_id = p1.id
     union all
       select * 
       from attributes a2
       where a2.person_id = p2.id
     ) g
     group by field, field_value
     having count(*) &lt;&gt; 2
  ) ;

另一個：

select 
   p1.id as id1, 
   p2.id as id2
from people p1
 join people p2
   on  p1.first_name = p2.first_name
   and p1.last_name = p2.last_name
   and p1.id &lt; p2.id
where not exists
   ( ( select field, field_value
       from attributes a1
       where a1.person_id = p1.id
     union 
       select field, field_value
       from attributes a2
      where a2.person_id = p2.id
     ) 
   except
     ( select field, field_value
       from attributes a1
       where a1.person_id = p1.id
     intersect
       select field, field_value
       from attributes a2
       where a2.person_id = p2.id
     )
   ) ;

Intersect 的優先級高於 except/minus，至少在 Postgres 和 SQL Server 中是這樣。為了安全起見，您可以使用括號來確保優先級。

引用自：https://dba.stackexchange.com/questions/91959

跨一對多關係表查找重複記錄

相關問答

Mysql：創建一個具有多個自連接的視圖，結果中沒有重複

比較重複項並在連接中選擇具有最近日期的重複項

使用連接時忽略重複項

通過引用多個表的連接條件的子查詢刪除重複結果

多個左連接獲取重複數據

合併行，然後從另一個表更新 fk 以匹配新值