Join

跨一對多關係表查找重複記錄

  • April 17, 2018

假設我有兩個具有一對多關係的表:人員和屬性。我正在嘗試根據 first_name、last_name 查找重複項,並且所有屬性必須完全匹配。

CREATE TABLE People (Id int, 
                first_name varchar(100), 
                last_name varchar(100));

CREATE TABLE Attributes (Id int, 
                    person_id int, 
                    field varchar(100),
                    field_value varchar(100));

INSERT INTO People VALUES (1, 'John', 'Smith');
INSERT INTO People VALUES (2, 'John', 'Smith');
INSERT INTO People VALUES (3, 'John', 'Smith');

INSERT INTO Attributes VALUES (1, 1, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (2, 1, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (3, 2, 'HairColor', 'Brown');
INSERT INTO Attributes VALUES (4, 2, 'EyeColor', 'Blue');
INSERT INTO Attributes VALUES (5, 3, 'HairColor', 'Blonde');

這給了我們:

id | first_name | last_name
----+------------+-----------
 1 | John       | Smith
 2 | John       | Smith
 3 | John       | Smith

id | person_id |   field   | field_value
----+-----------+-----------+-------------
 1 |         1 | HairColor | Brown
 2 |         1 | EyeColor  | Blue
 3 |         2 | HairColor | Brown
 4 |         2 | EyeColor  | Blue
 5 |         3 | HairColor | Blonde

我想要一個從 People 表中返回 ID 1 和 2 的查詢。我可以在一個表中找到重複項:

select first_name,last_name,count(*) from People 
   group by first_name,last_name having ( count(*) > 1 );

但是我無法加入一對多表並檢測兩個表中的重複項。如何檢測具有一對多關係的表中的重複項?

一種方法(檢查SQLfiddle):

select 
   p1.id as id1, 
   p2.id as id2
from people p1
 join people p2
   on  p1.first_name = p2.first_name
   and p1.last_name = p2.last_name
   and p1.id < p2.id
where not exists
   ( select 1
     from 
     ( select * 
       from attributes a1
       where a1.person_id = p1.id
     union all
       select * 
       from attributes a2
       where a2.person_id = p2.id
     ) g
     group by field, field_value
     having count(*) <> 2
  ) ;

另一個:

select 
   p1.id as id1, 
   p2.id as id2
from people p1
 join people p2
   on  p1.first_name = p2.first_name
   and p1.last_name = p2.last_name
   and p1.id < p2.id
where not exists
   ( ( select field, field_value
       from attributes a1
       where a1.person_id = p1.id
     union 
       select field, field_value
       from attributes a2
      where a2.person_id = p2.id
     ) 
   except
     ( select field, field_value
       from attributes a1
       where a1.person_id = p1.id
     intersect
       select field, field_value
       from attributes a2
       where a2.person_id = p2.id
     )
   ) ;

Intersect 的優先級高於 except/minus,至少在 Postgres 和 SQL Server 中是這樣。為了安全起見,您可以使用括號來確保優先級。

引用自:https://dba.stackexchange.com/questions/91959