SQL Server

Find records with the same string plus extra characters

  • November 21, 2018

OK, so I have a Microsoft SQL Server 2014 database with a table, owner, that holds about 90,000 records of owner information, and another table, vehicle, that holds vehicle information:

owner:

Owner_Name                     owner_id
-----------------------------  --------
JACOB JAMISON & JESSICA        35
JACOB JAMISON M & JESSICA B    39
BLACKSON BARRINGTON            56
BLACKSON BARRINGTON H          98
BRUSTER MICHAEL                107

vehicle:

V_name     owner_id   exempt
---------  --------   ------
Civic      35         H3
Accord     39         H3
Bugatti    56         H6
SSC        98         H7
Corvette   107        H9

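I'm trying to find all records that have more than one exemption on their vehicles (H0 means no exemption). The code below works fine as long as the names are exactly the same. However, if there are variations, such as extra letters or names entered in reverse order, those records are not returned. I've looked at things like SOUNDEX, but that doesn't work in my scenario.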

SELECT o.Owner_Name
    , COUNT(o.Owner_Name) AS xNameAppears
    , COUNT(v.exempt) AS ExemptionCount
FROM owner AS o
INNER JOIN vehicle AS v ON v.owner_id = o.owner_id
WHERE v.exempt <> 'H0'   -- H0 means no exemption
GROUP BY o.Owner_Name
HAVING COUNT(v.exempt) > 1;

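Is there a solution that lets me return records like these without knowing in advance which owner_name values might be similar? Basically, I'm trying to get the server to search the owner_name column and, when there are similarities such as JACOB JAMISON & JESSICA and JACOB JAMISON M & JESSICA B, return those records, like this: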

Owner_Name                     xNameAppears   ExemptionCount
-----------------------------  ------------   --------------
JACOB JAMISON & JESSICA        2              2
JACOB JAMISON M & JESSICA B    2              2
BLACKSON BARRINGTON            2              2
BLACKSON BARRINGTON H          2              2

Thanks in advance!

The SOUNDEX function can also be applied to a column.
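A minimal sketch of what that looks like, with a VALUES list standing in for the real owner table. The first five names come from the question; 'MICHAEL BRUSTER' is a hypothetical reversed entry added only for illustration. SOUNDEX returns a four-character code (the first letter followed by three digits for the next coded letters), so names that share a long enough beginning get the same code, while a name entered in reverse order starts with a different letter and gets a different one.

```sql
-- Sketch only: the VALUES list stands in for the real owner table.
-- 'MICHAEL BRUSTER' is a hypothetical reversed entry, not from the question.
SELECT n.Owner_Name,
       SOUNDEX(n.Owner_Name) AS Owner_Soundex  -- four characters: a letter plus three digits
FROM (VALUES
        ('JACOB JAMISON & JESSICA'),
        ('JACOB JAMISON M & JESSICA B'),
        ('BLACKSON BARRINGTON'),
        ('BLACKSON BARRINGTON H'),
        ('BRUSTER MICHAEL'),
        ('MICHAEL BRUSTER')
     ) AS n (Owner_Name)
-- Sorting by the code puts names SOUNDEX treats as alike on adjacent rows.
ORDER BY Owner_Soundex, n.Owner_Name;
```

The two JACOB names share a code, as do the two BLACKSON names, but MICHAEL BRUSTER (starting with M) does not match BRUSTER MICHAEL (starting with B).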

However, since there are thousands of these names, I wouldn't recommend simply writing a query that joins on the function's output.

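This may perform poorly on larger tables, because SOUNDEX has to be evaluated for every row on both sides of the join, so an index on Owner_Name can't be used to seek on the predicate: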

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON SOUNDEX(v2.Owner_Name) = SOUNDEX(v.Owner_Name)
AND v2.Owner_Name <> v.Owner_Name;

I'd rather set up something that makes these matches easier to find in the long run.

Here's an example:

CREATE TABLE dbo.vehicle (Owner_Name VARCHAR(50));
INSERT dbo.vehicle (Owner_Name)
SELECT *
FROM (
    VALUES
        ('JACOB JAMISON & JESSICA'),
        ('JACOB JAMISON M & JESSICA B'),
        ('BLACKSON BARRINGTON'),
        ('BLACKSON BARRINGTON H'),
        ('BRUSTER MICHAEL')
) AS x (Owner_Name);

I'll add a computed column based on the function, and then an index to support my queries.

ALTER TABLE dbo.vehicle ADD Owner_Soundex AS SOUNDEX(Owner_Name);

CREATE INDEX ix_whatever ON dbo.vehicle (Owner_Soundex, Owner_Name);

Verify that everything looks good…

SELECT *
FROM dbo.vehicle AS v;

Then use a query like this to find inexact matches:

SELECT *
FROM dbo.vehicle AS v
JOIN dbo.vehicle AS v2
ON v2.Owner_Soundex = v.Owner_Soundex
AND v2.Owner_Name <> v.Owner_Name;
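To get output shaped like the one the question asks for (xNameAppears and an exemption count per group of similar names), one option is to group owners by their SOUNDEX code instead of by the exact name. This is a minimal sketch, not the answer's own query: it assumes the question's owner and vehicle columns, uses #owner and #vehicle temp tables seeded with the question's sample rows as stand-ins for the real tables, and computes SOUNDEX inline (rather than from an indexed computed column) to stay self-contained.

```sql
-- Stand-in tables seeded with the sample rows from the question.
IF OBJECT_ID('tempdb..#owner') IS NOT NULL DROP TABLE #owner;
IF OBJECT_ID('tempdb..#vehicle') IS NOT NULL DROP TABLE #vehicle;
CREATE TABLE #owner (owner_id INT PRIMARY KEY, Owner_Name VARCHAR(50) NOT NULL);
CREATE TABLE #vehicle (V_name VARCHAR(50) NOT NULL, owner_id INT NOT NULL, exempt CHAR(2) NOT NULL);

INSERT #owner (owner_id, Owner_Name)
VALUES (35, 'JACOB JAMISON & JESSICA'),
       (39, 'JACOB JAMISON M & JESSICA B'),
       (56, 'BLACKSON BARRINGTON'),
       (98, 'BLACKSON BARRINGTON H'),
       (107, 'BRUSTER MICHAEL');

INSERT #vehicle (V_name, owner_id, exempt)
VALUES ('Civic', 35, 'H3'),
       ('Accord', 39, 'H3'),
       ('Bugatti', 56, 'H6'),
       ('SSC', 98, 'H7'),
       ('Corvette', 107, 'H9');

WITH exempt_owners AS (
    -- One row per owner, counting only real exemptions (H0 = no exemption).
    SELECT o.owner_id,
           o.Owner_Name,
           SOUNDEX(o.Owner_Name) AS Owner_Soundex,
           COUNT(*) AS ExemptionCount
    FROM #owner AS o
    JOIN #vehicle AS v ON v.owner_id = o.owner_id
    WHERE v.exempt <> 'H0'
    GROUP BY o.owner_id, o.Owner_Name, SOUNDEX(o.Owner_Name)
),
grouped AS (
    -- Owners sharing a SOUNDEX code are treated as one group of similar names.
    SELECT e.Owner_Name,
           e.Owner_Soundex,
           COUNT(*) OVER (PARTITION BY e.Owner_Soundex) AS xNameAppears,
           SUM(e.ExemptionCount) OVER (PARTITION BY e.Owner_Soundex) AS ExemptionCount
    FROM exempt_owners AS e
)
SELECT g.Owner_Name, g.xNameAppears, g.ExemptionCount
FROM grouped AS g
WHERE g.xNameAppears > 1  -- keep only names that have a similar-sounding partner
ORDER BY g.Owner_Soundex, g.Owner_Name;
```

On the sample data this keeps the two JACOB rows and the two BLACKSON rows, each with xNameAppears and ExemptionCount of 2, and drops BRUSTER MICHAEL, which has no similar name. Because SOUNDEX only encodes the first letter and roughly the next three consonant sounds, unrelated owners whose names start alike end up in the same group, and names entered in reverse order are not matched. Separately, changing <> to < in the self-join above returns each pair of names once instead of twice.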

Source: https://dba.stackexchange.com/questions/223108