Sql-Server
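How do I make sure data is not inserted into a table where it already exists?
I have created a query whose results I want to insert into another table. It will run daily as a job, and data that has already been imported will still be present in the result set. If the data already exists in the target table, I do not want to import it again. The query is as follows: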
WITH cteUniquePages ( CorrelationID, Title, URL, HitDate, TotalVisitsOnDate, DontUseThisDate) AS (
    SELECT DISTINCT CorrelationID, Title, URL, HitDate,
        COUNT(HitDate) OVER (PARTITION BY URL, HitDate) 'TotalVisitsOnDate',
        CAST(LogTime AS date) 'DontUseThisDate'
    FROM (
        SELECT CorrelationId, UserLogin, LogTime, Title,
            CONCAT( (CASE WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), WebUrl,DocumentPath) 'URL',
            CONCAT(
                CASE WHEN (LEN(DATEPART(day,LogTime)))=1 THEN CONCAT('0',DATEPART(day,LogTime)) END,
                CASE WHEN (LEN(DATEPART(day,LogTime)))=2 THEN (DATEPART(day,LogTime)) END,
                '-',
                CASE WHEN (LEN(DATEPART(month,LogTime)))=1 THEN CONCAT('0',DATEPART(month,LogTime)) END,
                CASE WHEN (LEN(DATEPART(month,LogTime)))=2 THEN (DATEPART(month,LogTime)) END,
                '-',
                DATEPART(year,LogTime)) 'HitDate'
        FROM WSS_Logging.dbo.RequestUsage
        WHERE UserLogin <> 'nt authority\iusr'
            AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13'
            AND DocumentPath LIKE '%.aspx'
            AND DocumentPath NOT LIKE '%/_layouts/%'
            AND UserLogin <> 'PFNET\E01BrownS'
            AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
        GROUP BY UserLogin, WebUrl, DocumentPath, LogTime, Title, ServerUrl,CorrelationId
    ) as a
),
cteVisitsAllTime ( CorrelationID, LogTime, Title, URL, TotalVisits ) AS (
    SELECT DISTINCT CorrelationID, LogTime, Title, URL,
        COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits'
    FROM (
        SELECT CorrelationId, Title,
            CONCAT( (CASE WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), WebUrl,DocumentPath) 'URL',
            LogTime
        FROM WSS_Logging.dbo.RequestUsage
        WHERE UserLogin <> 'nt authority\iusr'
            AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13'
            AND DocumentPath LIKE '%.aspx'
            AND DocumentPath NOT LIKE '%/_layouts/%'
            AND UserLogin <> 'PFNET\E01BrownS'
            AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
    ) as a
),
cteVisitsLast7Days ( CorrelationID, Title, URL, TotalVisits ) AS (
    SELECT CorrelationID, Title, URL,
        COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits'
    FROM (
        SELECT CorrelationId, Title,
            CONCAT( (CASE WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), WebUrl,DocumentPath) 'URL',
            LogTime
        FROM WSS_Logging.dbo.RequestUsage
        WHERE UserLogin <> 'nt authority\iusr'
            AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13'
            AND DocumentPath LIKE '%.aspx'
            AND DocumentPath NOT LIKE '%/_layouts/%'
            AND UserLogin <> 'PFNET\E01BrownS'
            AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
    ) as a
    WHERE LogTime >= DATEADD(day,-7, GETDATE())
),
cteVisitsLast30Days ( CorrelationID, Title, URL, TotalVisits ) AS (
    SELECT CorrelationID, Title, URL,
        COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits'
    FROM (
        SELECT CorrelationId, Title,
            CONCAT( (CASE WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), WebUrl,DocumentPath) 'URL',
            LogTime
        FROM WSS_Logging.dbo.RequestUsage
        WHERE UserLogin <> 'nt authority\iusr'
            AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13'
            AND DocumentPath LIKE '%.aspx'
            AND DocumentPath NOT LIKE '%/_layouts/%'
            AND UserLogin <> 'PFNET\E01BrownS'
            AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
    ) as a
    WHERE LogTime >= DATEADD(day,-30, GETDATE())
),
cteVisitsLastYear ( CorrelationID, Title, URL, TotalVisits ) AS (
    SELECT CorrelationID, Title, URL,
        COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits'
    FROM (
        SELECT CorrelationId, Title,
            CONCAT( (CASE WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), WebUrl,DocumentPath) 'URL',
            LogTime
        FROM WSS_Logging.dbo.RequestUsage
        WHERE UserLogin <> 'nt authority\iusr'
            AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13'
            AND DocumentPath LIKE '%.aspx'
            AND DocumentPath NOT LIKE '%/_layouts/%'
            AND UserLogin <> 'PFNET\E01BrownS'
            AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
    ) as a
    WHERE LogTime >= DATEADD(day,-365, GETDATE())
)
SELECT DISTINCT
    cteUniquePages.Title,
    cteUniquePages.URL,
    cteUniquePages.HitDate,
    cteUniquePages.TotalVisitsOnDate,
    cteVisitsAllTime.TotalVisits 'All Time Visits',
    cteVisitsLast7Days.TotalVisits 'Visits Last 7 Days',
    cteVisitsLast30Days.TotalVisits 'Visits Last 30 Days',
    cteVisitsLastYear.TotalVisits 'Visits Last Year',
    cteUniquePages.DontUseThisDate
FROM cteUniquePages
LEFT JOIN cteVisitsAllTime ON cteVisitsAllTime.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLast7Days ON cteVisitsLast7Days.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLast30Days ON cteVisitsLast30Days.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLastYear ON cteVisitsLastYear.CorrelationID = cteUniquePages.CorrelationID
ORDER BY DontUseThisDate DESC, cteUniquePages.URL
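There are a few options:
- Add a WHERE NOT EXISTS (SELECT * FROM <target_table> WHERE <matching_predicates>) clause to the INSERT ... SELECT ... statement. Be sure to test with realistic data patterns, though, as you would for any relatively complex query.
- LEFT OUTER JOIN to the target table and add a WHERE target_table.primarykey IS NULL clause, which is true only for rows where the outer join found no match. This usually produces the same plan as WHERE NOT EXISTS, though for complex queries it may differ. If there is a performance difference (I have seen the JOIN variant outperform WHERE NOT EXISTS, although that was several engine versions ago), use whichever produces the better plan; otherwise use whichever you find easier to read and maintain.
- Use MERGE instead of a plain INSERT, although this is most useful when you want to update existing rows rather than just avoid duplicating them.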
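As a rough illustration of the first option, here is a minimal sketch of the WHERE NOT EXISTS pattern. The temp tables #Source and #Target, their columns, and the sample rows are all hypothetical and stand in for your real query and target table; the matching predicates (Title and URL) are only an assumption about what makes a row unique.

```sql
-- Hypothetical temp tables, for illustration only.
CREATE TABLE #Source (Title VARCHAR(50), URL VARCHAR(100));
CREATE TABLE #Target (Title VARCHAR(50), URL VARCHAR(100));

INSERT INTO #Source (Title, URL)
VALUES ('Home', 'https://example.com/home.aspx'),
       ('News', 'https://example.com/news.aspx');

-- Insert only the source rows that have no matching row in the target.
INSERT INTO #Target (Title, URL)
SELECT s.Title, s.URL
FROM #Source AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM #Target AS t
    WHERE t.Title = s.Title
      AND t.URL = s.URL
);

-- Re-running the INSERT ... SELECT above adds no further rows,
-- because every source row now has a match in the target.
SELECT COUNT(*) AS TargetRows FROM #Target;

DROP TABLE #Source;
DROP TABLE #Target;
```

One thing to watch: the equality predicates never match when either side is NULL, so rows with a NULL in a matching column would be inserted again on every run unless you handle NULLs explicitly.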
Your sample query does not include the insert or the table structure, so I will use a simpler version to explain:
CREATE TABLE dbo.Source
(
    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    Title VARCHAR(50),
    URL VARCHAR(50)
);

INSERT INTO dbo.Source(Title, URL)
VALUES ('Stack Overflow', 'https://stackoverflow.com');
INSERT INTO dbo.Source(Title, URL)
VALUES ('Database Administrators', 'https://dba.stackexchange.com');

CREATE TABLE dbo.Target
(
    ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
    Title VARCHAR(50),
    URL VARCHAR(50)
);

/* Here's the data you want to copy from Source to Target: */
INSERT INTO dbo.Target ( Title, URL )
SELECT s.Title, s.URL
FROM dbo.Source s
LEFT OUTER JOIN dbo.Target t
    ON s.Title = t.Title
    AND s.URL = t.URL
WHERE t.ID IS NULL;

/* You can run the above repeatedly, and then check to see if there are duplicates: */
SELECT * FROM dbo.Target;
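In my example I used the left outer join technique and joined on both columns (Title and URL). I did not join on ID because I assume it will differ between the two tables.
In your real-life scenario, when you copy data, pick as few columns as possible that still represent what is truly unique, that is, how you would recognize that the row already exists. The more columns you pick, and the larger their data types, the worse performance will be.
Your column list is: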
cteUniquePages.Title,
cteUniquePages.URL,
cteUniquePages.HitDate,
cteUniquePages.TotalVisitsOnDate,
cteVisitsAllTime.TotalVisits 'All Time Visits',
cteVisitsLast7Days.TotalVisits 'Visits Last 7 Days',
cteVisitsLast30Days.TotalVisits 'Visits Last 30 Days',
cteVisitsLastYear.TotalVisits 'Visits Last Year',
cteUniquePages.DontUseThisDate
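My guess is that only Title, URL, and HitDate make a row unique (and possibly DontUseThisDate). Because the hit metrics change over time, management will eventually ask you to reload the data. So what you probably want is to first delete the rows that already exist for that date range, and then use this technique.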
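The delete-then-reload idea could look something like the sketch below. Everything here is an assumption for illustration: #PageHits stands in for your target table, #Fresh for the output of your query, and HitDate is modeled as a DATE column (your query builds it as a dd-mm-yyyy string, which would not support range comparisons directly).

```sql
SET XACT_ABORT ON;  -- roll the whole transaction back if any statement fails

-- Hypothetical tables: #PageHits is the target, #Fresh holds freshly computed results.
CREATE TABLE #PageHits (Title NVARCHAR(255), URL NVARCHAR(400), HitDate DATE, TotalVisitsOnDate INT);
CREATE TABLE #Fresh    (Title NVARCHAR(255), URL NVARCHAR(400), HitDate DATE, TotalVisitsOnDate INT);

INSERT INTO #PageHits (Title, URL, HitDate, TotalVisitsOnDate)
VALUES (N'Home', N'https://example.com/home.aspx', '20240101', 5);   -- stale count

INSERT INTO #Fresh (Title, URL, HitDate, TotalVisitsOnDate)
VALUES (N'Home', N'https://example.com/home.aspx', '20240101', 8),   -- updated count
       (N'Home', N'https://example.com/home.aspx', '20240102', 3);   -- new day

DECLARE @FromDate DATE = '20240101',
        @ToDate   DATE = '20240102';

BEGIN TRANSACTION;

-- Step 1: remove whatever the target already holds for the reload window.
DELETE FROM #PageHits
WHERE HitDate >= @FromDate
  AND HitDate <= @ToDate;

-- Step 2: insert the fresh rows for that window, still guarding against duplicates.
INSERT INTO #PageHits (Title, URL, HitDate, TotalVisitsOnDate)
SELECT f.Title, f.URL, f.HitDate, f.TotalVisitsOnDate
FROM #Fresh AS f
LEFT OUTER JOIN #PageHits AS p
    ON  p.Title   = f.Title
    AND p.URL     = f.URL
    AND p.HitDate = f.HitDate
WHERE f.HitDate >= @FromDate
  AND f.HitDate <= @ToDate
  AND p.URL IS NULL;   -- no matching row found in the target

COMMIT TRANSACTION;

SELECT * FROM #PageHits ORDER BY HitDate;

DROP TABLE #PageHits;
DROP TABLE #Fresh;
```

Wrapping both steps in one transaction means readers should not see the window half-emptied if the insert fails; whether that matters depends on how the target table is consumed.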
As you read about how to do this kind of reload work, you will also come across SQL Server's MERGE statement. Before you go that route, read Aaron Bertrand's list of MERGE pitfalls.
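For reference, an insert-only MERGE might be sketched as follows. The temp tables and sample rows are hypothetical, and the HOLDLOCK hint is included as a commonly suggested precaution against two sessions inserting the same row at once, not as something this thread prescribes.

```sql
-- Hypothetical temp tables, for illustration only.
CREATE TABLE #Source (Title VARCHAR(50), URL VARCHAR(100));
CREATE TABLE #Target (Title VARCHAR(50), URL VARCHAR(100));

INSERT INTO #Source (Title, URL)
VALUES ('Home', 'https://example.com/home.aspx'),
       ('News', 'https://example.com/news.aspx');

-- Insert source rows that have no match in the target; leave matched rows alone.
MERGE #Target WITH (HOLDLOCK) AS t
USING #Source AS s
    ON  t.Title = s.Title
    AND t.URL   = s.URL
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Title, URL)
    VALUES (s.Title, s.URL);   -- MERGE must be terminated with a semicolon

SELECT * FROM #Target;

DROP TABLE #Source;
DROP TABLE #Target;
```

If all you need is to skip existing rows, this does the same job as the LEFT OUTER JOIN or WHERE NOT EXISTS insert; MERGE earns its extra complexity mainly when you add a WHEN MATCHED THEN UPDATE branch.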