Sql-Server

How do I make sure data isn't inserted into a table where it already exists?

  • July 20, 2018

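I have created a query whose results I want to insert into another table. It will run daily as a job, and data that has already been imported will still appear in the result set. If the data already exists in the target table, I don't want to import it again. The query is as follows: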

WITH cteUniquePages
(
CorrelationID,
Title,
URL,
HitDate,
TotalVisitsOnDate,
DontUseThisDate)
AS
(
   SELECT DISTINCT
   CorrelationID,
   Title, 
   URL,
   HitDate,
   COUNT(HitDate) OVER (PARTITION BY URL, HitDate) 'TotalVisitsOnDate',
   CAST(LogTime AS date) 'DontUseThisDate'
   FROM
       (SELECT
       CorrelationId,
       UserLogin,
       LogTime,
       Title,
       CONCAT(
           (CASE 
               WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), 
           WebUrl,DocumentPath) 'URL',
       CONCAT(
           CASE
               WHEN (LEN(DATEPART(day,LogTime)))=1 THEN CONCAT('0',DATEPART(day,LogTime)) END,
           CASE
               WHEN (LEN(DATEPART(day,LogTime)))=2 THEN (DATEPART(day,LogTime)) END,'-', 
           CASE 
               WHEN (LEN(DATEPART(month,LogTime)))=1 THEN CONCAT('0',DATEPART(month,LogTime)) END,
           CASE
               WHEN (LEN(DATEPART(month,LogTime)))=2 THEN (DATEPART(month,LogTime)) END,
           '-', DATEPART(year,LogTime)) 'HitDate'
       FROM WSS_Logging.dbo.RequestUsage
       WHERE UserLogin <> 'nt authority\iusr' 
       AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13' 
       AND DocumentPath LIKE '%.aspx' 
       AND DocumentPath NOT LIKE '%/_layouts/%'
       AND UserLogin <> 'PFNET\E01BrownS'
       AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1'
       GROUP BY UserLogin, WebUrl, DocumentPath, LogTime, Title, ServerUrl,CorrelationId) as a
),
   cteVisitsAllTime
   (
       CorrelationID,
       LogTime,
       Title,
       URL, 
       TotalVisits
   )
   AS
   (
       SELECT DISTINCT
       CorrelationID,
       LogTime, 
       Title,
       URL,
       COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits' 
       FROM(
           SELECT 
           CorrelationId,
           Title,
           CONCAT(
               (CASE 
                   WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), 
               WebUrl,DocumentPath) 'URL', 
               LogTime 
           FROM WSS_Logging.dbo.RequestUsage
           WHERE UserLogin <> 'nt authority\iusr' 
           AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13' 
           AND DocumentPath LIKE '%.aspx' 
           AND DocumentPath NOT LIKE '%/_layouts/%'
           AND UserLogin <> 'PFNET\E01BrownS'
           AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1') as a
   ),
   cteVisitsLast7Days
   (
       CorrelationID,
       Title,
       URL, 
       TotalVisits
   )
   AS
   (
       SELECT
       CorrelationID, 
       Title,
       URL,
       COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits' 
       FROM(
           SELECT 
           CorrelationId,
           Title,
           CONCAT(
               (CASE 
                   WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), 
               WebUrl,DocumentPath) 'URL', 
               LogTime 
           FROM WSS_Logging.dbo.RequestUsage
           WHERE UserLogin <> 'nt authority\iusr' 
           AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13' 
           AND DocumentPath LIKE '%.aspx' 
           AND DocumentPath NOT LIKE '%/_layouts/%'
           AND UserLogin <> 'PFNET\E01BrownS'
           AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1') as a
           WHERE LogTime >= DATEADD(day,-7, GETDATE())
   ),
   cteVisitsLast30Days
   (
       CorrelationID,
       Title,
       URL, 
       TotalVisits
   )
   AS
   (
       SELECT
       CorrelationID, 
       Title,
       URL,
       COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits' 
       FROM(
           SELECT
           CorrelationId, 
           Title,
           CONCAT(
               (CASE 
                   WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), 
               WebUrl,DocumentPath) 'URL', 
               LogTime 
           FROM WSS_Logging.dbo.RequestUsage
           WHERE UserLogin <> 'nt authority\iusr' 
           AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13' 
           AND DocumentPath LIKE '%.aspx' 
           AND DocumentPath NOT LIKE '%/_layouts/%'
           AND UserLogin <> 'PFNET\E01BrownS'
           AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1') as a
           WHERE LogTime >= DATEADD(day,-30, GETDATE())
   ),
   cteVisitsLastYear
   (
       CorrelationID,
       Title,
       URL, 
       TotalVisits
   )
   AS
   (
       SELECT
       CorrelationID, 
       Title,
       URL,
       COUNT(URL) OVER (PARTITION BY URL) 'TotalVisits' 
       FROM(
           SELECT
           CorrelationId, 
           Title,
           CONCAT(
               (CASE 
                   WHEN WebUrl <> '' THEN CONCAT(ServerUrl,'/') ELSE ServerUrl END), 
               WebUrl,DocumentPath) 'URL', 
               LogTime 
           FROM WSS_Logging.dbo.RequestUsage
           WHERE UserLogin <> 'nt authority\iusr' 
           AND UserLogin <> 'i:0#.w|pfnet\zz_sharepoint13' 
           AND DocumentPath LIKE '%.aspx' 
           AND DocumentPath NOT LIKE '%/_layouts/%'
           AND UserLogin <> 'PFNET\E01BrownS'
           AND UserLogin <> 'i:0#.w|pfnet\sharepointtestacc1') as a
           WHERE LogTime >= DATEADD(day,-365, GETDATE())
   )
SELECT DISTINCT 
cteUniquePages.Title,
cteUniquePages.URL,
cteUniquePages.HitDate,
cteUniquePages.TotalVisitsOnDate,
cteVisitsAllTime.TotalVisits 'All Time Visits',
cteVisitsLast7Days.TotalVisits 'Visits Last 7 Days',
cteVisitsLast30Days.TotalVisits 'Visits Last 30 Days',
cteVisitsLastYear.TotalVisits 'Visits Last Year',
cteUniquePages.DontUseThisDate
FROM cteUniquePages
LEFT JOIN cteVisitsAllTime ON cteVisitsAllTime.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLast7Days ON cteVisitsLast7Days.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLast30Days ON cteVisitsLast30Days.CorrelationID = cteUniquePages.CorrelationID
LEFT JOIN cteVisitsLastYear ON cteVisitsLastYear.CorrelationID = cteUniquePages.CorrelationID
ORDER BY DontUseThisDate DESC, cteUniquePages.URL

There are a few options:

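  1. Add a WHERE NOT EXISTS (SELECT * FROM <target_table> WHERE <matching_predicates>) clause to the INSERT ... SELECT ... statement. Be sure to test against realistic data patterns, though, particularly since the query is relatively complex.
  2. LEFT OUTER JOIN to the target table and add a WHERE target_table.primarykey IS NULL clause - this will be true whenever the outer join found no matching row. This usually produces the same plan as WHERE NOT EXISTS, though for complex queries it may differ. If there is a performance difference (I have seen the JOIN variant perform better than WHERE NOT EXISTS, though that was several engine versions ago), use the query that produces the better plan; otherwise use whichever you find easier to read and maintain.
  3. Use MERGE instead of a plain INSERT, though this is most useful if you want to update existing rows rather than simply avoid duplicating them.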
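
As a rough illustration of option 1, here is a minimal sketch. The temp tables #SourceRows and #TargetRows, their columns, and the example.com URLs are all made up for the demo; in practice the SELECT would be your CTE query and the predicates would be whatever identifies a row in your target table.

```sql
-- Hypothetical temp tables standing in for the real source query and target table.
CREATE TABLE #SourceRows (Title varchar(50), URL varchar(100), HitDay date);
CREATE TABLE #TargetRows (Title varchar(50), URL varchar(100), HitDay date);

INSERT INTO #SourceRows (Title, URL, HitDay)
VALUES ('Home', 'https://example.com/home.aspx', '20180719'),
       ('Home', 'https://example.com/home.aspx', '20180720');

-- Pretend the first row was already imported by yesterday's run.
INSERT INTO #TargetRows (Title, URL, HitDay)
VALUES ('Home', 'https://example.com/home.aspx', '20180719');

-- Option 1: insert only the source rows that have no match in the target.
INSERT INTO #TargetRows (Title, URL, HitDay)
SELECT s.Title, s.URL, s.HitDay
FROM #SourceRows AS s
WHERE NOT EXISTS
(
    SELECT 1
    FROM #TargetRows AS t
    WHERE t.URL = s.URL
      AND t.HitDay = s.HitDay
);

-- The target now holds two rows, one per day; rerunning the INSERT adds nothing.
SELECT Title, URL, HitDay
FROM #TargetRows
ORDER BY HitDay;

DROP TABLE #SourceRows, #TargetRows;
```

One thing to watch: if any column in the matching predicates is nullable, NULL never compares equal to NULL, so those rows would look "new" on every run.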

Your example query doesn't include the insert or the table structure, so I'll use a simpler version to explain:

CREATE TABLE dbo.Source
(
   ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
   Title VARCHAR(50),
   URL VARCHAR(50)
);

INSERT INTO dbo.Source(Title, URL) VALUES ('Stack Overflow', 'https://stackoverflow.com');
INSERT INTO dbo.Source(Title, URL) VALUES ('Database Administrators', 'https://dba.stackexchange.com');    

CREATE TABLE dbo.Target
(
   ID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
   Title VARCHAR(50),
   URL VARCHAR(50)
);

/* Here's the data you want to copy from Source to Target: */
INSERT INTO dbo.Target
(
   Title,
   URL
)
SELECT s.Title,
      s.URL
FROM dbo.Source s
   LEFT OUTER JOIN dbo.Target t
       ON s.Title = t.Title
          AND s.URL = t.URL
WHERE t.ID IS NULL;

/* You can run the above repeatedly, and then check to see if there are duplicates: */
SELECT *
FROM dbo.Target;

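In my example, I used the left outer join technique and joined on both columns (Title and URL). I didn't join on ID because I assume it would be different in the two tables.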

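In your real-life scenario, when you copy the data, pick as few columns as possible that still represent what is truly unique - that is, how you would recognize that the row already exists. The more columns you pick, and the larger their data types, the worse performance will be.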

Your column list is:

cteUniquePages.Title,
cteUniquePages.URL,
cteUniquePages.HitDate,
cteUniquePages.TotalVisitsOnDate,
cteVisitsAllTime.TotalVisits 'All Time Visits',
cteVisitsLast7Days.TotalVisits 'Visits Last 7 Days',
cteVisitsLast30Days.TotalVisits 'Visits Last 30 Days',
cteVisitsLastYear.TotalVisits 'Visits Last Year',
cteUniquePages.DontUseThisDate

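I'm guessing that only Title, URL, and HitDate are what make a row unique (and possibly DontUseThisDate). Because the hit metrics change over time, management is going to ask you to reload the data. So what you probably want is to first delete the rows that already exist for that date range, and then use this technique.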
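
A minimal sketch of that delete-then-reload pattern follows. Everything named here is an assumption for illustration: the temp tables #PageHits and #FreshHits, the date-typed HitDay column (used instead of the formatted HitDate string so the range filter is straightforward), and the @FromDay/@ToDay variables.

```sql
-- Hypothetical target table and a stand-in for the recomputed query results.
CREATE TABLE #PageHits  (URL varchar(100), HitDay date, TotalVisitsOnDate int);
CREATE TABLE #FreshHits (URL varchar(100), HitDay date, TotalVisitsOnDate int);

INSERT INTO #PageHits (URL, HitDay, TotalVisitsOnDate)
VALUES ('https://example.com/home.aspx', '20180719', 10),
       ('https://example.com/home.aspx', '20180720', 3);   -- stale count

INSERT INTO #FreshHits (URL, HitDay, TotalVisitsOnDate)
VALUES ('https://example.com/home.aspx', '20180720', 7);   -- recomputed count

-- Reload window as a half-open range: @FromDay <= HitDay < @ToDay.
DECLARE @FromDay date = '20180720',
        @ToDay   date = '20180721';

SET XACT_ABORT ON;   -- roll back the whole transaction on a run-time error
BEGIN TRANSACTION;

    -- Step 1: remove what is already stored for the window.
    DELETE FROM #PageHits
    WHERE HitDay >= @FromDay
      AND HitDay <  @ToDay;

    -- Step 2: insert the fresh rows, still guarded by the left outer join check.
    INSERT INTO #PageHits (URL, HitDay, TotalVisitsOnDate)
    SELECT f.URL, f.HitDay, f.TotalVisitsOnDate
    FROM #FreshHits AS f
        LEFT OUTER JOIN #PageHits AS p
            ON  p.URL = f.URL
            AND p.HitDay = f.HitDay
    WHERE f.HitDay >= @FromDay
      AND f.HitDay <  @ToDay
      AND p.URL IS NULL;

COMMIT TRANSACTION;

-- 2018-07-19 keeps its count of 10; 2018-07-20 now shows 7.
SELECT URL, HitDay, TotalVisitsOnDate
FROM #PageHits
ORDER BY HitDay;

DROP TABLE #PageHits, #FreshHits;
```

Wrapping both steps in one transaction means readers should not see the window empty between the delete and the insert, and a failed insert does not leave the old rows gone.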

As you read up on how to do this kind of reload work, you will also come across SQL Server's MERGE statement. Before you go down that road, read Aaron Bertrand's list of MERGE pitfalls.
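
For reference, this is roughly what a MERGE-based version looks like. It is only a sketch under assumptions: the temp tables #Src and #Tgt and their columns are invented for the demo, and the HOLDLOCK hint is a commonly suggested guard against concurrent sessions racing each other, not something this example proves you need.

```sql
-- Hypothetical source and target; (URL, HitDay) is assumed to identify a row.
CREATE TABLE #Src (URL varchar(100), HitDay date, TotalVisitsOnDate int);
CREATE TABLE #Tgt (URL varchar(100), HitDay date, TotalVisitsOnDate int);

INSERT INTO #Src (URL, HitDay, TotalVisitsOnDate)
VALUES ('https://example.com/home.aspx', '20180719', 12),  -- changed count
       ('https://example.com/home.aspx', '20180720', 7);   -- new day

INSERT INTO #Tgt (URL, HitDay, TotalVisitsOnDate)
VALUES ('https://example.com/home.aspx', '20180719', 10);

MERGE #Tgt WITH (HOLDLOCK) AS t
USING #Src AS s
    ON  t.URL = s.URL
    AND t.HitDay = s.HitDay
WHEN MATCHED AND t.TotalVisitsOnDate <> s.TotalVisitsOnDate THEN
    -- Existing row whose metric changed: update it in place.
    UPDATE SET t.TotalVisitsOnDate = s.TotalVisitsOnDate
WHEN NOT MATCHED BY TARGET THEN
    -- Row not in the target yet: insert it.
    INSERT (URL, HitDay, TotalVisitsOnDate)
    VALUES (s.URL, s.HitDay, s.TotalVisitsOnDate);

-- 2018-07-19 is updated to 12 and 2018-07-20 is inserted with 7.
SELECT URL, HitDay, TotalVisitsOnDate
FROM #Tgt
ORDER BY HitDay;

DROP TABLE #Src, #Tgt;
```

If you only ever want to skip existing rows and never update them, the plain INSERT with a left outer join or NOT EXISTS check is simpler and avoids MERGE altogether.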

Source: https://dba.stackexchange.com/questions/212752