Sql-Server

Calculating the total number of visits

  • February 24, 2016

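I am trying to write a query that calculates the number of visit days for a customer while taking overlapping days into account. For example, itemID 20009 has a start date of the 23rd and an end date of the 26th. Item 20010 falls between those days, so its purchase date should not be added to the total count.

Example scenario: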

Item ID   Start Date   End Date     Number of days   Days counted toward visits
20009     2015-01-23   2015-01-26   4                4
20010     2015-01-24   2015-01-24   1                0
20011     2015-01-23   2015-01-26   4                0
20012     2015-01-23   2015-01-27   5                1
20013     2015-01-23   2015-01-27   5                0
20014     2015-01-29   2015-01-30   2                2

The output should be 7 VisitDays.

Input table:

CREATE TABLE #Items
(
    CustID    INT,
    ItemID    INT,
    StartDate DATETIME,
    EndDate   DATETIME
);

-- Rows match the example scenario above
INSERT INTO #Items
SELECT 11205, 20009, '2015-01-23', '2015-01-26'
UNION ALL
SELECT 11205, 20010, '2015-01-24', '2015-01-24'
UNION ALL
SELECT 11205, 20011, '2015-01-23', '2015-01-26'
UNION ALL
SELECT 11205, 20012, '2015-01-23', '2015-01-27'
UNION ALL
SELECT 11205, 20013, '2015-01-23', '2015-01-27'
UNION ALL
SELECT 11205, 20014, '2015-01-29', '2015-01-30';

What I have tried so far:

CREATE TABLE #VisitsTable
   (
     StartDate DATETIME,
     EndDate DATETIME
   )

INSERT  INTO #VisitsTable
       SELECT DISTINCT
               StartDate,
               EndDate
       FROM    #Items items
       WHERE   CustID = 11205
       ORDER BY StartDate ASC

IF EXISTS (SELECT TOP 1 1 FROM #VisitsTable) 
BEGIN 


SELECT  ISNULL(SUM(VisitDays),1)
FROM    ( SELECT DISTINCT
                   abc.StartDate,
                   abc.EndDate,
                   DATEDIFF(DD, abc.StartDate, abc.EndDate) + 1 VisitDays
         FROM      #VisitsTable abc
                   INNER JOIN #VisitsTable bc ON bc.StartDate NOT BETWEEN abc.StartDate AND abc.EndDate      
       ) Visits

END



--DROP TABLE #Items 
--DROP TABLE #VisitsTable      

The first query builds distinct Start Date and End Date ranges that do not overlap.

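Notes:

  • Your sample (id = 0) is mixed with a sample from Ypercube (id = 1).
  • This solution may not scale well when there is a lot of data per id, or a large number of ids. Its advantage is that it does not require a numbers table. With a large dataset, a numbers table will very likely give better performance.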
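For comparison, the numbers-table alternative mentioned in the note could look roughly like the sketch below. It is not part of the original answer. The inline `Digits` and `Numbers` CTEs are hypothetical stand-ins for a permanent numbers table, and the `@Items` declaration with `date` columns is an assumption made to keep the sketch self-contained. Each range is expanded into one row per day, and the distinct days are counted per id.

```sql
-- Assumed sample table; mirrors the id = 0 rows used in this answer.
DECLARE @Items TABLE (id int, Item_ID int, Start_Date date, End_Date date);

INSERT INTO @Items (id, Item_ID, Start_Date, End_Date)
VALUES
    (0, 20009, '20150123', '20150126'),
    (0, 20010, '20150124', '20150124'),
    (0, 20011, '20150123', '20150126'),
    (0, 20012, '20150123', '20150127'),
    (0, 20013, '20150123', '20150127'),
    (0, 20014, '20150129', '20150130');

-- Hypothetical inline numbers source: 0..999, enough for ranges of up to 1000 days.
-- A permanent numbers table would normally replace these two CTEs.
WITH Digits AS (
    SELECT d FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
Numbers AS (
    SELECT n = a.d + 10 * b.d + 100 * c.d
    FROM Digits a CROSS JOIN Digits b CROSS JOIN Digits c
)
SELECT its.id
     , Days = COUNT(DISTINCT DATEADD(day, n.n, its.Start_Date))  -- each calendar day counted once
FROM @Items its
JOIN Numbers n
  ON n.n <= DATEDIFF(day, its.Start_Date, its.End_Date)          -- offsets 0 .. range length
GROUP BY its.id;
-- id = 0 returns Days = 7 (the 23rd to the 27th, plus the 29th and 30th)
```

The `OUTER APPLY` query in this answer avoids the numbers source altogether, which is the trade-off the note describes.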

Query:

SELECT DISTINCT its.id
   , Start_Date = its.Start_Date 
   , End_Date = COALESCE(DATEADD(day, -1, itmax.End_Date), CASE WHEN itmin.Start_Date > its.End_Date THEN itmin.Start_Date ELSE its.End_Date END)
   --, x1=itmax.End_Date, x2=itmin.Start_Date, x3=its.End_Date
FROM @Items its
OUTER APPLY (
   SELECT Start_Date = MAX(End_Date) FROM @Items std
   WHERE std.Item_ID <> its.Item_ID AND std.Start_Date < its.Start_Date AND std.End_Date > its.Start_Date
) itmin
OUTER APPLY (
   SELECT End_Date = MIN(Start_Date) FROM @Items std
   WHERE std.Item_ID <> its.Item_ID AND std.Start_Date > its.Start_Date AND std.Start_Date < its.End_Date
) itmax;

Output:

id  | Start_Date                    | End_Date                      
0   | 2015-01-23 00:00:00.0000000   | 2015-01-23 00:00:00.0000000   => 1
0   | 2015-01-24 00:00:00.0000000   | 2015-01-27 00:00:00.0000000   => 4
0   | 2015-01-29 00:00:00.0000000   | 2015-01-30 00:00:00.0000000   => 2
1   | 2016-01-20 00:00:00.0000000   | 2016-01-22 00:00:00.0000000   => 3
1   | 2016-01-23 00:00:00.0000000   | 2016-01-24 00:00:00.0000000   => 2
1   | 2016-01-25 00:00:00.0000000   | 2016-01-29 00:00:00.0000000   => 5

If you use these Start Date and End Date values with DATEDIFF:

SELECT DATEDIFF(day
   , its.Start_Date 
   , COALESCE(DATEADD(day, -1, itmax.End_Date), CASE WHEN itmin.Start_Date > its.End_Date THEN itmin.Start_Date ELSE its.End_Date END)
) + 1
...

The output (with duplicates) is:

  • 1, 4 and 2 for id 0 (your sample => SUM = 7)
  • 3, 2 and 5 for id 1 (Ypercube's sample => SUM = 10)
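The `+ 1` is there because DATEDIFF counts day boundaries crossed, while both endpoints of a range are visit days. A minimal check on the second range for id 0 (the literal dates are taken from the output above; the `CAST` to `date` is only there to keep the literals unambiguous):

```sql
-- 24th to 27th: DATEDIFF returns 3, and the inclusive day count is 4
SELECT Days = DATEDIFF(day, CAST('20150124' AS date), CAST('20150127' AS date)) + 1;  -- 4
```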

Then you only need to put everything together with a SUM and a GROUP BY:

SELECT id 
   , Days = SUM(
       DATEDIFF(day, Start_Date, End_Date)+1
   )
FROM (
   SELECT DISTINCT its.id
        , Start_Date = its.Start_Date 
       , End_Date = COALESCE(DATEADD(day, -1, itmax.End_Date), CASE WHEN itmin.Start_Date > its.End_Date THEN itmin.Start_Date ELSE its.End_Date END)
   FROM @Items its
   OUTER APPLY (
       SELECT Start_Date = MAX(End_Date) FROM @Items std
       WHERE std.Item_ID <> its.Item_ID AND std.Start_Date < its.Start_Date AND std.End_Date > its.Start_Date
   ) itmin
   OUTER APPLY (
       SELECT End_Date = MIN(Start_Date) FROM @Items std
       WHERE std.Item_ID <> its.Item_ID AND std.Start_Date > its.Start_Date AND std.Start_Date < its.End_Date
   ) itmax
) as d
GROUP BY id;

Output:

id  Days
0   7
1   10

Data used, with 2 different ids:

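-- Assumed declaration: it is not shown in the original answer.
-- datetime2 matches the 7-digit fractional seconds in the output above.
DECLARE @Items TABLE (id int, Item_ID int, Start_Date datetime2, End_Date datetime2);

INSERT INTO @Items
   (id, Item_ID, Start_Date, End_Date)
VALUES 
   (0, 20009, '2015-01-23', '2015-01-26'),
   (0, 20010, '2015-01-24', '2015-01-24'),
   (0, 20011, '2015-01-23', '2015-01-26'),
   (0, 20012, '2015-01-23', '2015-01-27'),
   (0, 20013, '2015-01-23', '2015-01-27'),
   (0, 20014, '2015-01-29', '2015-01-30'),

   (1, 20009, '2016-01-20', '2016-01-24'),
   (1, 20010, '2016-01-23', '2016-01-26'),
   (1, 20011, '2016-01-25', '2016-01-29');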

Source: https://dba.stackexchange.com/questions/130141