Sql-Server
根據之前的月末值填充缺失數據
給定以下數據:
create table #histories ( username varchar(10), account varchar(10), assigned date ); insert into #histories values ('PHIL','ACCOUNT1','2017-01-04'), ('PETER','ACCOUNT1','2017-01-15'), ('DAVE','ACCOUNT1','2017-03-04'), ('ANDY','ACCOUNT1','2017-05-06'), ('DAVE','ACCOUNT1','2017-05-07'), ('FRED','ACCOUNT1','2017-05-08'), ('JAMES','ACCOUNT1','2017-08-05'), ('DAVE','ACCOUNT2','2017-01-02'), ('PHIL','ACCOUNT2','2017-01-18'), ('JOSH','ACCOUNT2','2017-04-08'), ('JAMES','ACCOUNT2','2017-04-09'), ('DAVE','ACCOUNT2','2017-05-06'), ('PHIL','ACCOUNT2','2017-05-07') ;
…表示將給定使用者分配給帳戶的時間。
我正在尋找在每個月的最後一天確定誰擁有給定帳戶(指定日期是帳戶轉移所有權的日期),並填充任何缺少的月末
dates
(可能是從我可用的方便表創建的,具有有用的列DateKey
,Date
和LastDayOfMonth
,$$ courtesy of @AaronBertrand $$) 1 . 期望的結果是:
PETER, ACCOUNT1, 2017-01-31 PETER, ACCOUNT1, 2017-02-28 DAVE, ACCOUNT1, 2017-03-31 DAVE, ACCOUNT1, 2017-04-30 FRED, ACCOUNT1, 2017-05-31 FRED, ACCOUNT1, 2017-06-30 FRED, ACCOUNT1, 2017-07-31 JAMES, ACCOUNT1, 2017-08-31 PHIL, ACCOUNT2, 2017-01-31 PHIL, ACCOUNT2, 2017-02-28 PHIL, ACCOUNT2, 2017-03-31 JAMES, ACCOUNT2, 2017-04-30 PHIL, ACCOUNT2, 2017-05-31
使用視窗函式執行此操作的初始部分是微不足道的,它添加了我正在努力解決的“缺失”行。
解決此問題的一種方法是執行以下操作:
LEAD
在 SQL Server 2008 上進行仿真。您可以APPLY
為此使用或 suquery。- 對於沒有下一行的行,將其帳戶日期添加一個月。
- 加入包含月末日期的維度表。這將消除所有不跨越至少一個月的行,並根據需要添加行以填補空白。
我稍微修改了您的測試數據以使結果具有確定性。還添加了一個索引:
create table #histories ( username varchar(10), account varchar(10), assigned date ); insert into #histories values ('PHIL','ACCOUNT1','2017-01-04'), ('PETER','ACCOUNT1','2017-01-15'), ('DAVE','ACCOUNT1','2017-03-04'), ('ANDY','ACCOUNT1','2017-05-06'), ('DAVE','ACCOUNT1','2017-05-07'), ('FRED','ACCOUNT1','2017-05-08'), ('JAMES','ACCOUNT1','2017-08-05'), ('DAVE','ACCOUNT2','2017-01-02'), ('PHIL','ACCOUNT2','2017-01-18'), ('JOSH','ACCOUNT2','2017-04-08'), -- changed this date to have deterministic results ('JAMES','ACCOUNT2','2017-04-09'), ('DAVE','ACCOUNT2','2017-05-06'), ('PHIL','ACCOUNT2','2017-05-07') ; -- make life easy create index gotta_go_fast ON #histories (account, assigned);
這是有史以來最懶惰的日期維度表:
create table #date_dim_months_only ( month_date date, primary key (month_date) ); -- put 2500 month ends into table INSERT INTO #date_dim_months_only WITH (TABLOCK) SELECT DATEADD(DAY, -1, DATEADD(MONTH, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '20000101')) FROM master..spt_values;
對於第 1 步,有很多方法可以模擬
LEAD
. 這是一種方法:SELECT h1.username , h1.account , h1.assigned , next_date.assigned FROM #histories h1 OUTER APPLY ( SELECT TOP 1 h2.assigned FROM #histories h2 WHERE h1.account = h2.account AND h1.assigned < h2.assigned ORDER BY h2.assigned ASC ) next_date;
對於第 2 步,我們需要將 NULL 值更改為其他值。您希望包括每個帳戶的最後一個月,因此在開始日期上添加一個月就足夠了:
ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))
對於第 3 步,我們可以加入日期維度表。維度表中的列正是結果集所需的列:
INNER JOIN #date_dim_months_only dd ON dd.month_date >= h1.assigned AND dd.month_date < ISNULL(next_date.assigned, DATEADD(MONTH, 1, h1.assigned))
當我把它們放在一起時,我不喜歡我得到的查詢。
OUTER APPLY
組合和時,連接順序可能會出現問題INNER JOIN
。為了得到我想要的連接順序,我用一個子查詢重寫了它:SELECT hist.username , hist.account , dd.month_date FROM ( SELECT h1.username , h1.account , h1.assigned , ISNULL( (SELECT TOP 1 h2.assigned FROM #histories h2 WHERE h1.account = h2.account AND h1.assigned < h2.assigned ORDER BY h2.assigned ASC ) , DATEADD(MONTH, 1, h1.assigned) ) next_assigned FROM #histories h1 ) hist INNER JOIN #date_dim_months_only dd ON dd.month_date >= hist.assigned AND dd.month_date < hist.next_assigned;
我不知道你有多少數據,所以對你來說可能無關緊要。但是該計劃看起來像我想要的那樣:
結果與您的相符:
╔══════════╦══════════╦════════════╗ ║ username ║ account ║ month_date ║ ╠══════════╬══════════╬════════════╣ ║ PETER ║ ACCOUNT1 ║ 2017-01-31 ║ ║ PETER ║ ACCOUNT1 ║ 2017-02-28 ║ ║ DAVE ║ ACCOUNT1 ║ 2017-03-31 ║ ║ DAVE ║ ACCOUNT1 ║ 2017-04-30 ║ ║ FRED ║ ACCOUNT1 ║ 2017-05-31 ║ ║ FRED ║ ACCOUNT1 ║ 2017-06-30 ║ ║ FRED ║ ACCOUNT1 ║ 2017-07-31 ║ ║ JAMES ║ ACCOUNT1 ║ 2017-08-31 ║ ║ PHIL ║ ACCOUNT2 ║ 2017-01-31 ║ ║ PHIL ║ ACCOUNT2 ║ 2017-02-28 ║ ║ PHIL ║ ACCOUNT2 ║ 2017-03-31 ║ ║ JAMES ║ ACCOUNT2 ║ 2017-04-30 ║ ║ PHIL ║ ACCOUNT2 ║ 2017-05-31 ║ ╚══════════╩══════════╩════════════╝