Teradata
計算期間開始和結束日期
我的桌子:
Date Employee Status ----------------------------- 20171106 001 At work 20171107 001 Sick leave 20171108 001 At work 20171109 001 At work 20171111 001 Sick leave (A gap here) 20171112 001 Sick leave 20171115 001 At work (Another gap) 20171116 001 At work
期望的結果:
Employee Status StartDT EndDT ------------------------------------------------------------------- 001 At work Some time in the history 20171106 001 Sick leave 20171107 20171107 001 At work 20171108 20171109 001 Sick leave 20171111 20171112 001 At work 20171115 20171116
邏輯:我們按狀態重新組織源表,而不是按日期。所以日期的差距並不重要,應該被忽略。
如何在 Teradata 15 中執行此操作?
注意:
select min(Date), max(Date) group by employee, status;
將不起作用,因為兩個“工作中”之間的狀態可能會發生變化。
最簡單的解決方案是標準化一個週期:
SELECT NORMALIZE Employee, Status, PERIOD(date,date+1) AS pd FROM mytable ORDER BY Employee, pd
NORMALIZE
是一種非常未知的語法,它結合了重疊的時段,您只需要在日期列之外創建一個一天的時段。由於這會導致您在一段時間內獲得稍微不同的輸出,因此結束日期與預期結果相比是 +1。要解決此問題,您可以將期間拆分回單獨的列:
SELECT Employee, Status, Begin(pd), Last(pd) -- last included date, i.e. expected EndDT FROM ( SELECT NORMALIZE Employee, Status, PERIOD(date,date+1) AS pd FROM myTable ) AS dt ORDER BY 1,pd
得到這個歷史上的某些時候更複雜,需要額外的計算,你應該檢查你是否真的需要它。
另一個更經典的解決方案計算具有連續值的行組:
SELECT Employee, Status, Min(date), Max(date) FROM ( SELECT Employee, Status, date, -- this is the tricky part, the differnce between a monotonous sequence (row_number) -- and another monotonous sequence with gaps (date) -- is constant when there's no gap date - Row_Number() Over (PARTITION BY Employee, Status ORDER BY date) AS grp FROM vt ) AS dt GROUP BY Employee, Status, grp ORDER BY 3
當您添加時,這兩種解決方案都會導致額外的一行
20171120 001 At work ... 001 At work 20171115 20171116 001 At work 20171120 20171120
如果要與上一行合併
001 At work 20171115 20171120
它也有點複雜……