Sql-Server
Grouping subsets of rows with null values within an ordered set
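Suppose we have a table in which each row is a day, ordered by that day column. We then join in a membership dataset that shows on which days a member was active (and inactive).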
Suppose our current dataset looks like this... the membership is active on days 3-5, inactive on days 5-8, active again from day 9, and so on.
    DAY  DATE        MEMBER  ACTIVE
    1    2017-01-01  123     null
    2    2017-01-02  123     null
    3    2017-01-03  123     2017-01-03
    4    2017-01-04  123     2017-01-04
    5    2017-01-05  123     2017-01-05
    6    2017-01-06  123     null
    7    2017-01-07  123     null
    8    2017-01-08  123     null
    9    2017-01-09  123     2017-01-09
    10   2017-01-10  123     2017-01-10
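...so ACTIVE=null means the member was inactive on those days. Using this data structure, I want to get a "collapsed" set that shows the "spans" of inactive/active time: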
    MEMBER  MIN(DATE)   MAX(DATE)   STATUS
    123     2017-01-01  2017-01-02  INACTIVE
    123     2017-01-03  2017-01-05  ACTIVE
    123     2017-01-06  2017-01-08  INACTIVE
    123     2017-01-09  2017-01-10  ACTIVE
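I tried using row_number() to somehow partition out the subsets for a given status, but in this case, using min()/max() on the rows where ACTIVE is null treats them all as one group, when in fact there are several distinct spans of "inactive membership". How can I tell the inactive spans apart for grouping purposes? What technique can I use to achieve the output above?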
Here is a script that generates the dummy source data:
    CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);

    INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
    SELECT 1, '2017-01-01', 123, NULL
    UNION SELECT 2, '2017-01-02', 123, NULL
    UNION SELECT 3, '2017-01-03', 123, '2017-01-03'
    UNION SELECT 4, '2017-01-04', 123, '2017-01-04'
    UNION SELECT 5, '2017-01-05', 123, '2017-01-05'
    UNION SELECT 6, '2017-01-06', 123, NULL
    UNION SELECT 7, '2017-01-07', 123, NULL
    UNION SELECT 8, '2017-01-08', 123, NULL
    UNION SELECT 9, '2017-01-09', 123, '2017-01-09'
    UNION SELECT 10, '2017-01-10', 123, '2017-01-10'
    ;
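Your sample data does not match your description, which confused me at first. As sp_BlitzErik pointed out, this is a gaps-and-islands problem. If you have access to window functions, the solution is fairly simple. First, we number the rows for each member on its own; call this full_order (it happens to be the same as day here, but I add it for generality). Second, we number the rows for each member combined with whether the member was active on that day; call this partial_order: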
    select day, active, date, member
         , row_number() over (partition by member
                              order by day) as fullorder
         , row_number() over (partition by member
                                         , case when active is null then 0 else 1 end
                              order by day) as partialorder
    from src

    DAY  ACTIVE      MEMBER  FULLORDER  PARTIALORDER
    ---  ----------  ------  ---------  ------------
      1  -              123          1             1
      2  -              123          2             2
      3  01/03/2017     123          3             1
      4  01/04/2017     123          4             2
      5  01/05/2017     123          5             3
      6  -              123          6             3
      7  -              123          7             4
      8  -              123          8             5
      9  01/09/2017     123          9             4
     10  01/10/2017     123         10             5
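If the difference between full_order and partial_order changes, it means that active has switched from null to a value or vice versa. We can therefore build a grouping key from this difference. Within each such group we can pick min(date) and max(active) to form an interval: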
    select member, grp, min(date), max(active)
    from (
        select day, active, date, member
             , row_number() over (partition by member order by day)
             - row_number() over (partition by member
                                             , case when active is null then 0 else 1 end
                                  order by day) as grp
        from src
    ) as t  -- SQL Server requires an alias on a derived table
    -- the active/inactive flag is part of the key because the difference alone
    -- can repeat between an active run and an inactive run
    group by member, grp, case when active is null then 0 else 1 end

    MEMBER  GRP  3           4
    ------  ---  ----------  ----------
       123    0  01/01/2017  -
       123    2  01/03/2017  01/05/2017
       123    3  01/05/2017  -
       123    5  01/08/2017  01/10/2017
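To make the grouping key concrete, here is a minimal Python sketch that emulates the two row_number() calls on the sample data from the question's script. It is an illustration only, not part of the original answer: the names `rows` and `island_keys` are made up for this example, and the tuples mirror the ##SRC columns (ID, D, MEMBER, ACTIVE).

```python
from collections import defaultdict
from datetime import date

# Sample rows mirroring the question's script: (ID, D, MEMBER, ACTIVE).
ACTIVE_DAYS = {3, 4, 5, 9, 10}
rows = [
    (i, date(2017, 1, i), 123, date(2017, 1, i) if i in ACTIVE_DAYS else None)
    for i in range(1, 11)
]


def island_keys(rows):
    """Return (id, full_order, partial_order, grp) for every row.

    full_order    emulates row_number() partitioned by member.
    partial_order emulates row_number() partitioned by member and
                  the active/inactive flag.
    grp           is their difference; it stays constant inside one run.
    """
    full = defaultdict(int)     # running counter per member
    partial = defaultdict(int)  # running counter per (member, is_active)
    result = []
    for row_id, _d, member, active in sorted(rows, key=lambda r: (r[2], r[0])):
        is_active = active is not None
        full[member] += 1
        partial[(member, is_active)] += 1
        f, p = full[member], partial[(member, is_active)]
        result.append((row_id, f, p, f - p))
    return result


keys = island_keys(rows)
for row_id, f, p, grp in keys:
    print(f"{row_id:>3} {f:>3} {p:>3} {grp:>3}")
```

The difference only changes at a switch between null and non-null, so every run of consecutive days shares one value. Note that the same value can occur for an active run and an inactive run (for example with the pattern active, inactive, active), which is why the flag belongs in the grouping key as well.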
Adding another level of nesting is probably the easiest way to get the desired result:
    select member, min_active
         , coalesce(max_active, min_active) as max_active
         , case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status
    from (
        select member, grp, min(date) as min_active, max(active) as max_active
        from (
            select day, active, date, member
                 , row_number() over (partition by member order by day)
                 - row_number() over (partition by member
                                                 , case when active is null then 0 else 1 end
                                      order by day) as grp
            from src
        ) as t  -- SQL Server requires an alias on each derived table
        -- the flag keeps an active run and an inactive run with the same grp apart
        group by member, grp, case when active is null then 0 else 1 end
    ) as u

    MEMBER  MIN_ACTIVE  MAX_ACTIVE  STATUS
    ------  ----------  ----------  --------
       123  01/01/2017  01/01/2017  INACTIVE
       123  01/03/2017  01/05/2017  ACTIVE
       123  01/05/2017  01/05/2017  INACTIVE
       123  01/08/2017  01/10/2017  ACTIVE
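The target output in the question lists the first and the last day of every span, including the inactive ones. As a cross-check, the sketch below applies the same row-number-difference idea in Python, but takes both bounds from the date column and groups by the active/inactive flag together with the difference. It is a hedged illustration rather than the answer's query: `rows` and `collapse_spans` are hypothetical names, and the tuples mirror the ##SRC columns (ID, D, MEMBER, ACTIVE) from the question's script.

```python
from collections import defaultdict
from datetime import date

# Sample rows mirroring the question's script: (ID, D, MEMBER, ACTIVE).
ACTIVE_DAYS = {3, 4, 5, 9, 10}
rows = [
    (i, date(2017, 1, i), 123, date(2017, 1, i) if i in ACTIVE_DAYS else None)
    for i in range(1, 11)
]


def collapse_spans(rows):
    """Collapse daily rows into (member, first_day, last_day, status) spans."""
    full = defaultdict(int)     # emulates row_number() per member
    partial = defaultdict(int)  # emulates row_number() per (member, flag)
    bounds = {}                 # (member, is_active, grp) -> [first, last]
    for _row_id, d, member, active in sorted(rows, key=lambda r: (r[2], r[0])):
        is_active = active is not None
        full[member] += 1
        partial[(member, is_active)] += 1
        grp = full[member] - partial[(member, is_active)]
        key = (member, is_active, grp)
        if key in bounds:
            # Equivalent of min(date) / max(date) within the group.
            bounds[key][0] = min(bounds[key][0], d)
            bounds[key][1] = max(bounds[key][1], d)
        else:
            bounds[key] = [d, d]
    result = [
        (member, first, last, "ACTIVE" if is_active else "INACTIVE")
        for (member, is_active, _grp), (first, last) in bounds.items()
    ]
    # Order by member, then by the first day of each span.
    return sorted(result, key=lambda s: (s[0], s[1]))


spans = collapse_spans(rows)
for member, first, last, status in spans:
    print(member, first, last, status)
```

In SQL terms this corresponds to selecting min(date) and max(date) per group and deriving the status from the flag, instead of reusing max(active) as the upper bound.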