Sql-Server

Grouping subsets of rows with null values within an ordered set

  • November 22, 2017

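Let's say we have a table in which each row is a day, ordered by that day column. We then join in a membership dataset that shows on which days a member was active (and inactive).

Suppose our current dataset looks like this... the membership goes active on days 3-5, goes inactive on days 5-8, goes active again on day 9, and so on.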

DAY     DATE        MEMBER  ACTIVE
1      2017-01-01  123     null
2      2017-01-02  123     null
3      2017-01-03  123     2017-01-03
4      2017-01-04  123     2017-01-04
5      2017-01-05  123     2017-01-05
6      2017-01-06  123     null
7      2017-01-07  123     null
8      2017-01-08  123     null
9      2017-01-09  123     2017-01-09
10     2017-01-10  123     2017-01-10

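...so ACTIVE = null means the member was inactive on those days.

From this data structure, I want to obtain a "collapsed" set that shows the "spans" of inactive/active time: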

MEMBER  MIN(DATE)   MAX(DATE)   STATUS
123,    2017-01-01, 2017-01-02  INACTIVE
123,    2017-01-03, 2017-01-05  ACTIVE
123,    2017-01-06, 2017-01-08  INACTIVE
123,    2017-01-09, 2017-01-10  ACTIVE

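I tried using row_number() to somehow partition out the subsets for a given status, but in that case applying min()/max() to the rows where ACTIVE is null treats them as a single group, when in fact there are several distinct spans of "inactive membership".

How can I differentiate the spans of inactive membership for grouping purposes? What technique can I use to achieve the output above?

Here is a script that generates the dummy source data: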

CREATE TABLE ##SRC (ID INT, D DATE, MEMBER INT, ACTIVE DATE);

INSERT INTO ##SRC (ID, D, MEMBER, ACTIVE)
SELECT 1, '2017-01-01', 123, NULL UNION 
SELECT 2, '2017-01-02', 123, NULL UNION 
SELECT 3, '2017-01-03', 123, '2017-01-03' UNION 
SELECT 4, '2017-01-04', 123, '2017-01-04' UNION 
SELECT 5, '2017-01-05', 123, '2017-01-05' UNION 
SELECT 6, '2017-01-06', 123, NULL UNION 
SELECT 7, '2017-01-07', 123, NULL UNION 
SELECT 8, '2017-01-08', 123, NULL UNION 
SELECT 9, '2017-01-09', 123, '2017-01-09' UNION 
SELECT 10, '2017-01-10',    123, '2017-01-10' 
;
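
For illustration only, a status-only grouping against ##SRC might look like the sketch below. It is an assumed reconstruction of the approach described above, not the original attempt, and the aliases MIN_DATE, MAX_DATE and STATUS are made up for the example.

```sql
-- Illustrative sketch: group on active/inactive status alone.
-- Every inactive day lands in the same group, no matter how many
-- separate inactive spans there are.
SELECT MEMBER,
       MIN(D) AS MIN_DATE,
       MAX(D) AS MAX_DATE,
       CASE WHEN ACTIVE IS NULL THEN 'INACTIVE' ELSE 'ACTIVE' END AS STATUS
FROM ##SRC
GROUP BY MEMBER,
         CASE WHEN ACTIVE IS NULL THEN 'INACTIVE' ELSE 'ACTIVE' END
ORDER BY MIN_DATE;
```

On the sample data this returns only two rows: an INACTIVE span from 2017-01-01 to 2017-01-08 and an ACTIVE span from 2017-01-03 to 2017-01-10. The two spans overlap, which is exactly the problem.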

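Your sample data does not match your description, which confused me at first. As sp_BlitzErik pointed out, this is a gaps-and-islands problem. If you have access to window functions, the solution is fairly simple. First, we enumerate the rows for each member on their own; call this fullorder (it happens to be the same as day here, but I add it for generality). Second, we enumerate the rows for each member and for whether the member was active that day; call this partialorder: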

select day, active, date, member
     , row_number() over (partition by member 
                          order by day) as fullorder
     , row_number() over (partition by member
                         ,case when active is null then 0 else 1 end
                         order by day) as partialorder
from src

DAY         ACTIVE     MEMBER      FULLORDER            PARTIALORDER        
----------- ---------- ----------- -------------------- --------------------
          1 -                  123                    1                    1
          2 -                  123                    2                    2
          3 01/03/2017         123                    3                    1
          4 01/04/2017         123                    4                    2
          5 01/05/2017         123                    5                    3
          6 -                  123                    6                    3
          7 -                  123                    7                    4
          8 -                  123                    8                    5
          9 01/09/2017         123                    9                    4
         10 01/10/2017         123                   10                    5

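If the difference between fullorder and partialorder changes, active has gone from null to a value or vice versa. Within a run of consecutive rows with the same status both counters advance by one per row, so the difference stays constant. We can therefore use this difference as a group. Within each such group we can take min(date) and max(active) to form an interval; max(active) is null for the inactive groups: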

select member, grp, min(date), max(active) 
from (
   select day, active, date, member
        -- the difference of the two row numbers is constant within a run
        , row_number() over (partition by member order by day) 
        - row_number() over (partition by member
                            ,case when active is null then 0 else 1 end 
                             order by day) as grp  
   from src
) as t  -- SQL Server requires an alias on a derived table
group by member, grp

MEMBER      GRP                  3          4         
----------- -------------------- ---------- ----------
        123                    0 01/01/2017 -         
        123                    2 01/03/2017 01/05/2017
        123                    3 01/06/2017 -         
        123                    5 01/09/2017 01/10/2017
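
To see why the difference works as a group key, it can help to list it next to the two row numbers for every row. The following is a sketch adapted to the ##SRC script from the question, on the assumption that ID plays the role of day; the column aliases are illustrative.

```sql
-- Sketch against ##SRC from the question (ID stands in for day).
SELECT ID, ACTIVE,
       ROW_NUMBER() OVER (PARTITION BY MEMBER ORDER BY ID) AS fullorder,
       ROW_NUMBER() OVER (PARTITION BY MEMBER,
                                       CASE WHEN ACTIVE IS NULL THEN 0 ELSE 1 END
                          ORDER BY ID) AS partialorder,
       -- fullorder - partialorder: constant within a run of equal status
       ROW_NUMBER() OVER (PARTITION BY MEMBER ORDER BY ID)
     - ROW_NUMBER() OVER (PARTITION BY MEMBER,
                                       CASE WHEN ACTIVE IS NULL THEN 0 ELSE 1 END
                          ORDER BY ID) AS grp
FROM ##SRC
ORDER BY ID;
```

For the ten sample rows the grp values are 0, 0, 2, 2, 2, 3, 3, 3, 5, 5, so each run of consecutive active or inactive days gets its own value.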

It is probably simplest to add another level of nesting to get the desired result:

select member, min_date, max_date
    , case when max_active is null then 'INACTIVE' else 'ACTIVE' end as status 
from (
   -- max(date) closes the span; max(active) is null only for inactive groups
   select member, grp, min(date) as min_date, max(date) as max_date
        , max(active) as max_active 
   from (
       select day, active, date, member
            , row_number() over (partition by member order by day) 
            - row_number() over (partition by member
                                ,case when active is null then 0 else 1 end 
                                order by day) as grp  
       from src
   ) as t
   group by member, grp) as g

MEMBER      MIN_DATE   MAX_DATE   STATUS  
----------- ---------- ---------- --------
        123 01/01/2017 01/02/2017 INACTIVE
        123 01/03/2017 01/05/2017 ACTIVE  
        123 01/06/2017 01/08/2017 INACTIVE
        123 01/09/2017 01/10/2017 ACTIVE  
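
For readers who want to run this directly against the ##SRC script from the question, a possible SQL Server adaptation is sketched below. It assumes ID corresponds to day and D to date; the CTE name numbered and the is_active flag are names chosen for this example. It also groups by the status flag in addition to grp, because the difference alone can coincide for an active and an inactive run: for the sequence inactive, active, inactive, the second and third rows both get a difference of 1.

```sql
-- Sketch for SQL Server against ##SRC (ID = day, D = date).
WITH numbered AS (
    SELECT MEMBER, D,
           CASE WHEN ACTIVE IS NULL THEN 0 ELSE 1 END AS is_active,
           ROW_NUMBER() OVER (PARTITION BY MEMBER ORDER BY ID)
         - ROW_NUMBER() OVER (PARTITION BY MEMBER,
                                           CASE WHEN ACTIVE IS NULL THEN 0 ELSE 1 END
                              ORDER BY ID) AS grp
    FROM ##SRC
)
SELECT MEMBER,
       MIN(D) AS MIN_DATE,   -- first day of the span
       MAX(D) AS MAX_DATE,   -- last day of the span
       CASE WHEN is_active = 1 THEN 'ACTIVE' ELSE 'INACTIVE' END AS STATUS
FROM numbered
GROUP BY MEMBER, is_active, grp
ORDER BY MEMBER, MIN_DATE;
```

On the sample data this should return the four spans requested in the question, from 2017-01-01 to 2017-01-02 INACTIVE through 2017-01-09 to 2017-01-10 ACTIVE.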

Source: https://dba.stackexchange.com/questions/191005