如何計算每組中第一行和最後一行欄位值的差異
我有這樣的結構表:
+-------+------------------+ | Value | Date | +-------+------------------+ | 10 | 10/10/2010 10:00 | | 11 | 10/10/2010 10:15 | | 15 | 10/10/2010 10:30 | | 15 | 10/10/2010 10:45 | | 17 | 10/10/2010 11:00 | | 18 | 10/10/2010 11:15 | | 22 | 10/10/2010 11:30 | | 30 | 10/10/2010 11:45 | +-------+------------------+
目前我正在使用 group by 來獲取 min、max、avg 以獲取每小時報告,如下所示:
+-----+-----+-------+------------------+ | min | max | avg | Date | +-----+-----+-------+------------------+ | 10 | 15 | 12.75 | 10/10/2010 10:00 | | 17 | 30 | 21.75 | 10/10/2010 11:00 | +-----+-----+-------+------------------+
我如何計算每組中最後一行和第一行值的差異以生成如下內容:
+-----+-----+-------+------+------------------+ | min | max | avg | diff | Date | +-----+-----+-------+------+------------------+ | 10 | 15 | 12.75 | 5 | 10/10/2010 10:00 | | 17 | 30 | 21.75 | 13 | 10/10/2010 11:00 | +-----+-----+-------+------+------------------+
謝謝。
您沒有顯示用於獲取結果的查詢,而沒有
diff
. 我假設它是這樣的:SELECT min = MIN(Value), max = MAX(Value), avg = AVG(Value), -- or, if Value is an int, like this, perhaps: -- AVG(CAST(Value AS decimal(10,2)) Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) ;
另外,您沒有解釋第一個和最後一個是什麼意思。在這個答案中,假設first代表group 中最早的(根據
Date
值),同樣,last代表 group 中的最新。一種投入方式
diff
可能是這樣的:首先,在原始查詢中再添加兩個聚合列
minDate
和maxDate
:SELECT min = MIN(Value), max = MAX(Value), avg = AVG(Value), **minDate = MIN(Date), maxDate = MAX(Date),** Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) ;
接下來,將聚合結果集連續(單獨)連接回原始表
minDate
以maxDate
訪問相應Value
的 s:SELECT g.min, g.max, g.avg, diff = last.Value - first.Value, g.Date FROM ( SELECT min = MIN(Value), max = MAX(Value), avg = AVG(Value), minDate = MIN(Date), maxDate = MAX(Date), Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable GROUP BY DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) ) g INNER JOIN atable first ON first.Date = g.minDate INNER JOIN atable last ON last .Date = g.maxDate ;
請注意,以上假設
Date
值(至少在其相應小時內恰好是第一個或最後一個的值)是唯一的,否則您將在輸出中的某些小時內獲得不止一行。如果您使用的是 SQL Server 2005 或更高版本,則另一種方法是使用視窗聚合函式
MIN() OVER (...)
併MAX() OVER (...)
計算Value
對應於minDate
或的 smaxDate
,然後再以類似於您現在可能正在做的方式聚合所有結果。這就是我所說的具體內容:WITH partitioned AS ( SELECT Value, Date, GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable ) , firstlast AS ( SELECT Value, Date, GroupDate, FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END, LastValue = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END FROM partitioned ) SELECT min = MIN(Value), max = MAX(Value), avg = AVG(Value), -- or, again, if Value is an int, cast it as a decimal or float diff = MAX(LastValue) - MIN(FirstValue), Date = GroupDate FROM firstlast GROUP BY GroupDate ;
如您所見,第一個公用表表達式 (CTE)僅返回所有行並添加一個計算列
GroupDate
,該列隨後用於分組/分區。因此,它本質上只是為分組表達式分配一個名稱,這樣做是為了提高整個查詢的可讀性/可維護性,因為該列後來被多次引用。這是第一個 CTE 產生的:+-------+------------------+------------------+ | Value | Date | GroupDate | +-------+------------------+------------------+ | 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | | 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | | 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | | 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | | 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | | 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | | 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | | 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | +-------+------------------+------------------+
第二個 CTE 為上述結果增加了兩列。它使用視窗聚合函式
MIN() OVER ...
和MAX() OVER ...
來匹配Date
,並且在匹配發生的地方,對應Value
的會在單獨的列中返回,或者FirstValue
或LastValue
:+-------+------------------+------------------+------------+-----------+ | Value | Date | GroupDate | FirstValue | LastValue | +-------+------------------+------------------+------------+-----------+ | 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | NULL | | 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | NULL | NULL | | 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | NULL | NULL | | 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | NULL | 15 | | 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | NULL | | 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | NULL | NULL | | 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | NULL | NULL | | 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 30 | +-------+------------------+------------------+------------+-----------+
至此,一切準備就緒,可以進行最後的聚合了。、和列將像以前一樣聚合,
min
現在可以很容易地從聚合的 中減去聚合得到。正如您從上面的結果集中看到的那樣,您可以使用各種函式來獲取和用於組:它可以是, , , – 如果這些都可以的話,因為每個組中只有一個值。max``avg``diff``FirstValue``LastValue``FirstValue``LastValue``MIN``MAX``SUM``AVG
但是,如您所見,主要的 SELECT
MAX()
一遍LastValue
又一MIN()
遍地應用FirstValue
。那是故意的。這是因為第二個建議並不Date
像第一個建議那樣真正要求是唯一的,但是,如果其中一個minDate
或maxDate
確實有多個關聯Value
,它將導致FirstValue
或LastValue
包含每個組多個值,例如這:+-------+------------------+------------------+------------+-----------+ | Value | Date | GroupDate | FirstValue | LastValue | +-------+------------------+------------------+------------+-----------+ | 9 | 10/10/2010 10:00 | 10/10/2010 10:00 | 9 | NULL | | 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | NULL | | 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | NULL | NULL | | 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | NULL | NULL | | 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | NULL | 15 | | 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | NULL | | 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | NULL | NULL | | 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | NULL | NULL | | 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 30 | | 33 | 10/10/2010 11:45 | 10/10/2010 11:00 | NULL | 33 | +-------+------------------+------------------+------------+-----------+
我假設在這種情況下,取最大的最後一個值和最小的第一個值之間的差異會更自然。但是,您應該更清楚在此處應用什麼規則,因此您只需相應地更改查詢。
您可以在 SQL Fiddle 測試這兩種解決方案:
更新
從 SQL Server 2012 開始,您還可以使用FIRST_VALUE和LAST_VALUE
firstlast
函式並將它們替換為我上面最後一個查詢中 CTE 中的 CASE 表達式,如下所示:FirstValue = FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LastValue = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
在這種情況下,無論您是否反複使用 MIN 或 MAX (在主 SELECT
FirstValue
中)都無關緊要:每一列在同一組的所有行中都LastValue
將具有完全相同的值(相應的第一個或最後一個) ,因此並且會返回每種情況下的結果相同。Value``GroupDate``MIN()``MAX()
事實上,您可以
diff
直接在firstlast
CTE 中獲取,然後在主查詢中使用 MIN/MAX 聚合它,或者將其添加到 GROUP BY 並在不聚合的情況下引用它,如下所示:WITH partitioned AS ( SELECT Value, Date, GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable ) , firstlast AS ( SELECT Value, Date, GroupDate, diff = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM partitioned ) SELECT min = MIN(Value), max = MAX(Value), avg = AVG(Value), diff, Date = GroupDate FROM firstlast GROUP BY GroupDate, diff ;
更進一步,您可以獲得
min
,max
和avg
infirstlast
,而不是主查詢 - 使用對應的視窗函式:min = MIN(Value) OVER (PARTITION BY GroupDate), max = MAX(Value) OVER (PARTITION BY GroupDate), avg = AVG(Value) OVER (PARTITION BY GroupDate),
使用這三個附加列和之前的更改,
firstlast
CTE 將為您的範例返回這樣的行集:+-------+------------------+------------------+-----+-----+-------+------+ | Value | Date | GroupDate | min | max | avg | diff | +-------+------------------+------------------+-----+-----+-------+------+ | 10 | 10/10/2010 10:00 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 | | 11 | 10/10/2010 10:15 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 | | 15 | 10/10/2010 10:30 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 | | 15 | 10/10/2010 10:45 | 10/10/2010 10:00 | 10 | 15 | 12.75 | 5 | | 17 | 10/10/2010 11:00 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 | | 18 | 10/10/2010 11:15 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 | | 22 | 10/10/2010 11:30 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 | | 30 | 10/10/2010 11:45 | 10/10/2010 11:00 | 17 | 30 | 21.75 | 13 | +-------+------------------+------------------+-----+-----+-------+------+
請注意
GroupDate
、min
、max
和—— 最終集真正需要的列avg
——如何diff
在屬於同一組的所有行中簡單地重複。這意味著您可以擺脫Value
andDate
,重命名GroupDate
為Date
,稍微重新排列列,將 DISTINCT 應用於結果集——並且您已經消除了最後一個 SELECT:WITH partitioned AS ( SELECT Value, Date, GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0) FROM atable ) SELECT DISTINCT min = MIN(Value) OVER (PARTITION BY GroupDate), max = MAX(Value) OVER (PARTITION BY GroupDate), avg = AVG(Value) OVER (PARTITION BY GroupDate), diff = LAST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), Date = GroupDate FROM partitioned ;
最後,也可以將
GroupDate
計算移到計算 等的同一範圍min
內max
。您可以為此使用 CROSS APPLY,從而避免完全嵌套查詢的需要——換句話說,這樣您也可以擺脫partitioned
CTE。整個查詢如下所示*:*SELECT DISTINCT min = MIN(t.Value) OVER (PARTITION BY x.GroupDate), max = MAX(t.Value) OVER (PARTITION BY x.GroupDate), avg = AVG(t.Value) OVER (PARTITION BY x.GroupDate), diff = LAST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - FIRST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), Date = x.GroupDate FROM atable AS t CROSS APPLY (SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)) AS x (GroupDate) ;
並返回相同的結果。您也可以在 SQL Fiddle上對其進行測試。