Sql-Server

如何計算每組中第一行和最後一行欄位值的差異

  • February 13, 2016

我有這樣的結構表:

+-------+------------------+
| Value |       Date       |
+-------+------------------+
|    10 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 |
|    15 | 10/10/2010 10:30 |
|    15 | 10/10/2010 10:45 |
|    17 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 |
|    22 | 10/10/2010 11:30 |
|    30 | 10/10/2010 11:45 |
+-------+------------------+

目前我正在使用 group by 來獲取 min、max、avg 以獲取每小時報告,如下所示:

+-----+-----+-------+------------------+
| min | max |  avg  |       Date       |
+-----+-----+-------+------------------+
|  10 |  15 | 12.75 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 | 10/10/2010 11:00 |
+-----+-----+-------+------------------+

我如何計算每組中最後一行和第一行值的差異以生成如下內容:

+-----+-----+-------+------+------------------+
| min | max |  avg  | diff |       Date       |
+-----+-----+-------+------+------------------+
|  10 |  15 | 12.75 |    5 | 10/10/2010 10:00 |
|  17 |  30 | 21.75 |   13 | 10/10/2010 11:00 |
+-----+-----+-------+------+------------------+

謝謝。

您沒有顯示用於獲取結果的查詢,而沒有diff. 我假設它是這樣的:

SELECT
 min  = MIN(Value),
 max  = MAX(Value),
 avg  = AVG(Value),  -- or, if Value is an int, like this, perhaps:
                     -- AVG(CAST(Value AS decimal(10,2))
 Date = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
 DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;

另外,您沒有解釋第一個最後一個是什麼意思。在這個答案中,假設first代表group 中最早的(根據Date值),同樣,last代表 group 中的最新

一種投入方式diff可能是這樣的:

首先,在原始查詢中再添加兩個聚合列minDatemaxDate

SELECT
 min     = MIN(Value),
 max     = MAX(Value),
 avg     = AVG(Value),
 **minDate = MIN(Date),
maxDate = MAX(Date),**
 Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
FROM atable
GROUP BY
 DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
;

接下來,將聚合結果集連續(單獨)連接回原始表minDatemaxDate訪問相應Value的 s:

SELECT
 g.min,
 g.max,
 g.avg,
 diff = last.Value - first.Value,
 g.Date
FROM (
 SELECT
   min     = MIN(Value),
   max     = MAX(Value),
   avg     = AVG(Value),
   minDate = MIN(Date),
   maxDate = MAX(Date),
   Date    = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
 FROM atable
 GROUP BY
   DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
) g
INNER JOIN atable first ON first.Date = g.minDate
INNER JOIN atable last  ON last .Date = g.maxDate
;

請注意,以上假設Date值(至少在其相應小時內恰好是第一個或最後一個的值)是唯一的,否則您將在輸出中的某些小時內獲得不止一行。

如果您使用的是 SQL Server 2005 或更高版本,則另一種方法是使用視窗聚合函式MIN() OVER (...)MAX() OVER (...)計算Value對應於minDate或的 s maxDate,然後再以類似於您現在可能正在做的方式聚合所有結果。這就是我所說的具體內容:

WITH partitioned AS (
 SELECT
   Value,
   Date,
   GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
 FROM atable
)
, firstlast AS (
 SELECT
   Value,
   Date,
   GroupDate,
   FirstValue = CASE Date WHEN MIN(Date) OVER (PARTITION BY GroupDate) THEN Value END,
   LastValue  = CASE Date WHEN MAX(Date) OVER (PARTITION BY GroupDate) THEN Value END
 FROM partitioned
)
SELECT
 min  = MIN(Value),
 max  = MAX(Value),
 avg  = AVG(Value),  -- or, again, if Value is an int, cast it as a decimal or float
 diff = MAX(LastValue) - MIN(FirstValue),
 Date = GroupDate
FROM firstlast
GROUP BY
 GroupDate
;

如您所見,第一個公用表表達式 (CTE)僅返回所有行並添加一個計算列GroupDate,該列隨後用於分組/分區。因此,它本質上只是為分組表達式分配一個名稱,這樣做是為了提高整個查詢的可讀性/可維護性,因為該列後來被多次引用。這是第一個 CTE 產生的:

+-------+------------------+------------------+
| Value |       Date       |    GroupDate     |
+-------+------------------+------------------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |
+-------+------------------+------------------+

第二個 CTE 為上述結果增加了兩列。它使用視窗聚合函式MIN() OVER ...MAX() OVER ...來匹配Date,並且在匹配發生的地方,對應Value的會在單獨的列中返回,或者FirstValueLastValue

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
+-------+------------------+------------------+------------+-----------+

至此,一切準備就緒,可以進行最後的聚合了。、和列將像以前一樣聚合,min現在可以很容易地從聚合的 中減去聚合得到。正如您從上面的結果集中看到的那樣,您可以使用各種函式來獲取和用於組:它可以是, , , – 如果這些都可以的話,因為每個組中只有一個值。max``avg``diff``FirstValue``LastValue``FirstValue``LastValue``MIN``MAX``SUM``AVG

但是,如您所見,主要的 SELECTMAX()一遍LastValue又一MIN()遍地應用FirstValue。那是故意的。這是因為第二個建議並不Date像第一個建議那樣真正要求是唯一的,但是,如果其中一個minDatemaxDate確實有多個關聯Value,它將導致FirstValueLastValue包含每個組多個值,例如這:

+-------+------------------+------------------+------------+-----------+
| Value |       Date       |    GroupDate     | FirstValue | LastValue |
+-------+------------------+------------------+------------+-----------+
|     9 | 10/10/2010 10:00 | 10/10/2010 10:00 |          9 |      NULL |
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |         10 |      NULL |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |       NULL |      NULL |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |       NULL |        15 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |         17 |      NULL |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |       NULL |      NULL |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |       NULL |      NULL |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        30 |
|    33 | 10/10/2010 11:45 | 10/10/2010 11:00 |       NULL |        33 |
+-------+------------------+------------------+------------+-----------+

我假設在這種情況下,取最大的最後一個值和最小的第一個值之間的差異會更自然。但是,您應該更清楚在此處應用什麼規則,因此您只需相應地更改查詢。

您可以在 SQL Fiddle 測試這兩種解決方案:


更新

從 SQL Server 2012 開始,您還可以使用FIRST_VALUELAST_VALUEfirstlast函式並將它們替換為我上面最後一個查詢中 CTE 中的 CASE 表達式,如下所示:

FirstValue = FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
LastValue  = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)

在這種情況下,無論您是否反複使用 MIN 或 MAX (在主 SELECTFirstValue中)都無關緊要:每一列在同一組的所有行中都LastValue將具有完全相同的值(相應的第一個或最後一個) ,因此並且會返回每種情況下的結果相同。Value``GroupDate``MIN()``MAX()

事實上,您可以diff直接在firstlastCTE 中獲取,然後在主查詢中使用 MIN/MAX 聚合它,或者將其添加到 GROUP BY 並在不聚合的情況下引用它,如下所示:

WITH partitioned AS (
 SELECT
   Value,
   Date,
   GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
 FROM atable
)
, firstlast AS (
 SELECT
   Value,
   Date,
   GroupDate,
   diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
 FROM partitioned
)
SELECT
 min  = MIN(Value),
 max  = MAX(Value),
 avg  = AVG(Value),
 diff,
 Date = GroupDate
FROM firstlast
GROUP BY
 GroupDate,
 diff
;

更進一步,您可以獲得min,maxavgin firstlast,而不是主查詢 - 使用對應的視窗函式:

min  = MIN(Value) OVER (PARTITION BY GroupDate),
max  = MAX(Value) OVER (PARTITION BY GroupDate),
avg  = AVG(Value) OVER (PARTITION BY GroupDate),

使用這三個附加列和之前的更改,firstlastCTE 將為您的範例返回這樣的行集:

+-------+------------------+------------------+-----+-----+-------+------+
| Value |       Date       |    GroupDate     | min | max |  avg  | diff |
+-------+------------------+------------------+-----+-----+-------+------+
|    10 | 10/10/2010 10:00 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    11 | 10/10/2010 10:15 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:30 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    15 | 10/10/2010 10:45 | 10/10/2010 10:00 |  10 |  15 | 12.75 |    5 |
|    17 | 10/10/2010 11:00 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    18 | 10/10/2010 11:15 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    22 | 10/10/2010 11:30 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
|    30 | 10/10/2010 11:45 | 10/10/2010 11:00 |  17 |  30 | 21.75 |   13 |
+-------+------------------+------------------+-----+-----+-------+------+

請注意GroupDateminmax和—— 最終集真正需要的列avg——如何diff在屬於同一組的所有行中簡單地重複。這意味著您可以擺脫Valueand Date,重命名GroupDateDate,稍微重新排列列,將 DISTINCT 應用於結果集——並且您已經消除了最後一個 SELECT:

WITH partitioned AS (
 SELECT
   Value,
   Date,
   GroupDate = DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)
 FROM
   atable
)
SELECT DISTINCT
 min  = MIN(Value) OVER (PARTITION BY GroupDate),
 max  = MAX(Value) OVER (PARTITION BY GroupDate),
 avg  = AVG(Value) OVER (PARTITION BY GroupDate),
 diff = LAST_VALUE(Value)  OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
      - FIRST_VALUE(Value) OVER (PARTITION BY GroupDate ORDER BY Date ASC
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
 Date = GroupDate
FROM
 partitioned
;

最後,也可以將GroupDate計算移到計算 等的同一範圍minmax。您可以為此使用 CROSS APPLY,從而避免完全嵌套查詢的需要——換句話說,這樣您也可以擺脫partitionedCTE。整個查詢如下所示*:*

SELECT DISTINCT
 min  = MIN(t.Value) OVER (PARTITION BY x.GroupDate),
 max  = MAX(t.Value) OVER (PARTITION BY x.GroupDate),
 avg  = AVG(t.Value) OVER (PARTITION BY x.GroupDate),
 diff = LAST_VALUE(t.Value)  OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
      - FIRST_VALUE(t.Value) OVER (PARTITION BY x.GroupDate ORDER BY t.Date ASC
                                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
 Date = x.GroupDate
FROM
 atable AS t
 CROSS APPLY (SELECT DATEADD(HOUR, DATEDIFF(HOUR, 0, Date), 0)) AS x (GroupDate)
;

並返回相同的結果。您也可以在 SQL Fiddle上對其進行測試。

引用自:https://dba.stackexchange.com/questions/28482