Sql-Server

Sorting on multiple columns in a full outer join, where either may be NULL

  • October 20, 2017

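I am trying to compare a plan with a history to see whether steps were omitted or unplanned steps were included. Both the normal, planned history items and the anomalies are intended for a detailed report. The problem is that each of the joined tables has one column that I need to sort on, but because of the full outer join I cannot guarantee that they are not null, which breaks the ordering. I'll continue with a simplified example in which the plan corresponds to a recipe: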

Table: RecipeSteps

Id | RecipeId | Position | Message
----+----------+----------+----------------------------
50 |      123 |        1 | Wash vegetables
51 |      123 |        2 | Peel vegetables
52 |      123 |        3 | Cut meat
53 |      123 |        4 | Turn on stove
54 |      123 |        5 | Cook and stir it
55 |      123 |        6 | Turn off stove
----+----------+----------+----------------------------

…and the history of the steps actually taken while cooking:

Table: History

Id | TimeStamp | Session | StepId | Message
----+-----------+---------+--------+-----------------------------
90 |  00:01:00 |     321 |   NULL | Chef entered kitchen
91 |  00:02:00 |     321 |     51 | Chef peeled vegetables
92 |  00:03:00 |     321 |     52 | Chef cut meat
93 |  00:04:00 |     321 |   NULL | Chef picked his nose
94 |  00:05:00 |     321 |     53 | Chef turned on stove
95 |  00:06:00 |     321 |     54 | Chef started cooking
96 |  00:10:00 |     321 |   NULL | Chef left kitchen
97 |  01:00:00 |     321 |   NULL | FIRE!
----+-----------+---------+--------+-----------------------------

Now I would like to compare these tables with a full outer join, in a result table similar to the following:

Table: Result

Sorting | TimeStamp | Position | Message                | Comment
--------+-----------+----------+------------------------+---------------
      1 |  00:01:00 |     NULL | Chef entered kitchen   | unplanned
      2 |      NULL |        1 | Wash vegetables        | omitted
      3 |  00:02:00 |        2 | Chef peeled vegetables | planned
      4 |  00:03:00 |        3 | Chef cut meat          | planned
      5 |  00:04:00 |     NULL | Chef picked his nose   | unplanned
      6 |  00:05:00 |        4 | Chef turned on stove   | planned
      7 |  00:06:00 |        5 | Chef started cooking   | planned
      8 |      NULL |        6 | Turn off stove         | omitted
      9 |  00:10:00 |     NULL | Chef left kitchen      | unplanned
     10 |  01:00:00 |     NULL | FIRE!                  | unplanned
--------+-----------+----------+------------------------+---------------

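A reader of the report would then see that:

  • Entering/leaving the kitchen is not part of the recipe, but was logged anyway.
  • The chef forgot to wash the vegetables (ugh!) or the step was not logged for some reason (bah!).
  • The history continued as planned, except that the chef picked his nose in between (double ugh!).
  • Finally, the chef forgot to turn off the stove, which led to a fire.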

I came up with the following SELECT statement, which I would then query with ORDER BY Sorting ASC, but I still don't know how to fill in the Sorting criterion:

SELECT
 0 Sorting, -- this has to be replaced
 h.TimeStamp,
 r.Position,
 '...' Message, -- case-when-construct, not important for the question
 '...' Comment -- case-when-construct, not important for the question
FROM RecipeSteps r
FULL OUTER JOIN History h
ON h.StepId = r.Id
WHERE (r.RecipeId = 123 OR r.RecipeId IS NULL)
AND (h.Session = 321 OR h.Session IS NULL)

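There is a natural order in the history, given by the timestamp, and in the recipe, given by the column Position. Since the two are never null at the same time, I could build the Sorting column from them, but so far I have not been able to figure out how to do this.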

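I know that I could procedurally sort both separately, enumerate and compare the two sequences, and so on, but I was curious whether this can also be done in non-procedural SQL.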

Please note: for simplicity, the queries below do not take partitioning by Session into account.

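The main idea is to split the dataset into two sets and process them separately. History goes first. I decided to make History more important than Steps: if some steps were swapped, they are shown in the order of the history.

So the first set is all rows from History, plus any details from the corresponding steps. This is a simple LEFT JOIN.

The second set is the omitted steps. This is another LEFT JOIN, with a filter.

If you run the first CTE, CTE_History, ordered by timestamp, you'll see that unplanned events have a NULL Position, but thanks to their timestamps they are already in the correct place in the result set. All we need to do is fill these gaps.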

+-------------+---------------------+----------+------------------------+-----------+
| NewPosition |         ts          | Position |       NewMessage       |  Comment  |
+-------------+---------------------+----------+------------------------+-----------+
|           0 | 2017-01-01 00:01:00 | NULL     | Chef entered kitchen   | unplanned |
|           2 | 2017-01-01 00:02:00 | 2        | Chef peeled vegetables | planned   |
|           3 | 2017-01-01 00:03:00 | 3        | Chef cut meat          | planned   |
|           3 | 2017-01-01 00:04:00 | NULL     | Chef picked his nose   | unplanned |
|           4 | 2017-01-01 00:05:00 | 4        | Chef turned on stove   | planned   |
|           5 | 2017-01-01 00:06:00 | 5        | Chef started cooking   | planned   |
|           5 | 2017-01-01 00:10:00 | NULL     | Chef left kitchen      | unplanned |
|           5 | 2017-01-01 01:00:00 | NULL     | FIRE!                  | unplanned |
+-------------+---------------------+----------+------------------------+-----------+

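One way to do it is to take the last non-null value from the preceding rows. Unfortunately, SQL Server (before SQL Server 2022) does not implement the IGNORE NULLS clause in the LAST_VALUE function, so we have to use a workaround: we rely on the fact that Position grows as ts grows, which lets us use MAX instead. In the more general case it is a bit more complicated, as Itzik Ben-Gan shows in his The Last non NULL Puzzle.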
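For readers on a newer version, here is a minimal sketch of the same gap fill written with LAST_VALUE and IGNORE NULLS. It assumes SQL Server 2022 (16.x) or later, where that clause is available; it is not part of the original answer, and the sample rows are a shortened copy of the question's data.

```sql
-- Sketch only: requires SQL Server 2022 (16.x) or later for IGNORE NULLS.
DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables'),
(51, 123, 2, 'Peel vegetables'),
(52, 123, 3, 'Cut meat');

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen'),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat'),
(93, '2017-01-01 00:04:00', 321, NULL, 'Chef picked his nose');

SELECT
    -- Most recent non-NULL Position at or before the current row;
    -- rows before the first planned step fall back to 0.
    ISNULL(
        LAST_VALUE(S.Position) IGNORE NULLS
        OVER (ORDER BY H.ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        , 0) AS NewPosition
    ,H.ts
    ,S.Position
    ,H.Message AS NewMessage
FROM
    @History AS H
    LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
ORDER BY H.ts;
```

Unlike MAX, this picks the most recent non-NULL Position rather than the largest one, so it does not depend on Position growing together with ts.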

A leading NULL is changed to 0 (or to any number smaller than every possible position).

Once every row has a position value, it is easy to union this result with all the omitted steps and sort the whole thing.

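With this approach, in ambiguous cases all omitted steps appear after the unplanned steps, so Turn off stove appears last. If you compute MIN over the following rows instead of MAX over the preceding rows, this rule is swapped.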
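The swapped rule can be sketched as follows. This variant is an illustration rather than part of the original answer: it takes MIN over the current and following rows, and it uses 2147483647 (the largest int) as an assumed stand-in for "larger than any possible position" on trailing unplanned rows.

```sql
DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables'),
(51, 123, 2, 'Peel vegetables'),
(52, 123, 3, 'Cut meat'),
(53, 123, 4, 'Turn on stove'),
(54, 123, 5, 'Cook and stir it'),
(55, 123, 6, 'Turn off stove');

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen'),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat'),
(93, '2017-01-01 00:04:00', 321, NULL, 'Chef picked his nose'),
(94, '2017-01-01 00:05:00', 321,   53, 'Chef turned on stove'),
(95, '2017-01-01 00:06:00', 321,   54, 'Chef started cooking'),
(96, '2017-01-01 00:10:00', 321, NULL, 'Chef left kitchen'),
(97, '2017-01-01 01:00:00', 321, NULL, 'FIRE!');

WITH
CTE_History
AS
(
   SELECT
       -- Smallest Position among the current and all later rows;
       -- rows after the last planned step fall back to the largest int.
       ISNULL(
           MIN(S.Position)
           OVER (ORDER BY H.ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
           , 2147483647) AS NewPosition
       ,H.ts
       ,S.Position
       ,H.Message AS NewMessage
       ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
   FROM
       @History AS H
       LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_OmittedSteps
AS
(
   SELECT
       S.Position AS NewPosition
       ,H.ts
       ,S.Position
       ,S.Message AS NewMessage
       ,'omitted' AS Comment
   FROM
       @RecipeSteps AS S
       LEFT JOIN @History AS H ON S.Id = H.StepId
   WHERE
       H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;
```

With this sample data, Wash vegetables sorts before Chef entered kitchen, and Turn off stove sorts before Chef left kitchen and FIRE!, so omitted steps now come before unplanned events in the ambiguous cases.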

Sample data

DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables '),
(51, 123, 2, 'Peel vegetables '), --
(52, 123, 3, 'Cut meat        '), --
(53, 123, 4, 'Turn on stove   '), --
(54, 123, 5, 'Cook and stir it'), --
(55, 123, 6, 'Turn off stove  ');

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen  '),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat         '),
(93, '2017-01-01 00:04:00', 321, NULL, 'Chef picked his nose  '),
(94, '2017-01-01 00:05:00', 321,   53, 'Chef turned on stove  '),
(95, '2017-01-01 00:06:00', 321,   54, 'Chef started cooking  '),
(96, '2017-01-01 00:10:00', 321, NULL, 'Chef left kitchen     '),
(97, '2017-01-01 01:00:00', 321, NULL, 'FIRE!                 ');

Query

WITH
CTE_History
AS
(
   SELECT
       ISNULL(
           MAX(Position) 
           OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           , 0) AS NewPosition
       ,H.ts
       ,S.Position
       ,H.Message AS NewMessage
       ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
   FROM
       @History AS H
       LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_OmittedSteps
AS
(
   SELECT
       Position AS NewPosition
       ,H.ts
       ,S.Position
       ,S.Message AS NewMessage
       ,'omitted' AS Comment
   FROM
       @RecipeSteps AS S
       LEFT JOIN @History AS H ON S.Id = H.StepId
   WHERE
       H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;

Result

+-------------+---------------------+----------+------------------------+-----------+
| NewPosition |         ts          | Position |       NewMessage       |  Comment  |
+-------------+---------------------+----------+------------------------+-----------+
|           0 | 2017-01-01 00:01:00 | NULL     | Chef entered kitchen   | unplanned |
|           1 | NULL                | 1        | Wash vegetables        | omitted   |
|           2 | 2017-01-01 00:02:00 | 2        | Chef peeled vegetables | planned   |
|           3 | 2017-01-01 00:03:00 | 3        | Chef cut meat          | planned   |
|           3 | 2017-01-01 00:04:00 | NULL     | Chef picked his nose   | unplanned |
|           4 | 2017-01-01 00:05:00 | 4        | Chef turned on stove   | planned   |
|           5 | 2017-01-01 00:06:00 | 5        | Chef started cooking   | planned   |
|           5 | 2017-01-01 00:10:00 | NULL     | Chef left kitchen      | unplanned |
|           5 | 2017-01-01 01:00:00 | NULL     | FIRE!                  | unplanned |
|           6 | NULL                | 6        | Turn off stove         | omitted   |
+-------------+---------------------+----------+------------------------+-----------+

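Query for SQL Server 2008

SQL Server 2008 does not support the ROWS frame clause, so here I replaced the MAX window function with a subquery that I put in an OUTER APPLY. It produces the same result as above.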

WITH
CTE_HistoryRaw
AS
(
   SELECT
       H.ts
       ,S.Position
       ,H.Message AS NewMessage
       ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
   FROM
       @History AS H
       LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId
)
,CTE_History
AS
(
   SELECT
       ISNULL(A.NewPosition, 0) AS NewPosition
       ,ts
       ,Position
       ,NewMessage
       ,Comment
   FROM
       CTE_HistoryRaw AS Curr
       OUTER APPLY
       (
           SELECT TOP(1)
               Prev.Position AS NewPosition
           FROM CTE_HistoryRaw AS Prev
           WHERE
               Prev.ts <= Curr.ts
               AND Prev.Position IS NOT NULL
           ORDER BY Prev.ts DESC
       ) AS A
)
,CTE_OmittedSteps
AS
(
   SELECT
       Position AS NewPosition
       ,H.ts
       ,S.Position
       ,S.Message AS NewMessage
       ,'omitted' AS Comment
   FROM
       @RecipeSteps AS S
       LEFT JOIN @History AS H ON S.Id = H.StepId
   WHERE
       H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;
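
Both queries above ignore Session and RecipeId. As a rough sketch of how the question's filters could be added back, the variant below restricts the history to one session and the steps to one recipe. The variables @RecipeId and @Session, and the extra rows for recipe 999 and session 322, are hypothetical and only serve to show the filtering.

```sql
DECLARE @RecipeId int = 123;  -- hypothetical parameter: recipe to report on
DECLARE @Session int = 321;   -- hypothetical parameter: cooking session to report on

DECLARE @RecipeSteps TABLE (Id int, RecipeId int, Position int, Message nvarchar(255));
INSERT INTO @RecipeSteps (Id, RecipeId, Position, Message) VALUES
(50, 123, 1, 'Wash vegetables'),
(51, 123, 2, 'Peel vegetables'),
(52, 123, 3, 'Cut meat'),
(60, 999, 1, 'Boil water');  -- hypothetical step of another recipe

DECLARE @History TABLE (Id int, ts datetime2(0), Session int, StepId int, Message nvarchar(255));
INSERT INTO @History (Id, ts, Session, StepId, Message) VALUES
(90, '2017-01-01 00:01:00', 321, NULL, 'Chef entered kitchen'),
(91, '2017-01-01 00:02:00', 321,   51, 'Chef peeled vegetables'),
(92, '2017-01-01 00:03:00', 321,   52, 'Chef cut meat'),
(98, '2017-01-01 00:02:30', 322,   50, 'Other chef washed vegetables');  -- hypothetical other session

WITH
CTE_History
AS
(
   SELECT
       -- The WHERE clause is applied before the window function,
       -- so the running MAX only sees rows of the chosen session.
       ISNULL(
           MAX(S.Position)
           OVER (ORDER BY H.ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
           , 0) AS NewPosition
       ,H.ts
       ,S.Position
       ,H.Message AS NewMessage
       ,CASE WHEN S.Id IS NULL THEN 'unplanned' ELSE 'planned' END AS Comment
   FROM
       @History AS H
       LEFT JOIN @RecipeSteps AS S ON S.Id = H.StepId AND S.RecipeId = @RecipeId
   WHERE
       H.Session = @Session
)
,CTE_OmittedSteps
AS
(
   SELECT
       S.Position AS NewPosition
       ,H.ts
       ,S.Position
       ,S.Message AS NewMessage
       ,'omitted' AS Comment
   FROM
       @RecipeSteps AS S
       -- The session filter belongs in the join condition, so that a step
       -- logged only in another session still counts as omitted here.
       LEFT JOIN @History AS H ON S.Id = H.StepId AND H.Session = @Session
   WHERE
       S.RecipeId = @RecipeId
       AND H.Id IS NULL
)
SELECT * FROM CTE_History
UNION ALL
SELECT * FROM CTE_OmittedSteps
ORDER BY NewPosition, ts;
```

Reporting on several sessions at once would need more than this, for example a PARTITION BY on Session in the window and a list of sessions to pair with the recipe steps; that is left out of this sketch.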

Source: https://dba.stackexchange.com/questions/188858