MariaDB

How do I aggregate related rows when there is no group key?

  • October 1, 2021

I am using MariaDB 10.6 and have a table:

CREATE TABLE velocities (
 `state` int(8) NOT NULL,
 `timestamp` datetime NOT NULL,
 `velocity` decimal(5,2) NOT NULL,
 `name` varchar(15) NOT NULL
) DEFAULT CHARSET=utf8;

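Each row describes the velocity of an object at a specific timestamp. Several rows together describe a period. The state column can be -1, -2, -3, or any number greater than 0. A state of -1 marks the start of a period, -2 means we are in the middle of a period, and -3 marks the last timestamp of a period. A state greater than 0 means the period has already been assigned an id. The table contains multiple periods; periods may overlap between different names, but never within the same name.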
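To make the encoding concrete, here is a minimal sketch that labels each row with the role its state value implies. The state_meaning alias is made up for illustration and is not part of the schema; a state of 0 is not covered by the description above, so it is left as NULL.

```sql
-- Illustrative only: decode the state column into a readable label.
-- state_meaning is a hypothetical alias, not a real column.
SELECT
  name,
  `timestamp`,
  state,
  CASE
    WHEN state = -1 THEN 'start of period'
    WHEN state = -2 THEN 'middle of period'
    WHEN state = -3 THEN 'end of period'
    WHEN state > 0  THEN CONCAT('period id ', state)
  END AS state_meaning
FROM velocities
ORDER BY name, `timestamp`;
```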

Now I want to write a query that returns one row per period with start_time, end_time, max_velocity_time, velocity_at_start, velocity_at_end, and max_velocity.

Sample data

INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:01", 2, "FOO");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:02", 3, "FOO");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:03", 2, "FOO");    
   
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-1, "2021-01-01 00:00:00", 3, "BAZ");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:01", 4, "BAZ");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:02", 5, "BAZ");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:03", 6, "BAZ");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-3, "2021-01-01 00:00:04", 2, "BAZ");

INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-1, "2021-01-01 00:00:02", 4, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:03", 7, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:04", 8, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-2, "2021-01-01 00:00:05", 10, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (-3, "2021-01-01 00:00:06", 2, "BAR");
    
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (42, "2021-01-01 00:00:07", 5, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (42, "2021-01-01 00:00:08", 7, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (42, "2021-01-01 00:00:09", 10, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (42, "2021-01-01 00:00:10", 17, "BAR");
INSERT INTO velocities (state, timestamp, velocity, name) VALUES
   (42, "2021-01-01 00:00:11", 2, "BAR");

Given this data, I would like the result to look like this:

First attempt

My first attempt was inspired by https://stackoverflow.com/questions/1136597/group-by-for-continuous-rows-in-sql. I got it working in MySQL 8.0, but it does not work in MariaDB 10.6, which is where I need it. The query I ended up with is:

WITH cte AS (
 SELECT
   @r := @r
     + (
       CASE 
         WHEN @state > 0 THEN @state != v.state
         WHEN @state - v.state < 0 THEN 1
         ELSE 0
       END
     ) AS id,
   @state := state AS _,
   v.name,
   v.timestamp,
   v.velocity,
   (CASE WHEN v.state > 0 THEN v.state ELSE NULL END) AS booked_id
 FROM (SELECT @r := 0, @state := 0) vars, velocities v
 ORDER BY v.name, v.timestamp
),
inner_max_velocity_cte AS (SELECT id, MAX(velocity) AS velocity FROM cte GROUP BY id),
max_velocity_cte AS (
 SELECT cte.id, cte.timestamp, cte.velocity
 FROM cte
 INNER JOIN inner_max_velocity_cte x ON cte.id = x.id AND cte.velocity = x.velocity
),
inner_end_time_cte AS (SELECT id, MAX(timestamp) AS timestamp FROM cte GROUP BY id),
end_time_cte AS (
 SELECT cte.id, cte.timestamp, cte.velocity
 FROM cte
 INNER JOIN inner_end_time_cte x ON cte.id = x.id AND cte.timestamp = x.timestamp
)
SELECT
 cte.booked_id,
 cte.name,
 cte.timestamp AS start_time,
 end_time_cte.timestamp AS end_time,
 max_velocity_cte.timestamp AS max_velocity_time,
 cte.velocity AS start_time_velocity,
 end_time_cte.velocity AS end_time_velocity,
 max_velocity_cte.velocity AS max_velocity
FROM cte
 LEFT OUTER JOIN max_velocity_cte ON max_velocity_cte.id = cte.id
 LEFT OUTER JOIN end_time_cte ON end_time_cte.id = cte.id
GROUP BY cte.id
ORDER BY start_time

It was suggested that I could do this with window functions, but I can't figure out how to make that work.

This is a classic gaps-and-islands problem.

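There are many solutions. The standard approach is to identify the starting or ending point of each section, then number the islands with a windowed conditional COUNT. After that, you simply group by that island number.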
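As a rough sketch of that standard approach on its own, the query below treats only state = -1 as the start of an island. The names Flagged, Numbered, is_start and island_no are invented for this illustration. It is deliberately simplified: it does not yet split off periods that begin without a -1 marker, such as the state 42 rows for BAR in the sample data, which end up in the same island as the rows before them.

```sql
-- Step 1: flag every row that opens a new island (simplified: state = -1 only).
-- Step 2: a running COUNT of those flags numbers the islands per name.
-- Step 3: group by that number to get one row per island.
WITH Flagged AS (
  SELECT *,
    CASE WHEN state = -1 THEN 1 END AS is_start   -- NULL for all other rows
  FROM velocities
),
Numbered AS (
  SELECT *,
    COUNT(is_start) OVER (                         -- COUNT ignores NULLs
      PARTITION BY name
      ORDER BY `timestamp`
      ROWS UNBOUNDED PRECEDING
    ) AS island_no
  FROM Flagged
)
SELECT
  name,
  island_no,
  MIN(`timestamp`) AS start_time,
  MAX(`timestamp`) AS end_time
FROM Numbered
GROUP BY name, island_no;
```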

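There are some extra complications here:

  • We need to check whether the previous row is negative while the current one is not
  • You have an island that does not start with -1, so we also need to check for a NULL previous value
  • We need a few more window functions to get the boundary values of each group. To avoid extra sorts I have avoided using multiple row numbers, so we use LEAD and LAG for this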
WITH PrevValues AS (
   SELECT *,
     LAG(state) OVER (PARTITION BY name ORDER BY timestamp) AS prevValue
   FROM velocities
),
Groupings AS (
   SELECT *,
     COUNT(CASE WHEN state = -1 OR prevValue IS NULL OR (prevValue < 0 AND state >= 0) THEN 1 END)
       OVER (PARTITION BY name ORDER BY timestamp ROWS UNBOUNDED PRECEDING) AS GroupId
   FROM PrevValues
),
PerGroup AS (
   SELECT *,
     IFNULL(LAG(GroupId) OVER (PARTITION BY name ORDER BY timestamp), -1) AS prevGroup,
     IFNULL(LEAD(GroupId) OVER (PARTITION BY name ORDER BY timestamp), -1) AS nextGroup,
     ROW_NUMBER() OVER (PARTITION BY name, GroupId ORDER BY velocity DESC) AS rnVelocity
   FROM Groupings
)
SELECT
 MAX(CASE WHEN state > 0 THEN state END) AS booked_id,  -- aggregated so the query also runs under ONLY_FULL_GROUP_BY
 name,
 MIN(timestamp) AS start_time,
 MAX(timestamp) AS end_time,
 MIN(CASE WHEN rnVelocity = 1 THEN timestamp END) AS max_velocity_time,
 MIN(CASE WHEN prevGroup <> GroupId THEN velocity END) AS start_time_velocity,
 MIN(CASE WHEN nextGroup <> GroupId THEN velocity END) AS end_time_velocity,
 MAX(velocity) AS max_velocity
FROM PerGroup
GROUP BY
 name,
 GroupId;
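To see how the islands are numbered before any aggregation happens, one option is to run only the first two CTEs and inspect the result row by row. This is a debugging sketch that reuses the PrevValues and Groupings definitions from the query above unchanged; only the final SELECT differs.

```sql
-- Inspect the island number assigned to each row.
WITH PrevValues AS (
   SELECT *,
     LAG(state) OVER (PARTITION BY name ORDER BY timestamp) AS prevValue
   FROM velocities
),
Groupings AS (
   SELECT *,
     COUNT(CASE WHEN state = -1 OR prevValue IS NULL OR (prevValue < 0 AND state >= 0) THEN 1 END)
       OVER (PARTITION BY name ORDER BY timestamp ROWS UNBOUNDED PRECEDING) AS GroupId
   FROM PrevValues
)
SELECT name, timestamp, state, prevValue, GroupId
FROM Groupings
ORDER BY name, timestamp;
```

With the sample data from the question, every FOO and BAZ row gets GroupId 1, the five BAR rows with negative state get GroupId 1, and the five BAR rows with state 42 get GroupId 2.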

db<>fiddle

Source: https://dba.stackexchange.com/questions/300419