Mysql

時間序列數據 - 每日聚合查詢的問題

  • January 13, 2022

我正在嘗試使用每小時時間序列數據查詢 MySQL 數據庫,但我堅持使用(每日)聚合。

我的原始數據:

mysql> SELECT * FROM emoncms.custom_registers 
   -> WHERE timestamp >= '2016-08-24' AND timestamp < '2016-08-25';


+-----+------+---------------------+----------------+
| id  | node | timestamp           | register_value |
+-----+------+---------------------+----------------+
| 299 |   10 | 2016-08-24 00:00:03 |        869.458 |
| 300 |   10 | 2016-08-24 01:00:00 |        869.825 |
| 301 |   10 | 2016-08-24 02:00:03 |        870.202 |
| 302 |   10 | 2016-08-24 03:00:01 |        870.616 |
| 303 |   10 | 2016-08-24 04:00:05 |        870.977 |
| 304 |   10 | 2016-08-24 05:00:03 |        871.376 |
| 305 |   10 | 2016-08-24 06:00:01 |        871.759 |
| 306 |   10 | 2016-08-24 07:00:00 |        872.156 |
| 307 |   10 | 2016-08-24 08:00:04 |        872.629 |
| 308 |   10 | 2016-08-24 09:00:02 |        873.038 |
| 309 |   10 | 2016-08-24 10:00:01 |        873.512 |
| 310 |   10 | 2016-08-24 11:00:01 |        874.009 |
| 311 |   10 | 2016-08-24 12:00:01 |        874.429 |
| 312 |   10 | 2016-08-24 13:00:00 |        874.894 |
| 313 |   10 | 2016-08-24 14:00:04 |        875.331 |
| 314 |   10 | 2016-08-24 15:00:04 |        876.715 |
| 315 |   10 | 2016-08-24 16:00:04 |        877.249 |
| 316 |   10 | 2016-08-24 17:00:04 |        878.672 |
| 317 |   10 | 2016-08-24 18:00:04 |         879.19 |
| 318 |   10 | 2016-08-24 19:00:04 |        879.603 |
| 319 |   10 | 2016-08-24 20:00:04 |        880.027 |
| 320 |   10 | 2016-08-24 21:00:04 |        880.386 |
| 321 |   10 | 2016-08-24 22:00:05 |        880.787 |
| 322 |   10 | 2016-08-24 23:00:00 |        881.218 |
+-----+------+---------------------+----------------+
24 rows in set (0.00 sec)

查詢區間差異時:

mysql> SELECT t2.node, 
   -> ROUND(MIN(t2.register_value - t1.register_value),3) AS register_diff, 
   -> t1.timestamp AS start_interval, t2.timestamp AS stop_interval 
   -> FROM custom_registers AS t1 INNER JOIN custom_registers AS t2 
   -> on t1.timestamp < t2.timestamp WHERE t1.node = t2.node  
   -> AND t1.timestamp >= '2016-08-24 00:00:00'  
   -> AND t1.timestamp < '2016-08-25 00:00:00'  
   -> GROUP BY t1.id ORDER BY t1.timestamp;

+------+---------------+---------------------+---------------------+
| node | register_diff | start_interval      | stop_interval       |
+------+---------------+---------------------+---------------------+
|   10 |         0.367 | 2016-08-24 00:00:03 | 2016-08-24 01:00:00 |
|   10 |         0.377 | 2016-08-24 01:00:00 | 2016-08-24 02:00:03 |
|   10 |         0.414 | 2016-08-24 02:00:03 | 2016-08-24 03:00:01 |
|   10 |         0.361 | 2016-08-24 03:00:01 | 2016-08-24 04:00:05 |
|   10 |         0.399 | 2016-08-24 04:00:05 | 2016-08-24 05:00:03 |
|   10 |         0.383 | 2016-08-24 05:00:03 | 2016-08-24 06:00:01 |
|   10 |         0.397 | 2016-08-24 06:00:01 | 2016-08-24 07:00:00 |
|   10 |         0.473 | 2016-08-24 07:00:00 | 2016-08-24 08:00:04 |
|   10 |         0.409 | 2016-08-24 08:00:04 | 2016-08-24 09:00:02 |
|   10 |         0.474 | 2016-08-24 09:00:02 | 2016-08-24 10:00:01 |
|   10 |         0.497 | 2016-08-24 10:00:01 | 2016-08-24 11:00:01 |
|   10 |         0.420 | 2016-08-24 11:00:01 | 2016-08-24 12:00:01 |
|   10 |         0.465 | 2016-08-24 12:00:01 | 2016-08-24 13:00:00 |
|   10 |         0.437 | 2016-08-24 13:00:00 | 2016-08-24 14:00:04 |
|   10 |         1.384 | 2016-08-24 14:00:04 | 2016-08-24 15:00:04 |
|   10 |         0.534 | 2016-08-24 15:00:04 | 2016-08-24 16:00:04 |
|   10 |         1.423 | 2016-08-24 16:00:04 | 2016-08-24 17:00:04 |
|   10 |         0.518 | 2016-08-24 17:00:04 | 2016-08-24 18:00:04 |
|   10 |         0.413 | 2016-08-24 18:00:04 | 2016-08-24 19:00:04 |
|   10 |         0.424 | 2016-08-24 19:00:04 | 2016-08-24 20:00:04 |
|   10 |         0.359 | 2016-08-24 20:00:04 | 2016-08-24 21:00:04 |
|   10 |         0.401 | 2016-08-24 21:00:04 | 2016-08-24 22:00:05 |
|   10 |         0.431 | 2016-08-24 22:00:05 | 2016-08-24 23:00:00 |
|   10 |         0.371 | 2016-08-24 23:00:00 | 2016-08-25 00:00:01 |
+------+---------------+---------------------+---------------------+
24 rows in set (0.00 sec)

總計為 12.131。不幸的是,我的聚合查詢僅對 23 個間隔求和,而不是 24 個。它錯過了 0.371 的值。我的查詢有什麼問題?

   mysql> SELECT t1.timestamp AS stop_interval, t2.register_min,  
       -> t2.register_max, (register_max - register_min) AS register_diff   
       -> FROM emoncms.custom_registers AS t1,   
       -> (SELECT timestamp, ROUND(MIN(register_value),3) AS register_min,   
       -> ROUND(MAX(register_value),3) AS register_max   
       -> FROM emoncms.custom_registers WHERE timestamp >= '2016-08-24'   
       -> AND timestamp <= ('2016-08-25'+ INTERVAL 1 MINUTE)   
       -> GROUP BY YEAR(timestamp), WEEK(timestamp), DAY(timestamp)  
       -> ) AS t2 WHERE t1.timestamp = t2.timestamp;

+---------------------+--------------+--------------+---------------+
| stop_interval       | register_min | register_max | register_diff |
+---------------------+--------------+--------------+---------------+
| 2016-08-24 00:00:03 |      869.458 |      881.218 |        11.760 |
| 2016-08-25 00:00:01 |      881.589 |      881.589 |         0.000 |
+---------------------+--------------+--------------+---------------+
2 rows in set (0.00 sec)

也許其中之一?

SELECT * FROM emoncms.custom_registers 
   WHERE timestamp >= '2016-08-24'
     AND timestamp  < '2016-08-24' + INTERVAL 25 HOUR;

SELECT * FROM emoncms.custom_registers 
   WHERE timestamp >= '2016-08-24'
   ORDER BY timestamp
   LIMIT 25;

注意使用 25,而不是 24。

您的查詢是無效的 sql,它似乎在 MySQL 中工作(除非您將 ONLY_FULL_GROUP_BY 添加到 @@sql_mode),但可能會產生不確定的結果。如果我理解您的意圖,您希望獲得每個節點的目前行和上一行。據我了解,視窗函式很快就會進入 MySQL,但還沒有。使用 LEAD 或 LAG 來解決您的問題很容易,但您可能無法使用這些:

select node, timestamp ts1
   , lead(register_value) over (partition by node order by timestamp) - register_value as diff 
   , lead(t1.timestamp) over (partition by node order by timestamp) ts2 
from custom_registers

如果沒有視窗函式,第一步是查找目前行和下一行:

select t1.node, t1.timestamp ts1, min(t2.timestamp) ts2 
from custom_registers t1 
left join custom_registers t2 
   on t1.node = t2.node 
  and t1.timestamp < t2.timestamp group by t1.node, t1.timestamp;

+------+---------------------+---------------------+
| node | ts1                 | ts2                 |
+------+---------------------+---------------------+
|   10 | 2016-08-24 00:00:03 | 2016-08-24 01:00:00 |
|   10 | 2016-08-24 01:00:00 | 2016-08-24 02:00:03 |
|   10 | 2016-08-24 02:00:03 | 2016-08-24 03:00:01 |
|   10 | 2016-08-24 03:00:01 | 2016-08-24 04:00:05 |
|   10 | 2016-08-24 04:00:05 | 2016-08-24 05:00:03 |
|   10 | 2016-08-24 05:00:03 | 2016-08-24 06:00:01 |
|   10 | 2016-08-24 06:00:01 | 2016-08-24 07:00:00 |
|   10 | 2016-08-24 07:00:00 | 2016-08-24 08:00:04 |
|   10 | 2016-08-24 08:00:04 | 2016-08-24 09:00:02 |
|   10 | 2016-08-24 09:00:02 | 2016-08-24 10:00:01 |
|   10 | 2016-08-24 10:00:01 | 2016-08-24 11:00:01 |
|   10 | 2016-08-24 11:00:01 | 2016-08-24 12:00:01 |
|   10 | 2016-08-24 12:00:01 | 2016-08-24 13:00:00 |
|   10 | 2016-08-24 13:00:00 | 2016-08-24 14:00:04 |
|   10 | 2016-08-24 14:00:04 | 2016-08-24 15:00:04 |
|   10 | 2016-08-24 15:00:04 | 2016-08-24 16:00:04 |
|   10 | 2016-08-24 16:00:04 | 2016-08-24 17:00:04 |
|   10 | 2016-08-24 17:00:04 | 2016-08-24 18:00:04 |
|   10 | 2016-08-24 18:00:04 | 2016-08-24 19:00:04 |
|   10 | 2016-08-24 19:00:04 | 2016-08-24 20:00:04 |
|   10 | 2016-08-24 20:00:04 | 2016-08-24 21:00:04 |
|   10 | 2016-08-24 21:00:04 | 2016-08-24 22:00:05 |
|   10 | 2016-08-24 22:00:05 | 2016-08-24 23:00:00 |
|   10 | 2016-08-24 23:00:00 | NULL                |
+------+---------------------+---------------------+

一組 24 行(0.01 秒)

注意最後一行的空值,表示沒有下一行。

現在我們可以使用這個派生表並將其與原始表連接兩次:

select x.node, abs(t3.register_value - t4.register_value) as diff
    , x.ts1, x.ts2 
from (
   select t1.node, t1.timestamp ts1, min(t2.timestamp) ts2 
   from custom_registers t1 
   left join custom_registers t2 
       on t1.node = t2.node 
      and t1.timestamp < t2.timestamp 
   group by t1.node, t1.timestamp
) as x 
join custom_registers t3 
   on x.node = t3.node and x.ts1 = t3.timestamp 
join custom_registers t4 
   on x.node = t4.node and x.ts2 = t4.timestamp 
order by x.node, x.ts1;

這將使用您的樣本數據生成 23 行,我不確定您想對最後一行做什麼,因為那裡沒有下一行。

+------+-------+---------------------+---------------------+
| node | diff  | ts1                 | ts2                 |
+------+-------+---------------------+---------------------+
|   10 | 0.367 | 2016-08-24 00:00:03 | 2016-08-24 01:00:00 |
|   10 | 0.377 | 2016-08-24 01:00:00 | 2016-08-24 02:00:03 |
|   10 | 0.414 | 2016-08-24 02:00:03 | 2016-08-24 03:00:01 |
|   10 | 0.361 | 2016-08-24 03:00:01 | 2016-08-24 04:00:05 |
|   10 | 0.399 | 2016-08-24 04:00:05 | 2016-08-24 05:00:03 |
|   10 | 0.383 | 2016-08-24 05:00:03 | 2016-08-24 06:00:01 |
|   10 | 0.397 | 2016-08-24 06:00:01 | 2016-08-24 07:00:00 |
|   10 | 0.473 | 2016-08-24 07:00:00 | 2016-08-24 08:00:04 |
|   10 | 0.409 | 2016-08-24 08:00:04 | 2016-08-24 09:00:02 |
|   10 | 0.474 | 2016-08-24 09:00:02 | 2016-08-24 10:00:01 |
|   10 | 0.497 | 2016-08-24 10:00:01 | 2016-08-24 11:00:01 |
|   10 | 0.420 | 2016-08-24 11:00:01 | 2016-08-24 12:00:01 |
|   10 | 0.465 | 2016-08-24 12:00:01 | 2016-08-24 13:00:00 |
|   10 | 0.437 | 2016-08-24 13:00:00 | 2016-08-24 14:00:04 |
|   10 | 1.384 | 2016-08-24 14:00:04 | 2016-08-24 15:00:04 |
|   10 | 0.534 | 2016-08-24 15:00:04 | 2016-08-24 16:00:04 |
|   10 | 1.423 | 2016-08-24 16:00:04 | 2016-08-24 17:00:04 |
|   10 | 0.518 | 2016-08-24 17:00:04 | 2016-08-24 18:00:04 |
|   10 | 0.413 | 2016-08-24 18:00:04 | 2016-08-24 19:00:04 |
|   10 | 0.424 | 2016-08-24 19:00:04 | 2016-08-24 20:00:04 |
|   10 | 0.359 | 2016-08-24 20:00:04 | 2016-08-24 21:00:04 |
|   10 | 0.401 | 2016-08-24 21:00:04 | 2016-08-24 22:00:05 |
|   10 | 0.431 | 2016-08-24 22:00:05 | 2016-08-24 23:00:00 |
+------+-------+---------------------+---------------------+
23 rows in set (0.01 sec)

如果您只對日期總數感興趣,例如:

select node, dt, min_val, max_val, max_val-min_val as diff 
from (
   select node, date(timestamp) dt, min(register_value) as min_val, max(register_value) as max_val 
   from custom_registers 
   group by node, date(timestamp)
) as t;
+------+------------+---------+---------+--------+
| node | dt         | min_val | max_val | diff   |
+------+------------+---------+---------+--------+
|   10 | 2016-08-24 | 869.458 | 881.218 | 11.760 |
+------+------------+---------+---------+--------+

編輯:我現在註意到您的樣本中實際上有一個第二天的值。補充一點:

insert into custom_registers values (323,10,'2016-08-25 00:00:01',881.588);

select x.node, abs(t3.register_value - t4.register_value) as diff, x.ts1, x.ts2 
from (select t1.node, t1.timestamp ts1, min(t2.timestamp) ts2 
     from custom_registers t1 
     left join custom_registers t2 
         on t1.node = t2.node 
        and t1.timestamp < t2.timestamp 
     group by t1.node, t1.timestamp
) as x 
join custom_registers t3 
   on x.node = t3.node 
  and x.ts1 = t3.timestamp 
join custom_registers t4 
   on x.node = t4.node 
  and x.ts2 = t4.timestamp 
where ts1 >= '2016-08-24 00:00:00' 
 and ts1 < '2016-08-25 00:00:00' 
order by x.node, x.ts1;

[...]

|   10 | 0.370 | 2016-08-24 23:00:00 | 2016-08-25 00:00:01 |
24 rows in set (0.01 sec)

日聚合變為:

select node, dt, min_val, max_val, max_val-min_val as diff 
from (select node, date(timestamp) dt, min(register_value) as min_val
          , max(register_value)  as max_val 
     from custom_registers 
     group by node, date(timestamp)
) as t;
+------+------------+---------+---------+--------+
| node | dt         | min_val | max_val | diff   |
+------+------------+---------+---------+--------+
|   10 | 2016-08-24 | 869.458 | 881.218 | 11.760 |
|   10 | 2016-08-25 | 881.588 | 881.588 | 0.000  |
+------+------------+---------+---------+--------+
2 rows in set (0.71 sec)

引用自:https://dba.stackexchange.com/questions/147893