SQL Server - 當性能至關重要時，從每個組中選擇最近的記錄

February 6, 2020

我執行一個 SQL Server 2016 數據庫，其中有一個包含 100+ 百萬行的下表：
StationId | ParameterId |       DateTime       | Value
   1     |      2      | 2020-02-04 15:00:000 |  5.20
   1     |      2      | 2020-02-04 14:00:000 |  5.20
   1     |      2      | 2020-02-04 13:00:000 |  5.20
   1     |      3      | 2020-02-04 15:00:000 |  2.81
   1     |      3      | 2020-02-04 14:00:000 |  2.81
   1     |      4      | 2020-02-04 15:00:000 |  5.23
   2     |      2      | 2020-02-04 15:00:000 |  3.70
   2     |      4      | 2020-02-04 15:00:000 |  12.20
   3     |      2      | 2020-02-04 15:00:000 |  1.10
這個表有一個StationId、ParameterId和DateTime的聚集索引，按這個順序，都是升序的。
我需要的是，對於每個唯一對 StationId - ParameterId，從 DateTime 列返回最新值：
StationId | ParameterId |       LastDate       
   1     |      2      | 2020-02-04 15:00:000 
   1     |      3      | 2020-02-04 15:00:000 
   1     |      4      | 2020-02-04 15:00:000 
   2     |      2      | 2020-02-04 15:00:000 
   2     |      4      | 2020-02-04 15:00:000 
   3     |      2      | 2020-02-04 15:00:000 
我現在正在做的是以下查詢，執行大約需要 90 到 120 秒：
   SELECT StationId, ParameterId, MAX(DateTime) AS LastDate
   FROM  MyTable WITH (NOLOCK)
   GROUP BY StationId, ParameterId
我還看到很多文章建議以下內容，這需要 10 多分鐘才能執行：
   SELECT StationId, ParameterId, DateTime AS LastDate
   FROM
   (
      SELECT StationId, ParameterId, DateTime
      ,ROW_NUMBER() OVER (PARTITION BY StationId,ParameterIdORDER BY DateTime DESC) as row_num
      FROM  MyTable WITH (NOLOCK)
   )
   WHERE row_num = 1
即使在最好的情況下（使用 GROUP BY 子句和 MAX 聚合函式），執行計劃也不會指示 Index Seek：
我想知道是否有更好的方法來執行此查詢（或建構索引）以獲得更好的執行時間。

如果您有足夠數量的 (StationID, ParameterID) 對，請嘗試如下查詢：
select StationID, ParameterID, m.DateTime LastDate 
from StationParameter sp
cross apply 
 (
    select top 1 DateTime 
    from MyTable 
    where StationID = sp.StationID
     and ParameterID = sp.ParameterID
    order by DateTime desc
 ) m
為了使 SQL Server 能夠執行查找，DateTime為每個 (StationID,ParameterID) 對尋找最新的。
只有在 (StationID, ParameterID, DateTime) 上有一個聚集索引，SQL Server 無法在不掃描索引的葉級別的情況下發現不同的 (StationID, ParameterID) 對，並且它可以在掃描時找到最大的 DateTime。
同樣在 100M+ 行時，此表作為聚集列儲存而不是 BTree 聚集索引可能會更好。

引用自：https://dba.stackexchange.com/questions/258897

SQL Server - 當性能至關重要時，從每個組中選擇最近的記錄

相關問答

where 子句中的 dateadd - 在這種特殊情況下如何避免完全掃描？

如何確定是否需要或需要索引

實際執行計劃中未使用索引視圖？

跨數據庫的索引使用情況

常量變數為空時執行計劃不同

索引多個有效日期列的最佳方法