Sql-Server

對於最近的行,在 DateTime 上使用 RANK 查詢優化問題

  • February 27, 2020

我的視圖有問題,該視圖旨在儲存同步數據,用於 iOS 應用程序,將數據來回同步到 Web 服務。XML 欄位用於儲存用於轉置為 PDF 文件的結構化數據和 iOS 對象。對該表的頻繁寫入和讀取。

該表有大約 60 萬條記錄,由於兩個大型 XML 欄位,數據使用率非常高。數據庫表如下所示:

CREATE TABLE [dbo].[SyncData](
   [Id] [int] IDENTITY(1,1) NOT NULL,
   [CustomerId] [int] NOT NULL,
   [CategoryId] [int] NOT NULL,
   [SyncXml] [xml] NOT NULL,
   [SyncDump] [xml] NULL,
   [DateCreated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateCreated]  DEFAULT (getdate()),
   [DateUpdated] [datetime] NOT NULL CONSTRAINT [DF_SyncData_DateUpdated]  DEFAULT (getdate()),
   [DELETED_FLAG] [bit] NOT NULL CONSTRAINT [DF_SyncData_DELETED_FLAG]  DEFAULT ((0)),
CONSTRAINT [PK_SyncData] PRIMARY KEY CLUSTERED 
(
   [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

CustomerId 和 CategoryId 列具有外鍵。

Web 服務返回按類別組織的特定客戶的最新同步數據:

CREATE VIEW [dbo].[vw_SyncDataRecent]
AS
   WITH Ranked AS (
       SELECT 
           Id, 
           CustomerId, 
           CategoryId, 
           ROW_NUMBER() OVER (
               PARTITION BY Id, CustomerId 
               ORDER BY DateCreated DESC) AS rn
       FROM SyncData AS cb
   )
   SELECT Id, CustomerId, CategoryId FROM Ranked 
   WHERE rn = 1
GO

該表已經有幾個索引,它們可能是剩餘的(該數據庫自 2004 年以來一直在生產,並且已經添加了許多功能),所以我不確定它們是否還需要:

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId] ON [dbo].[SyncData]
(
   [CustomerId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId] ON [dbo].[SyncData]
(
   [CustomerId] ASC,
   [CategoryId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_SyncData_CustomerId_CategoryId_DELETED_FLAG] ON [dbo].[SyncData]
(
   [CustomerId] ASC,
   [CategoryId] ASC,
   [DELETED_FLAG] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

通常使用 CustomerId 上的 WHERE 子句簡單地呼叫視圖:

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528

結果如下所示:

| Id    | CustomerId    | CategoryId    |
|-----  |------------   |------------   |
| 189   | 600528        | 45            |
| 188   | 600528        | 46            |
| 191   | 600528        | 48            |
| 192   | 600528        | 49            |
| 190   | 600528        | 53            |

這些操作的性能很糟糕,最糟糕的是它對數據庫伺服器產生了嚴重影響,始終以 100% 的速度執行(這個視圖查詢似乎是最糟糕的罪魁禍首之一)。有沒有更好更有效的方法來做到這一點。我也剛剛意識到我不需要任何被標記為邏輯刪除的行(DELETED_FLAG),這似乎被忽略了,但我想這是另一回事。

我最初的想法是視圖本身是主要問題,因為它對整個數據集進行排名。此外,沒有包含 DateCreated 列的索引,但我不確定這對排序有多大用處。

由於該表很大,並且在生產中積極使用,我不想對錶本身進行大的更改,但我很樂意更改視圖或將其切換為使用儲存過程。

我的直覺就是這樣做,並創建一個儲存過程:

CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
   WITH Ranked AS (
       SELECT 
           Id, 
           CustomerId, 
           CategoryId, 
           ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
       FROM SyncData AS cb
       WHERE CustomerId = @CustomerId
       AND DELETED_FLAG = 0
   )
   SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

想法?

我最初的想法是視圖本身是主要問題,因為它對整個數據集進行排名。

就是這樣,視窗函式應用於整個數據集,然後由CustomerId和過濾rn=1有關更多資訊,請參閱我之前的答案。

在此處輸入圖像描述

僅當CustomerId數據到達過濾器運算符時才應用過濾器謂詞:

在此處輸入圖像描述

您可以OPTION(RECOMPILE)使用過濾器添加到選擇查詢中以CustomerId在執行時查看並在通過視窗函式傳遞數據之前進行過濾

SELECT * FROM vw_SyncDataRecent
WHERE CustomerId = 600528
OPTION(RECOMPILE);

在此處輸入圖像描述

我的其他答案中提到的其他方法不使用視圖,而是使用 tvf,不使用視窗函式進行過濾,儲存過程,……

TVF 範例

CREATE FUNCTION dbo.[Fnc_SyncDataRecent]
(
   @P1 INT
)  
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
      WITH Ranked AS  (
   SELECT 
           Id, 
           CustomerId, 
           CategoryId, 
           ROW_NUMBER() OVER (
               PARTITION BY Id, CustomerId 
               ORDER BY DateCreated DESC) AS rn
       FROM dbo.SyncData AS cb
   )
   SELECT Id, CustomerId, CategoryId 
   FROM Ranked 
   WHERE rn = 1 and CustomerId = @P1



SELECT * FROM dbo.[Fnc_SyncDataRecent](600528);

我還要添加這個索引:

CREATE INDEX IX_CustomerID3
ON dbo.SyncData([CustomerId])
INCLUDE(CategoryId,ID,DateCreated);

編輯 您的儲存過程

CREATE PROCEDURE [dbo].[_SyncData_Recent]
(
@CustomerId int = null
)
AS
BEGIN
   WITH Ranked AS (
       SELECT 
           Id, 
           CustomerId, 
           CategoryId, 
           ROW_NUMBER() OVER (PARTITION BY CustomerId, CategoryId ORDER BY DateCreated DESC) AS rn
       FROM SyncData AS cb
       WHERE CustomerId = @CustomerId
       AND DELETED_FLAG = 0
   )
   SELECT Id, CustomerId, CategoryId FROM Ranked WHERE rn = 1 AND CustomerId = @CustomerId
END
GO

EXEC [dbo].[_SyncData_Recent] 600528;

也可以應用更早的過濾,但我會添加這個索引:

CREATE INDEX IX_CustomerID_CategoryId_DateCreated_
ON dbo.SyncData([CustomerId],CategoryId,DateCreated DESC)
INCLUDE(DELETED_FLAG,ID)
WHERE DELETED_FLAG = 0;

刪除排序運算符。

在此處輸入圖像描述

最後一個過濾器WHERE rn = 1 AND CustomerId = @CustomerId可能只是WHERE rn = 1.

引用自:https://dba.stackexchange.com/questions/260706