Sql-Server

使用空間索引是否正確?

  • October 4, 2021

我最近在一張表上使用了空間索引來改進按給定日期範圍、參考標識符以及它們是否在多邊形內過濾記錄的查詢。

這是查詢的範例

DECLARE
   @start DATETIME = '2021-09-01 00:00:00.000',
   @end DATETIME = '2021-09-22 00:00:00.000',
   @shape GEOGRAPHY
   
SET @shape = geography::STPolyFromText('POLYGON((
-5.0537109 55.5783447,
-3.9990234 51.0137547,
3.2299805 50.4295179,
4.0209961 54.8133484,
-5.0537109 55.5783447
))', 4326);

SELECT
   *
FROM [dbo].[getUsersByArea](@start, @end, @shape);

我注意到了

  • 有時甚至不使用空間索引。
  • 如果它出現在查詢計劃中,性能仍然很差。

user_location表有大約 7000 萬條記錄。

reference表有大約 100k 條記錄,通常有幾千 user_location條記錄引用一個表。

user表有 100k 條記錄。

我還擺弄了 OGC 地理方法,我嘗試使用STIntersects, STWithin, STContains(令人驚訝的行為相同STWithin),Filter但它無法緩解提到的任何問題(STIntersects在查詢計劃中不知何故顯得更糟)。

使用這種類型的索引是正確的方法嗎?這真是令人頭疼。

告訴我你的想法。

提前致謝。

這是架構定義

CREATE TABLE [dbo].[user_location] (
   [Id] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
   [ReferenceId] [uniqueidentifier] NOT NULL,
   [RecordedOn] [datetime] NOT NULL,
   [Lat] [float] NULL,
   [Lng] [float] NULL,
   [Location] AS (case when [Lat] IS NULL OR [Lng] IS NULL then NULL else [geography]::Point([Lat],[Lng],(4326)) end) PERSISTED,
   CONSTRAINT [PK_dbo_user_location] PRIMARY KEY CLUSTERED([Id] ASC)
)

CREATE TABLE [dbo].[reference]
(
   [Id] [uniqueidentifier] NOT NULL,
   [UserId] [uniqueidentifier] NOT NULL,
   CONSTRAINT [PK_dbo_reference] PRIMARY KEY CLUSTERED([Id] ASC)
)

CREATE TABLE [dbo].[user]
(
   [Id] [uniqueidentifier] NOT NULL,
   [FullName] [nvarchar](100) NULL,
   [EmailAddress] [nvarchar](255) NULL,
   CONSTRAINT [PK_dbo_user] PRIMARY KEY CLUSTERED([Id] ASC)
) 

CREATE NONCLUSTERED INDEX [IX_reference_userId] ON [dbo].[reference]
(
   [UserId] ASC
) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)

CREATE NONCLUSTERED INDEX [IX_user_location_referenceId_recordedOn] ON [dbo].[user_location]
(
   [ReferenceId] ASC,
   [RecordedOn] ASC,
) WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF)

CREATE SPATIAL INDEX [SPIX_user_location] ON [dbo].[user_location]
(
   [Location]
)USING  GEOGRAPHY_AUTO_GRID 
WITH (
CELLS_PER_OBJECT = 12, STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF)

CREATE FUNCTION [dbo].[getUsersByArea]
(
   @start DATETIME,
   @end DATETIME,
   @shape GEOGRAPHY
)
RETURNS TABLE
AS
RETURN
(
   SELECT DISTINCT
       us.Id,
       us.FullName,
       us.EmailAddress
   FROM [dbo].[user_location] ul
   JOIN [dbo].[reference] rf
       ON ul.ReferenceId = rf.Id
   JOIN [dbo].[user] us
       ON us.Id = rf.UserId
   WHERE 
       ul.RecordedOn < @end
       AND ul.RecordedOn >= @start 
       AND ul.[Location] IS NOT NULL 
       AND @shape.STContains(ul.[Location]) = 1
)

還有一件有趣的事情,當您嘗試檢索空間索引元數據時,sys.sp_help_spatial_geography_index它會無限執行。

EXEC sys.sp_help_spatial_geography_index
   @tabname = 'user_location',
   @indexname = 'SPIX_user_location',
   @verboseoutput = 1,
   @query_sample = 'POINT (0 0)'

我認為您應該嘗試將@shape 變數儲存在另一個基表(臨時表?)中並為其添加空間索引

例子

CREATE PROCEDURE [dbo].[getUsersByArea]
   @start DATETIME,
   @end DATETIME,
   @shape GEOGRAPHY
AS
BEGIN
   SET NOCOUNT ON;

   CREATE TABLE #Shapes
   (
       [Id] [uniqueidentifier] NOT NULL,
       Shape GEOGRAPHY NOT NULL,
       CONSTRAINT [PK_Shapes] PRIMARY KEY CLUSTERED([Id] ASC)
   )

   INSERT INTO #Shapes
   VALUES
   (NEWID(), @shape)

   CREATE SPATIAL INDEX [SPIX_TMP_Shape] ON #Shapes ([Shape])
   -- YMMV
   USING GEOGRAPHY_GRID 
   WITH (GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4= 
   HIGH),
   CELLS_PER_OBJECT = 16)

   SELECT DISTINCT
       us.Id,
       us.FullName,
       us.EmailAddress
   FROM [dbo].[user_location] ul
   JOIN [dbo].[reference] rf
       ON ul.ReferenceId = rf.Id
   JOIN [dbo].[user] us
       ON us.Id = rf.UserId
   JOIN #Shapes sh
       ON sh.Shape.STContains(ul.[Location]) = 1
   WHERE ul.[Location] IS NOT NULL
       ul.RecordedOn < @end
       AND ul.RecordedOn >= @start
)

引用自:https://dba.stackexchange.com/questions/299984