Sql-Server

如何控制非聚集列儲存索引上的分段最小/最大 data_id

  • June 3, 2020

給定一個沒有 PK 但具有基於行的聚集索引的簡單的基於行的表,如下所示:

create clustered index [CX_PropertyValue] ON [dbo].[PropertyValue] ([PropertyId], [Value])

然後我希望添加一個列儲存索引,該索引的分段順序與上面的聚集索引相同:

create nonclustered columnstore index CS_IX_PropertyValue on dbo.PropertyValue( 
   PropertyId, Value
)
with (drop_existing = on, maxdop = 1); -- maxdop=1 to preserve the order by property 

保留訂單的 MaxDop 提示來自:這裡

然後使用以下查詢報告 PropertyId 列的最小/最大 data_id,並報告 7 個段中的每一個段的完整範圍:

create view [Common].[ColumnStoreSegmentationView]
as
/*---------------------------------------------------------------------------------------------------------------------
   Purpose: List ColumnStore table segment min/max of columns.

    Source: https://joyfulcraftsmen.com/blog/cci-how-to-load-data-for-better-columnstore-segment-elimination/
            https://dba.stackexchange.com/a/268329/9415

   Modified    By            Description
   ----------  ----------    -----------------------------------------------------------------------------------------
   2020.06.02  crokusek/inet Initial Version 
 ---------------------------------------------------------------------------------------------------------------------*/
select --top 20000000000
      s.Name as SchemaName, 
      t.Name as TableName,
      i.Name as IndexName,
      c.name as ColumnName,
      c.column_id as ColumnId,
      cs.segment_id as SegmentId,
      cs.min_data_id as MinValue,
      cs.max_data_id as MaxValue
 from sys.schemas s
 join sys.tables t
   on t.schema_id = s.schema_id
 join sys.partitions as p  
   on p.object_id = t.object_id   
 join sys.indexes as I
   on i.object_id = p.object_id
  and i.index_id = p.index_id
 join sys.index_columns as ic
   on ic.[object_id] = I.[object_id]
  and ic.index_id = I.index_id   
 join sys.columns c
   on c.object_id = t.object_id
  and c.column_id = ic.column_id
 join sys.column_store_segments cs
   on cs.hobt_id = p.hobt_id
  and cs.column_id = ic.index_column_id 
--order by s.Name, t.Name, i.Name, c.Name, cs.Segment_Id
GO

我嘗試使聚集索引唯一,這確實稍微影響了報告的範圍,但仍然不是單調增加。

有任何想法嗎?

這是一個以這種方式完成分割的連結,但我看不出有任何區別。

版本:Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64)

非聚集列儲存索引不直接支持此功能。

它適用於聚集列儲存。

Azure Synapse Analytics 具有語言支持,可以一步完成,例如:

CREATE CLUSTERED COLUMNSTORE INDEX <index_name>
ON dbo.PropertyValue
ORDER (PropertyId, Value);

這種語法還沒有出現在 SQL Server 盒子產品中,儘管它在一個未記錄的特性標誌下可用,所以也許它並不遙遠。不過,它仍然不適用於非聚集列儲存索引。

一般解決方法

您可以做的最好的事情是用 和 創建非聚集行儲存索引,然後用非聚集列儲存索引MAXDOP = 1替換它和。MAXDOP = 1``DROP_EXISTING = ON

這不能保證按照您的意願保留順序,但很有可能:

CREATE NONCLUSTERED INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (DROP_EXISTING = ON, MAXDOP = 1);

這將為您提供在過濾時實現行組消除PropertyId的最佳機會。

特例

當所需的順序與行儲存聚集索引匹配時(問題中似乎就是這種情況),無需先創建行儲存非聚集索引。文件說:

請注意,對於非聚集列儲存索引 (NCCI),如果基本行儲存表具有聚集索引,則行已排序。在這種情況下,生成的非聚集列儲存索引將自動排序。

因此,在您的情況下,僅執行就足夠了:

CREATE NONCLUSTERED COLUMNSTORE INDEX CS_IX_PropertyValue
ON dbo.PropertyValue (PropertyId, Value)
WITH (MAXDOP = 1);

請參閱此db<>fiddle 展示

小提琴結果

元數據

您可以使用以下命令查看每個行組和列的最小值和最大值:

SELECT
   CSS.column_id,
   column_name = C.[name],
   rowgroup_id = CSS.segment_id,
   CSS.min_data_id,
   CSS.max_data_id,
   CSS.row_count
FROM sys.partitions AS P
JOIN sys.column_store_segments AS CSS
   ON CSS.hobt_id = P.hobt_id
JOIN sys.indexes AS I
   ON I.[object_id] = P.[object_id]
   AND I.index_id = P.index_id
JOIN sys.index_columns AS IC
   ON IC.[object_id] = I.[object_id]
   AND IC.index_id = I.index_id
   AND IC.index_column_id = CSS.column_id
JOIN sys.columns AS C
   ON C.[object_id] = P.[object_id]
   AND C.column_id = IC.column_id
WHERE
   P.[object_id] = OBJECT_ID(N'dbo.PropertyValue', N'U')
ORDER BY
   C.column_id,
   CSS.segment_id;

引用自:https://dba.stackexchange.com/questions/268328