Sql-Server

MDX Max、Average 和 StDev 函式未給出正確結果

  • March 14, 2015

我有一個場景,我有Date維度和Time維度以及其他一些維度。時間維度僅包含Hours因為不需要在分鐘或秒級別保存數據。

一種情況需要我計算一天中某些時間AverageMax價值。例如,我需要Max01:00 到 04:00 之間的值以及給定時間段(本例中為 2015 年 1 月 1 日至 2015 年 1 月 3 日)內這些最大值的平均值。

我將過濾DateTime維度(以及其他維度)作為Subselect From子句而不是使用Where.

如果Time維度成員對於每個選定的Date維度值保持相同,則Max,Avg並給出正確的結果,但如果我必須從維度成員StDev之一中排除幾個小時,則會導致錯誤,並給出不正確的值。Date``Max``Avg``StDev

請參考下面的 SQL 程式碼來創建我將在這個問題中使用的範例:

create database TestDateTimeDimensions
GO

USE [TestDateTimeDimensions]
GO

create table DimDate (
DateId int not null PRIMARY KEY,
Date [Date] not null
)


insert into DimDate
select 20150101, CONVERT(DATE, '2015-01-01')
UNION ALL
select 20150102, CONVERT(DATE, '2015-01-02')
UNION ALL
select 20150103, CONVERT(DATE, '2015-01-03')
UNION ALL
select 20150104, CONVERT(DATE, '2015-01-04')

create table DimTime(
TimeId int not null PRIMARY KEY,
[Time] Time not null
)

insert into DimTime
select 0, CONVERT(TIME, '00:00:00')
UNION ALL
select 1, CONVERT(TIME, '01:00:00')
UNION ALL
select 2, CONVERT(TIME, '02:00:00')
UNION ALL
select 3, CONVERT(TIME, '03:00:00')
UNION ALL
select 4, CONVERT(TIME, '04:00:00')
UNION ALL
select 5, CONVERT(TIME, '05:00:00')
UNION ALL
select 6, CONVERT(TIME, '06:00:00')
UNION ALL
select 7, CONVERT(TIME, '07:00:00')
UNION ALL
select 8, CONVERT(TIME, '08:00:00')
UNION ALL
select 9, CONVERT(TIME, '09:00:00')
UNION ALL
select 10, CONVERT(TIME, '10:00:00')
UNION ALL
select 11, CONVERT(TIME, '11:00:00')
UNION ALL
select 12, CONVERT(TIME, '12:00:00')
UNION ALL
select 13, CONVERT(TIME, '13:00:00')
UNION ALL
select 14, CONVERT(TIME, '14:00:00')
UNION ALL
select 15, CONVERT(TIME, '15:00:00')
UNION ALL
select 16, CONVERT(TIME, '16:00:00')
UNION ALL
select 17, CONVERT(TIME, '17:00:00')
UNION ALL
select 18, CONVERT(TIME, '18:00:00')
UNION ALL
select 19, CONVERT(TIME, '19:00:00')
UNION ALL
select 20, CONVERT(TIME, '20:00:00')
UNION ALL
select 21, CONVERT(TIME, '21:00:00')
UNION ALL
select 22, CONVERT(TIME, '22:00:00')
UNION ALL
select 23, CONVERT(TIME, '23:00:00')

CREATE TABLE Fact (
Id int identity not null PRIMARY KEY,
DateId int not null REFERENCES DimDate(DateId),
TimeId int not null REFERENCES DimTime(TimeId),
value int not null
)

insert into Fact

select DateId, TimeId, ROUND(RAND(CONVERT(varbinary, NEWID())) * 100, 2)
FROM DimDate, DimTime

正常工作的MDX查詢是:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]

select
{
   [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0,
{
   DateSet * TimeSet
} on 1

FROM 
(
   SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
   on 0
   FROM [Test Date Time Dimensions]
)

它將結果正確顯示為:

正確結果

請注意,未使用 Axis 1 中包含的項目,實際查詢僅返回一行。它們僅用於此範例。

無法按預期工作的查詢是:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]

select
{
   [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0

,
{
   DateSet * TimeSet
} on 1

FROM 
(
   SELECT {({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150102]} * {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
   , ([Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[3] : [Dim Time].[Time Id].&[4])}
   on 0
   FROM [Test Date Time Dimensions]
)

在這裡,我排除了 Hours01:00和 ‘02:00 2015 年 1from Date月 3 日. The maximum value (70) is at hour01:00 of2015 年 1 月 3 日but is still being used to calculate Avg andStDev , andMax` 列給出錯誤,如圖所示:

錯誤

這裡的平均值和標準偏差都是不正確的。

請幫助我找到在第二種情況下正確計算結果的方法,並解釋為什麼 Max 給出錯誤並且Avg函式仍在計算,包括未包​​含在過濾子句StDev中的最大值。Subselect

我想出的一個不太理想的解決方法是添加一個同時具有and的新DateTime維度,並在子句中使用此維度,並利用現有維度來計算和維度來獲得平均和標準偏差,如下面的程式碼所示:Date``Time``DateTime``Subselect``Time``Max``Date

創建並填充DateTime維度:

create table DimDateTime
(
   DateTimeId int not null PRIMARY KEY,
   [Date] Date not null,
   [Time] Time not null 
)

insert into DimDateTime

select ((DateId * 100) + TimeId) DateTimeId, DimDate.Date, DimTime.Time
from DimDate, DimTime

alter table Fact add DateTimeId int References DimDateTime(DateTimeId)

update Fact set DateTimeId = ((DateId * 100) + TimeId)

MDX返回正確結果的查詢:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as NONEMPTY ([Dim Date].[Date Id].[Date Id])
SET TimeSet as NONEMPTY ([Dim Time].[Time Id].[Time Id])

select
{
   [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0,
{
   NONEMPTY(DateSet * TimeSet)
} on 1

FROM 
(
   SELECT ({[Dim Date Time].[Date Time Id].&[2015010101] : [Dim Date Time].[Date Time Id].&[2015010104],
           [Dim Date Time].[Date Time Id].&[2015010201] : [Dim Date Time].[Date Time Id].&[2015010204],
           [Dim Date Time].[Date Time Id].&[2015010303] : [Dim Date Time].[Date Time Id].&[2015010304]})
   on 0
   FROM [Test Date Time Dimensions]
)

但除了包含相同數據的現有兩個維度之外,我更願意避免創建新維度。其次,我想避免將數據輸入應用程序並在 API 方法中計算結果。因此,首選是使Max,AvgStDev功能正常工作。

使用 SQL Server 2012。

我嘗試在 MDX 中這樣做但無濟於事,因此我不得不使用 SSAS 使用者定義函式來計算 Max 的 Avg。跨計算集Avg( DateSet)、跨計算集Max( TimeSet) 和包含要在操作中排除的記錄的第三集Max作為參數傳遞給 UDF。

基本程式碼是這樣的(呈現 Max 功能的編碼 Avg,不包括錯誤檢查):

public static class AvgOfMaxMethods
   {
       public static double AvgOfMax(Set maxAcrossSet, Set avgAcrossSet, Set setToExclude, Expression measureExpression)
       {
           var setToExcludeQueryable = setToExclude.Tuples.OfType<Microsoft.AnalysisServices.AdomdServer.Tuple>();
           IList<double> maxMembers = new List<double>();
           foreach(var avgAcrossTuple in avgAcrossSet.Tuples)
           {
               var max = double.MinValue;
               foreach(var maxAcrossTuple in maxAcrossSet.Tuples)
               {
                   if (!setToExcludeQueryable.Any(tuple => tuple.Members[0].UniqueName.Equals(avgAcrossTuple.Members[0].UniqueName) && tuple.Members[1].UniqueName.Equals(maxAcrossTuple.Members[0].UniqueName)))
                   {
                       TupleBuilder tb = new TupleBuilder(avgAcrossTuple.Members[0]);
                       tb.Add(maxAcrossTuple.Members[0]);
                       var calculatedVal = measureExpression.Calculate(tb.ToTuple()).ToDouble();
                       max = calculatedVal > max ? calculatedVal : max;
                   }
               }
               if (!max.Equals(double.MinValue))
               {
                   maxMembers.Add(max);
               }
           }
           return maxMembers.Average();
       }
   }

有了這個,查詢就變成了(保留原始MaxStDev進行比較):

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))
Member MaxFromUDF as UDFPoc.UDFPoc.AvgOfMaxMethods.AvgOfMax(TimeSet, DateSet, SetToExclude, [Measures].[Value])

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]
SET SetToExclude as {[Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[2]}

select
{
   [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax, MaxFromUDF
}
on 0,
{
   DateSet * TimeSet
} on 1

FROM 
(
   SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
   on 0
   FROM [Test Date Time Dimensions]
)

結果看起來像(這裡排除集的元組以紅色突出顯示):

在此處輸入圖像描述

可以看出,UDFSettoExclude在計算Max和時排除了集合中每個元組的成員Average。現在正在計算考慮排除集的正確平均值,如圖所示。同樣StDev可以用這種方式計算。

將此作為創建 SSAS 使用者定義函式的指南。

引用自:https://dba.stackexchange.com/questions/94693