MDX Max、Average 和 StDev 函式未給出正確結果
我有一個場景,我有
Date
維度和Time
維度以及其他一些維度。時間維度僅包含Hours
因為不需要在分鐘或秒級別保存數據。一種情況需要我計算一天中某些時間
Average
的Max
價值。例如,我需要Max
01:00 到 04:00 之間的值以及給定時間段(本例中為 2015 年 1 月 1 日至 2015 年 1 月 3 日)內這些最大值的平均值。我將過濾
Date
和Time
維度(以及其他維度)作為Subselect
From
子句而不是使用Where
.如果
Time
維度成員對於每個選定的Date
維度值保持相同,則Max
,Avg
並給出正確的結果,但如果我必須從維度成員StDev
之一中排除幾個小時,則會導致錯誤,並給出不正確的值。Date``Max``Avg``StDev
請參考下面的 SQL 程式碼來創建我將在這個問題中使用的範例:
create database TestDateTimeDimensions GO USE [TestDateTimeDimensions] GO create table DimDate ( DateId int not null PRIMARY KEY, Date [Date] not null ) insert into DimDate select 20150101, CONVERT(DATE, '2015-01-01') UNION ALL select 20150102, CONVERT(DATE, '2015-01-02') UNION ALL select 20150103, CONVERT(DATE, '2015-01-03') UNION ALL select 20150104, CONVERT(DATE, '2015-01-04') create table DimTime( TimeId int not null PRIMARY KEY, [Time] Time not null ) insert into DimTime select 0, CONVERT(TIME, '00:00:00') UNION ALL select 1, CONVERT(TIME, '01:00:00') UNION ALL select 2, CONVERT(TIME, '02:00:00') UNION ALL select 3, CONVERT(TIME, '03:00:00') UNION ALL select 4, CONVERT(TIME, '04:00:00') UNION ALL select 5, CONVERT(TIME, '05:00:00') UNION ALL select 6, CONVERT(TIME, '06:00:00') UNION ALL select 7, CONVERT(TIME, '07:00:00') UNION ALL select 8, CONVERT(TIME, '08:00:00') UNION ALL select 9, CONVERT(TIME, '09:00:00') UNION ALL select 10, CONVERT(TIME, '10:00:00') UNION ALL select 11, CONVERT(TIME, '11:00:00') UNION ALL select 12, CONVERT(TIME, '12:00:00') UNION ALL select 13, CONVERT(TIME, '13:00:00') UNION ALL select 14, CONVERT(TIME, '14:00:00') UNION ALL select 15, CONVERT(TIME, '15:00:00') UNION ALL select 16, CONVERT(TIME, '16:00:00') UNION ALL select 17, CONVERT(TIME, '17:00:00') UNION ALL select 18, CONVERT(TIME, '18:00:00') UNION ALL select 19, CONVERT(TIME, '19:00:00') UNION ALL select 20, CONVERT(TIME, '20:00:00') UNION ALL select 21, CONVERT(TIME, '21:00:00') UNION ALL select 22, CONVERT(TIME, '22:00:00') UNION ALL select 23, CONVERT(TIME, '23:00:00') CREATE TABLE Fact ( Id int identity not null PRIMARY KEY, DateId int not null REFERENCES DimDate(DateId), TimeId int not null REFERENCES DimTime(TimeId), value int not null ) insert into Fact select DateId, TimeId, ROUND(RAND(CONVERT(varbinary, NEWID())) * 100, 2) FROM DimDate, DimTime
正常工作的
MDX
查詢是:WITH MEMBER MaxMember as MAX(TimeSet, [Measures].[Value]) MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value])) MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value])) SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id] SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id] select { [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax } on 0, { DateSet * TimeSet } on 1 FROM ( SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]}) on 0 FROM [Test Date Time Dimensions] )
它將結果正確顯示為:
請注意,未使用 Axis 1 中包含的項目,實際查詢僅返回一行。它們僅用於此範例。
無法按預期工作的查詢是:
WITH MEMBER MaxMember as MAX(TimeSet, [Measures].[Value]) MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value])) MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value])) SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id] SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id] select { [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax } on 0 , { DateSet * TimeSet } on 1 FROM ( SELECT {({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150102]} * {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]}) , ([Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[3] : [Dim Time].[Time Id].&[4])} on 0 FROM [Test Date Time Dimensions] )
在這裡,我排除了 Hours
01:00
和 ‘02:00 2015 年 1from Date
月 3 日. The maximum value (70) is at hour
01:00of
2015 年 1 月 3 日but is still being used to calculate Avg and
StDev, and
Max` 列給出錯誤,如圖所示:這裡的平均值和標準偏差都是不正確的。
請幫助我找到在第二種情況下正確計算結果的方法,並解釋為什麼 Max 給出錯誤並且
Avg
函式仍在計算,包括未包含在過濾子句StDev
中的最大值。Subselect
我想出的一個不太理想的解決方法是添加一個同時具有and的新
DateTime
維度,並在子句中使用此維度,並利用現有維度來計算和維度來獲得平均和標準偏差,如下面的程式碼所示:Date``Time``DateTime``Subselect``Time``Max``Date
創建並填充
DateTime
維度:create table DimDateTime ( DateTimeId int not null PRIMARY KEY, [Date] Date not null, [Time] Time not null ) insert into DimDateTime select ((DateId * 100) + TimeId) DateTimeId, DimDate.Date, DimTime.Time from DimDate, DimTime alter table Fact add DateTimeId int References DimDateTime(DateTimeId) update Fact set DateTimeId = ((DateId * 100) + TimeId)
MDX
返回正確結果的查詢:WITH MEMBER MaxMember as MAX(TimeSet, [Measures].[Value]) MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value])) MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value])) SET DateSet as NONEMPTY ([Dim Date].[Date Id].[Date Id]) SET TimeSet as NONEMPTY ([Dim Time].[Time Id].[Time Id]) select { [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax } on 0, { NONEMPTY(DateSet * TimeSet) } on 1 FROM ( SELECT ({[Dim Date Time].[Date Time Id].&[2015010101] : [Dim Date Time].[Date Time Id].&[2015010104], [Dim Date Time].[Date Time Id].&[2015010201] : [Dim Date Time].[Date Time Id].&[2015010204], [Dim Date Time].[Date Time Id].&[2015010303] : [Dim Date Time].[Date Time Id].&[2015010304]}) on 0 FROM [Test Date Time Dimensions] )
但除了包含相同數據的現有兩個維度之外,我更願意避免創建新維度。其次,我想避免將數據輸入應用程序並在 API 方法中計算結果。因此,首選是使
Max
,Avg
和StDev
功能正常工作。使用 SQL Server 2012。
我嘗試在 MDX 中這樣做但無濟於事,因此我不得不使用 SSAS 使用者定義函式來計算 Max 的 Avg。跨計算集
Avg
(DateSet
)、跨計算集Max
(TimeSet
) 和包含要在操作中排除的記錄的第三集Max
作為參數傳遞給 UDF。基本程式碼是這樣的(呈現 Max 功能的編碼 Avg,不包括錯誤檢查):
public static class AvgOfMaxMethods { public static double AvgOfMax(Set maxAcrossSet, Set avgAcrossSet, Set setToExclude, Expression measureExpression) { var setToExcludeQueryable = setToExclude.Tuples.OfType<Microsoft.AnalysisServices.AdomdServer.Tuple>(); IList<double> maxMembers = new List<double>(); foreach(var avgAcrossTuple in avgAcrossSet.Tuples) { var max = double.MinValue; foreach(var maxAcrossTuple in maxAcrossSet.Tuples) { if (!setToExcludeQueryable.Any(tuple => tuple.Members[0].UniqueName.Equals(avgAcrossTuple.Members[0].UniqueName) && tuple.Members[1].UniqueName.Equals(maxAcrossTuple.Members[0].UniqueName))) { TupleBuilder tb = new TupleBuilder(avgAcrossTuple.Members[0]); tb.Add(maxAcrossTuple.Members[0]); var calculatedVal = measureExpression.Calculate(tb.ToTuple()).ToDouble(); max = calculatedVal > max ? calculatedVal : max; } } if (!max.Equals(double.MinValue)) { maxMembers.Add(max); } } return maxMembers.Average(); } }
有了這個,查詢就變成了(保留原始
Max
並StDev
進行比較):WITH MEMBER MaxMember as MAX(TimeSet, [Measures].[Value]) MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value])) MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value])) Member MaxFromUDF as UDFPoc.UDFPoc.AvgOfMaxMethods.AvgOfMax(TimeSet, DateSet, SetToExclude, [Measures].[Value]) SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id] SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id] SET SetToExclude as {[Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[2]} select { [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax, MaxFromUDF } on 0, { DateSet * TimeSet } on 1 FROM ( SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]}) on 0 FROM [Test Date Time Dimensions] )
結果看起來像(這裡排除集的元組以紅色突出顯示):
可以看出,UDF
SettoExclude
在計算Max
和時排除了集合中每個元組的成員Average
。現在正在計算考慮排除集的正確平均值,如圖所示。同樣StDev
可以用這種方式計算。將此作為創建 SSAS 使用者定義函式的指南。