MDX Performance Hits - v2
MDX Performance Hints
Here are some simple steps you can take to ensure that you get the best performance from your calculations in SQL Server Analysis Services. Some of these exploit new functions or features in SQL Server Analysis Services 2005 and 2008.
Use semi-additive measures instead of mimicking with Calculations
One can mimic capabilities of the different semi-additive measure functions with calculations in the cube. In fact, sometimes one must because they are supported in the Enterprise SKU. However, the semi-additive functions operate much faster; something like twice as fast as expressions in the MDX Script.
Use Unary Operators instead of mimicking with Calculations
Distributive unary operators (that is, one whose member order does not matter) are generally twice as fast as trying to mimic their capabilities with assignments. If you’re tempted to try to improve calculation performance by stripping out unary operators and replacing with calculations, don’t.
The exception are non-distributive unary operators (involving *, / or numeric values). In this case, one might be able to improve performance.
Reduce set computation size
When computing values based on sets, use the nonempty function to reduce set size. For example, when computing the average value of sales for each product across cities, this query takes 1:27:
with member measures.x as avg(existing
[Product].[Product].[Product].members*[Customer].[Customer Geography].[City].members
, [Measures].[Internet Sales Amount])select [Date].[Calendar].[Month].members on 0,
non empty [Product].[Subcategory].members on 1
from [Adventure Works] where measures.x
but adding the nomempty reduces it by two-thirds to about 30 seconds:
with member measures.x as avg(existing
nonempty( [Product].[Product].[Product].members*[Customer].[Customer Geography].[City].members,[Measures].[Internet Sales Amount])
, [Measures].[Internet Sales Amount])
select [Date].[Calendar].[Month].members on 0,
non empty [Product].[Subcategory].members on 1
from [Adventure Works] where measures.x
Avoid assigning values like 0, Null, “N/A”, “-“ to cells that would remain empty otherwise
The Analysis Services engine is very efficient eliminating empty rows. Adding calculations with non empty values replacing null values does not allow AS to eliminate these rows. For example, this query replaces null values with the dash and the non empty key word does not eliminate them:
with member measures.x as
iif( not isempty([Measures].[Internet Sales Amount]),[Measures].[Internet Sales Amount],"-")
select descendants([Date].[Calendar].[Calendar Year].&[2004] ) on 0,
non empty [Customer].[Customer Geography].[Customer].members on 1
from [Adventure Works]
where measures.x
Non empty operates on cell values and not on formatted values. So we can instead use the format string to replace null values with the same character while still eliminating empty rows and runs in roughly half the time:
with member measures.x as
[Measures].[Internet Sales Amount], FORMAT_STRING = "#.00;(#.00);#.00;-"
select descendants([Date].[Calendar].[Calendar Year].&[2004] ) on 0,
non empty [Customer].[Customer Geography].[Customer].members on 1
from [Adventure Works]
where measures.x
Note that these two queries are not equivalent – the second query eliminates completely empty rows. For more information on using the format_string calculation property, see http://msdn.microsoft.com/en-us/library/ms146084.aspx.
Costs of computing formatted values
In some circumstances, the cost of determining the format string for an expression outweighs the cost of the value itself. To determine if this applies to a slow running query, compare execution times with and without the value cell property; for example,
select [Measures].[Internet Average Sales Amount] on 0 from [Adventure Works] cell properties value
If the result is noticeable faster without the formatting, apply the formatting directly in the script as follows:
scope([Measures].[Internet Average Sales Amount]);
FORMAT_STRING(this) = "currency";
end scope;
And execute the query (with formatting applied) to determine the extent of any performance benefit.
Writing calculation expressions like “expr1 * expr2”
When writing expressions as products of two other expressions, place the sparser one on the left hand side.
Sparser means that there are fewer valuse over the space of the calculation. For example, when computing currency conversions, there is an exchange rate for every sales transaction. There is not a sale for every customer, product and day combination; therefore sales transactions data is sparser than exchange rate data.
Consider the two queries below that have the signature of a currency conversion calculation of applying the exchange rate at leaves of the date dimension in adventure works. The only difference is exchanging the order of the expressions in the product of the cell calculation. The results are the same but using the sparser internet sales amount first results in about a 10% savings – not much, but it adds up (and savings depends on relative sparsity between the two expressions and may performance benefits may vary).
Sparse First
with cell CALCULATION x for '({[Measures].[Internet Sales Amount]},leaves([Date]))'
as
[Measures].[Internet Sales Amount] *
([Measures].[Average Rate],[Destination Currency].[Destination Currency].&[EURO])
select
non empty [Date].[Calendar].members on 0,
non empty [Product].[Product Categories].members on 1
from [Adventure Works]
where ([Measures].[Internet Sales Amount], [Customer].[Customer Geography].[State-Province].&[BC]&[CA])
Dense First
with cell CALCULATION x for '({[Measures].[Internet Sales Amount]},leaves([Date]))'
as
([Measures].[Average Rate],[Destination Currency].[Destination Currency].&[EURO])*
[Measures].[Internet Sales Amount]
select
non empty [Date].[Calendar].members on 0,
non empty [Product].[Product Categories].members on 1
from [Adventure Works]
where ([Measures].[Internet Sales Amount], [Customer].[Customer Geography].[State-Province].&[BC]&[CA])
Measures are preferable to Member Properties
Instead of writing expressions like Sum(Customer.City.Members, Customer.Population.MemberValue), consider defining a separate measure group on the City table, with a sum measure on the Population column.
Rank is Expensive
Determining whether a member or tuple is in a set is best accomplished with intersect. The rank function does the additional operation of determining where in the set that object lies. If you don’t need it, don’t do it. For example, instead of this:
rank( [Customer].[Customer Geography].[Country].&[Australia],
<set expression> )>0
Do this:
intersect({[Customer].[Customer Geography].[Country].&[Australia]}, <set> ).count > 0
Replace (a-b)/a with 1-b/a, when possible
Eliminating the extra operator saves a few percent in most scenarios but could save substantially more if the values for b are much more sparsely populated than a over the space of the expression
Use your attributes!
In SQL Server Analysis Services 2000, the filter function was a common means of selecting members from a set whose member property satisfied some condition; for example male customers could be expressed as :
Filter(Customer.name.Name.members, Customer.name.currentmember.properties(“Gender”) = “Male”)
Don’t do this in SQL Server Analysis Services 2005. Instead, create an attribute hierarchy Customers.Gender and do this:
(Customers.Gender.Male, Measures.Sales)
The attribute hierarchy can be made not visible if they should not be seen by end-users and still be available in MDX expressions.
The average sales to male customers can be expressed as this:
Avg(Customer.Name.Name.members*Customers.Gender.Male, Measures.Sales)
Why is .properties so much slower? It’s because getting the value of a member forces each member to be retrieved so the property value can be retrieved and queried. Exists is much faster because internal storage structures can be exploited directly
Use Is Comparison Operator
When comparing members, compare the objects using the IS comparison operator. Don’t do this:
Iif( [Currency Code].currentmember.Name = “USA”], …)
Do this:
Iif( [Currency Code].currentmember IS [Currency Code].[USA], …)
Translating the members to strings takes time and doesn’t need to be done.
Use NonEmpty Function
The NonEmpty function (new in SQL Server Analysis Services 2005) is optimized for removing empty tuples. So instead of doing this to get the customers who bought an Xbox,
Filter(Customer.Name.members, not IsEmpty( ([Measures].[Unit Sales], [Product].[Name].[Xbox])
Do this:
NonEmpty (Customer.Name.members, ([Measures].[Unit Sales], [Product].[Name].[Xbox]))
Note that Filter(<set>, Not Isempty(<measure>)) is equivalent to NonEmpty(<set>,<measure>) and the engine does the optimization automatically.
Another example. Instead of this:
Count(Filter([year and month].[Month],
Not IsEmpty(([Project Count], [Layer].[Layer].[web]))
or Not IsEmpty(([Project Count], [Layer].[Layer].[server]))))
Do this:
Count(NonEmpty ([year and month].[Month],{([Project Count],
[Layer].[Layer].[web]), ([Project Count], [Layer].[Layer].[server])})
Calculation Non Empty Behavior
Whether or not an expression resolves to null is important for two major reasons. First, most client applications use the “non empty” key word in a query. If you can tell the engine that you know an expression will evaluate to null, then needn’t be computed and can be eliminated from the query results before the expression is evaluated..
Second, the calculation engine can use the knowledge of a calculation’s non empty behavior even when the “non empty” keyword is not used. If a cell’s expression evaluates to null, then it needn’t be computed during query evaluation.
Note that the current distinction between how the engine uses an expression’s non empty behavior is really an artifact of the engine’s design and we’re working to eliminate the distinction in a future release. For now, we make this distinction because one or both optimizations is made depending how the calculation property non_empty_behavior (NEB) is defined - we’ll call the first “Non Empty” and the second “Calculation Engine”
When an expression’s NEB is defined, the author is guaranteeing the result is null when the NEB is null and not null when NEB is not null. This information is used internally by the calculation engine to build the query plan.
What can/should be specified in Non_Empty_Behavior (“NEB”) clauses is first and foremost determined by correctness/semantics aspects, before taking into account any performance goals.
Calculation type NEB applies to |
NEB expressions recognized |
Calculation Engine SP2 support |
Non Empty SP2 support |
Example |
Calculated Measure |
Constant measure reference |
Yes |
Yes |
With Member Measures.DollarSales As Measures.Sales / Measures.ExchangeRate, NEB = Measures.Sales |
Calculated Measure |
{List of two or more constant measure references} |
No |
Yes |
With Member Measures.Profit As Measures.Sales – Measures.Cost, NEB = {Measures.Sales, Measures.Cost} |
Any (calculated member, script assignment, calculated cell) |
Constant tuple reference Constant single-measure set |
Yes |
No |
Scope [Measures].[store cost]; This = iif( [Measures].[Exchange Rate]>0, [Measures].[Store Cost]/[Measures].[Exchange Rate], null ); Non_Empty_Behavior(This) = [Measures].[Store Cost]; End Scope; |
Here’s a summary of representative, common cases for using NEB. Note that defining NEB is never necessary to achieve the correct calculation results but defining it incorrectly can result in incorrect results. It should only be used if correctness implications are well understood and performance is poor without it.
Measures.M1 + or – Measures.M2
- In general, MUST specify both measures in NEB, for correctness reasons. Using SP2 is required for this.
- In particular, if both measures belong to the same measure group, MAY specify just one of them in NEB (could have better performance).
Measures.M1 * Measures.M2
- In general, one cannot specify any correct NEB for this expression because the result is null when either M1 or M2 is null.
- However, if it is guaranteed that one of the measures is never null (e.g. a currency exchange rate), then the other measure may be used in NEB.
- In particular, if it guaranteed that, for any given cell, either both measures are null, or both are non-null (e.g. they belong to the same measure group), then one may specify both measures in NEB (SP2 required), or specify a single measure.
Measures.M1 / Measures.M2
- The first measure (the numerator, M1) must be used when defining NEB.
Be very careful. This property is generally abused by those that use it. If this property is defined incorrectly, query results can be incorrect.
Don’t use LookupCube
It’s expensive. For any non-trivial usage, create a single cube that has the values you need.
Conditional Calculations
The IIF function can play havoc with existing engine calculation optimizations – the cause of the problem is that the optimizations can’t be brought to bear without knowing the expression that applies to a cell when the cell is being evaluated. We’re looking at the resolving this in a future release but in the meantime there may be some steps you can take.
Conditional Calculations and Scoping
The IIF is sometimes used to scope a calculation on a set of members by relying on the condition; for example:
This =
IIF([account].[Account Type].currentmember IS [Account].[Account Type].[Flow],
<e1>, <e2>
)
Don’t do this. This can expressed differently with much better performance:
Scope ([Account].[Account Type].[Flow]);
This = <e1>;
End Scope;
Scope ([Account].[Account Type].[Account Type].members –
[Account].[Account Type].[Flow]);
This = <e2>;
End Scope;
Note: in SQL Server Analysis Server 2008, this recommendation no longer applies and the two expressions are equivalent.
Conditional Calculations and NULL
If you can, having one of the arguments as NULL can significantly improve performance:
IIF ( <condition>, <expression>, NULL)
If this can’t be done, oddly enough this expression can be much faster (but only in AS2005 SP2 and later) and is worthy of experimentation:
IIF ( <condition>, <expr1>, NULL) + IIF ( NOT <condition>, NULL, <exp2>)
(If <condition> is non-trivial – consider pulling it out as a calculated member.)
Conditional Calculations and Non Empty Behavior
If the expression has a simple non empty behavior (a subject matter discussed elsewhere in this document), then exploit it. For example, this expression flips the sign of an expression based on a condition:
IIF (<condition>, <exp>, -<exp>)
If <exp> has a non empty behavior, than do this:
<exp> * IIF(<condition>, 1, -1), non_empty_behavior = <NEB>
Single Branch Conditions
If you must use a condition and the calculation only applies when a condition evaluates to true, do not do something like this to have the cell retain its value:
This = IIF(
<condition>,
<assignment>,
CalculationPassValue(measures.currentmember,-1,relative)
);
But do this instead:
IF <condition> THEN this = <assignment>;
Use ValidMeasure to Reduce Calculation Space
In SQL Server Analysis Services 2005, each measure group has the property IgnoreUnrelatedDimensions that defines how dimensions unrelated the measure are treated – either the dimension is ignored, or moving off the default member results ion a null value for the measure.
When defining calculations, don’t rely on this behavior but use the validmeasure function. For example:
scope(leaves([Time]), [Currency].[Currency].members - [Currency].[Currency].[Currency].[USA]
Scope [Measures].[store Sales];
This = iif( isempty(validmeasure([Measures].[Exchange Rate])), null, [Measures].[Store Sales]/validmeasure([Measures].[Exchange Rate]));
End Scope;
Use Sum or Aggregate instead of Additions
Instead of doing this:
Create Member measures.x as
(Sales, Country.USA) + (Sales, Country.Canada) + (Sales, Country.Mexico)…
Do this:
Create Member measures.x as Sum({Country.USA, Country.Canada, Country.Mexico}, Sales)
Clump Hierarchies in Big Crossjoins
If you can, keep hierarchies from the same dimension together. Why? Because internally, the engine does an exists between adjacents sets in a crossjoin. If a hierarchy from a different dimension is inserted between two hierarchies from the same dimension, this exists function happens afterwards and not before. This can greatly expand the space and impact performance.
Caching Calculation Results
The formula engine has an internal cache to reuse calculation results. But to cache something, the result has to be an addressable cell (or tuple) in the cube space.
For example, I have an application that calculates the ratio of a value to the maximum that value is over a set and renders the result as a histogram. The first way I tried to do this was with a calculation something like this:
with
member measures.y as
measures.[unit sales]
/
max(customers.[name].[name].members, measures.[unit sales])
select
measures.y on 0,
customers.[name].[name].members on 1
from sales
This took about 15 seconds on my laptop. The calculation is inefficient in that the maximum value does not change over the set but the expression forces its reevaluation as many times as there are customers (and there are over 10,000 of them). This approach is much better:
with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x,[Customers].[Customers].[All Customers] )
select
measures.y on 0,
customers.[name].[name].members on 1
from sales
This takes less than a second! Over an order of magnitude performance improvement! Fun!
Now here’s a quiz – why must the expression for measures.y contain the member [Customers].[Customers].[All Customers] in the denominator? In other words, why can’t I do this:
with
member measures.x as
max(customers.[name].[name].members, measures.[unit sales])
member measures.y as
measures.[unit sales]
/
(measures.x)
select
measures.y on 0,
customers.[name].[name].members on 1
from sales
It is because we want to reference a cached value. Without [Customers].[Customers].[All Customers] overwriting the customer dimension’s attributes, the cell changes with a reference to the customer’s name and the calculation goes back to being reevaluated in each cell.
Tags: mdx, performance, tip