Now() and the Formula Cache
Reposted from Chris Webb's blog with the author's permission.
You know, I have the nagging feeling I’ve already blogged about this before… but I can’t find the post (I have been at this for over four years now) so it’s worth mentioning again…
Anyway, a common question asked on the MSDN Forums is how to get the current date and then use it in an MDX calculation – see this thread for example. Usually this is because users want calculations that show the current day’s sales, or something similar. One answer is of course to use the Now() function, but what people don’t often realise is how this can impact the ability of Analysis Services to cache the values returned by calculated members, and therefore reduce overall query performance.
To understand why, let’s look at some examples in Adventure Works. First of all create a calculated measure on the AW cube as follows:
CREATE MEMBER CURRENTCUBE.[MEASURES].[NOWDEMO] AS NOW();
Then run the following query a few times:
SELECT {[Measures].[NOWDEMO]} ON 0
FROM [Adventure Works]
As you’d expect, every time you run this query you see the current date and time – and every time you run it, you see a different value. But, you may be thinking, doesn’t Analysis Services cache the results returned by calculated members? Well, yes it does in most cases, but for non-deterministic functions (functions that could return a different result every time they’re called) like Now() no caching takes place, because otherwise the value returned from the cache might be different from the one the function actually returns.
The next problem is that if you create any other calculated members that depend directly or indirectly on the value returned by a non-deterministic function, their values can’t be cached either. One problem I see sometimes in my consultancy work is poor query performance resulting from SSAS being unable to use the formula cache, because a large number of calculations have a dependency on a single calculation that uses the Now() function. Here’s a greatly simplified example of two calculated members, the first of which finds the current year and the second which returns a sales value for the year eight years before the current year:
CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year]
AS YEAR(vba!format(now(),"MM/dd/yyyy")),
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR]
AS ([Measures].[Internet Sales Amount],
STRTOMEMBER("[Date].[Calendar Year].&[" + CSTR([MEASURES].[Current Year] - 8) + "]", CONSTRAINED));
If you run a query that references the second calculated measure on a cold cache, such as:
SELECT [MEASURES].[SHOW SALES FOR A YEAR] ON 0
FROM [Adventure Works]
The first time you run it you’ll see SSAS going to disk as you’d expect; the second time you run it though you’ll see SSAS is able to use the Storage Engine cache but not the Formula Engine cache, as this Profiler trace shows:
The highlighted Get Data From Cache event shows data being retrieved from the measure group cache. This is ok and can be beneficial for query performance, but if the calculation we’re doing is very expensive then it can still mean our query takes a long time to run. We’ll only get an instant response on a warm cache if we can work out how to use the formula cache somehow.
Luckily, in most cases where Now() is used, we don’t usually want the system date and time, we just want the date. That means that we only want to return a different value when the date changes, once every 24 hours. What we can do therefore is use a named set to somehow store the value returned by Now(), for example like this rewrite of the calculation above:
CREATE MEMBER CURRENTCUBE.[MEASURES].[Current Year]
AS YEAR(vba!format(now(),"MM/dd/yyyy")),
VISIBLE = 1;
CREATE SET CURRENTCUBE.MYYEAR AS
{STRTOMEMBER("[Date].[Calendar Year].&[" + CSTR([MEASURES].[Current Year] - 8) + "]", CONSTRAINED)};
CREATE MEMBER CURRENTCUBE.[MEASURES].[SHOW SALES FOR A YEAR - CACHEABLE]
AS ([Measures].[Internet Sales Amount],
MYYEAR.ITEM(0));
What we’re doing here is finding the year we want, then finding the member for the year 8 years ago and storing that in a named set. Because a static named set is only evaluated once, when we do our final calculation we can reference the single member stored in the named set and therefore make use of the formula cache as the following Profiler trace shows:
We’re now getting data from the flat cache, which is one part of the formula cache (which isn’t ideal either as it indicates the calculation is being evaluated in cell-by-cell mode, I guess because we’re referencing a named set inside it) and so warm-cache performance will be better.
The next problem is that when the date does change, we need to clear the cache. This can be accomplished easily by running an XMLA ClearCache command, perhaps direct from SQL Server agent, every night at midnight or whenever necessary.
To be honest, though, I’m not sure using the Now() function at all is a good thing – apart from the issues described here there are a lot of other risks involved, such as the time or date on your server being wrong or confusions with time zones and date formats. I think a better approach to the problem is to have an extra attribute in your Time dimension which flags up a date as ‘today’, and which changes every day. Of course this means you need to do some extra ETL and processing on your Time dimension as a result, but I think it’s a much cleaner solution than Now() and leads to much more efficient MDX calculations.
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.spaces.live.com/ . |
Tags: mdx, performance