Counting Consistent Customers in MDX
Reposted from Chris Webb's blog with the author's permission.
The post I wrote a few years ago on counting new and returning customers in MDX has proved to be one of the most popular here; it's certainly a topic I've returned to a number of times for DAX, and other people (most recently Gerhard Brueckl) seem equally interested in solving this problem. However I had a comment from Sohrab Niramwalla yesterday which raised an interesting question: how do you could the number of customers who have bought from you in every time period from the beginning of time until the current date?
The Customer dimension in Adventure Works isn't actually very good for illustrating this because customers very rarely buy more than once, but if you look at Countries then you can recreate the scenario. Consider the following query:
select
{[Measures].[Internet Sales Amount]} *
[Customer].[Country].[Country].members on 0,
[Date].[Date].[Date].members
on 1
from [Adventure Works]
From the screenshot you can see that on July 1st 2001 there were purchases in 4 out of 6 countries; of those 4 countries, only 2 (Australia and USA) had purchases on July 2nd; both of those had purchases on July 3rd; but by July 4th only USA had seen purchases on every day since the beginning of time.
How then is is possible to get this count of the number of countries that have seen purchases since the beginning of time? I can think of a few approaches. One would involve using recursion and strtoset/settostr, but I generally like to avoid recursion and strtoset because performance can be unpredictable and bad. Probably the best way is to think of the problem like this: if a country has had sales on every day since the beginning of time, then count of days that had sales since the beginning of time will be the same as the total count of days since the beginning of time. Therefore, you can write a query something like this:
with
–count the number of days since the beginning of time
member measures.daycount as
count(null:[Date].[Date].currentmember)
–count the number of non empty days for sales since the beginning of time
member measures.nonemptydaycount as
count(
nonempty(
null:[Date].[Date].currentmember
, [Measures].[Internet Sales Amount])
)
–count the number of countries that have nonemptydaycount = daycount
member measures.[Consistent Customers] as
count(
filter(
[Customer].[Country].[Country].members
, measures.nonemptydaycount = measures.daycount)
)
select measures.[Consistent Customers] on 0,
head([Date].[Date].[Date].members, 15) on 1
from [Adventure Works]
Mosha's technique for optimising count(filter()) calculations might also be useful here, although I have to admit I've not tested it to see if it does improve performance.
However, for those of you who are fans of obscure MDX, here's another solution:
with
member measures.[Consistent Customers] as
count(
–iterate over every date since the beginning of time
generate(
{null : {{[Date].[Date].currentmember} as currentdate}.item(0)}
,
{
–if the current date in the iteration is the first date
iif([Date].[Date].currentmember is [Date].[Date].[Date].members.item(0)
,
–then define the inline named set customerset as
–all the nonempty customers
intersect(
nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount]) as customerset
, {})
,
–else, redefine the set customerset as the intersection of
–customerset and the nonempty countries in the current time period
intersect(
intersect(
customerset
, nonempty(
[Customer].[Country].[Country].members
, [Measures].[Internet Sales Amount])
) as customerset
, {})
)
–note that the intersect function is used to ensure only an empty
–set is ever returned from these expressions
,
–if the current date in the iteration is the current time period
–ie we are at the final iteration
iif([Date].[Date].currentmember is currentdate.item(0)
–then return the contents of the named set customerset
, customerset
, {})
}
))
select {measures.[Consistent Customers]} on 0,
head(
[Date].[Date].[Date].members
, 15)
on 1
from [Adventure Works]
It uses the generate() function to loop over ever date from the beginning of time to the current time period, and then redefines an inline named set (called customerset) on every step of the iteration to find the non empty countries. It doesn't perform as well as the previous solution in this particular case, but if there were more countries that might change; a few tweaks to the code might also speed it up. I thought it was worth mentioning, though, for the novelty value.
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ . |