A New Events-In-Progress DAX Pattern
Reposted from Chris Webb's blog with the author's permission.
I've been working on a very complex SSAS Tabular implementation recently, and as a result I've learned a few new DAX tricks. The one that I'm going to blog about today takes me back to my old favourite, the events-in-progress problem. I've blogged about it a lot of times, looking at solutions for MDX and DAX (see here and here), and for this project I had to do some performance tuning on a measure that uses a filter very much like this.
Using the Adventure Works Tabular model, the obvious way of finding the number of Orders on the Internet Sales table that are open on any given date (ie where the Date is between the dates given in the Order Date and the Ship Date column) is to write a query something like this:
EVALUATE
ADDCOLUMNS (
VALUES ( 'Date'[Date] ),
"OpenOrders",
CALCULATE (
COUNTROWS ( 'Internet Sales' ),
FILTER( 'Internet Sales', 'Internet Sales'[Ship Date] > 'Date'[Date] ),
FILTER( 'Internet Sales', 'Internet Sales'[Order Date] <= 'Date'[Date] )
)
)
ORDER BY 'Date'[Date]
On my laptop this executes in around 1.9 seconds on a cold cache. However, after a bit of experimentation, I found the following query was substantially faster:
EVALUATE
ADDCOLUMNS (
VALUES ( 'Date'[Date] ),
"OpenOrders",
COUNTROWS(
FILTER(
'Internet Sales',
CONTAINS(
DATESBETWEEN('Date'[Date]
, 'Internet Sales'[Order Date]
, DATEADD('Internet Sales'[Ship Date],-1, DAY))
, [Date]
, 'Date'[Date]
)
)
)
)
ORDER BY 'Date'[Date]
On a cold cache this version executes in just 0.2 seconds on my laptop. What's different? In the first version of the calculation the FILTER() function is used to find the rows in Internet Sales where the Order Date is less than or equal to the Date on rows, and where the Ship Date is greater than the Date. This is the obvious way of solving the problem. In the new calculation the DATESBETWEEN() function is used to create a table of dates from the Order Date to the day before the Ship Date for each row on Internet Sales, and the CONTAINS() function is used to see if the Date we're interested in appears in that table.
I'll be honest and admit that I'm not sure why this version is so much faster, but if (as it seems) this is a generally applicable pattern then I think this is a very interesting discovery.
Thanks to Marco, Alberto and Marius for the discussion around this issue.
Chris has been working with Microsoft BI tools since he started using beta 3 of OLAP Services back in the late 90s. Since then he has worked with Analysis Services in a number of roles (including three years spent with Microsoft Consulting Services) and he is now an independent consultant specialising in complex MDX, Analysis Services cube design and Analysis Services query performance problems. His company website can be found at http://www.crossjoin.co.uk and his blog can be found at http://cwebbbi.wordpress.com/ . |
Tags: performance, dax, tabular