The Curious Case of Joins while Cube Processing
Reposted from Jason Thomas blog with the author's permission.
One of those days, I was just chatting with a colleague of mine about some database designs and he was showing me some exquisite piece of work in the database. That is when I happened to glance upon a stored procedure which apparently seemed to materialize an already existing account dimension view as a table. This stroked my inherent sense of curiosity and I unleashed a barrage of questions at him regarding it’s relevance. My colleague started explaining that somehow the dimensions were getting joined with all the fact tables during cube processing and hence the processing takes a very long time to complete if the dimension view was used as such. To add some justice to his words, the dimension view indeed was doing some complex logic and it looked natural that the cube processing would take time if the view was used in the joins instead of a table. But what didn’t look natural to me was that the dimension views were being use din the joins. After all, I was under the assumption that it was just a ‘select from fact table’ that gets executed or at max, the fact views if some logic is used. So I decided to open this curious case of joins for investigation.
Even though it wasn't because I didnt trust my colleague, I just had to see with my own eyes to believe, the doubting Thomas that I am. So I asked him to process a particular fact table and I looked up the query that was being used.
Sure enough, the dimension view was being used in the joins. I was dumbfounded and decided to take a break to think. Adding tobacco to my pipe, I put on my thinking cap and stared aimlessly at the setting sun. Random thoughts started racking my brain and I even ventured far to think that it could be a bug in analysis services. Suddenly, something clicked in my mind and I started running towards my colleague and delivered in true Sherlock Holmes style - “Come, Watson, come! The game is afoot. Not a word! Into your clothes and come!”. My colleague also decided to humour me and tagged along.
Me : Chance has put in our way a most singular and whimsical problem, and it’s solution is it’s own reward.
Colleague : Can you please cut the crap and let me know why this happens?
Me : Watson, you know I can’t resist a touch of the dramatic. I have no desire to make mysteries, but it is impossible at the moment of action not to enter into long and complex explanations. Let’s go step by step and have a look at the query once again.
Colleague : Duh, ok!
Me : Perhaps when a man has special knowledge and special powers like my own, it rather encourages him to seek a complex explanation when a simpler one is at hand. The importance of careful observation can never be emphasized enough. If you would remember the way that you explained the problem to me, you mentioned that all the dimension views were getting joined in the fact table. But if you look at the query (especially the highlighted part), what do you observe?
Colleague : Hmmm, just the account dimension is getting joined.
Me : How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth? We know now that all the dimensions are not getting joined. We also know that the account dimension is getting joined for some reason. What could be the reason?
Colleague : <light dawning on him> It could only mean that we have defined the usage of this dimension differently.
Me : Elementary my dear Watson! So let us open the dimension usage tab and check.
Colleague : Yes!!! There is a reference relationship between the Geography dimension and the fact table through Account, that should be the reason!
Me : To let the brain work without sufficient material is like racing an engine. It racks itself to pieces. Why should a reference relationship induce a join? After all, it could also be just resolved at run time like the regular relationship. To understand more, click to view the relationship details.
Colleague : Finally! It is because the Materialize option is ticked.
Me : Elementary, my dear Watson. When a reference relationship is materialized, the joining across dimensions is performed during processing and not in querying. Also, the attributes in the materialized reference dimensions follow the aggregation rules of standard dimensions. If you remove the tick in the materialize checkbox, you will notice that the joins would not be present anymore.
Colleague : OK, so this was it. So simple once we know the reason.
Me : There, Watson, this infernal case had haunted me for days. I hereby banish it completely from my presence.
Colleague : Could you please cut out the Sherlock Holmes part, it’s freaking me out!
So, here rests the curious case of joins while cube processing in my chronicles.
Note : Special mention for eQuotes for help with the Sherlock Holmes quotes. Also, to read on how the ticking of the Materialize option would affect your data, click on the blog below by Alberto Ferrari - SSAS: Reference materialized dimension might produce incorrect results
Jason has been working with Microsoft BI tools since he joined the IT industry in 2006. Since then he has worked across SSAS, SSRS and SSIS for a large number of clients. He is currently working for Mariner and is based out of Charlotte, NC. His personal blog can be found at http://road-blogs.blogspot.com |
Tags: design