Splitting Analysis Services 2005 cubes based on measure groups
January 29th, 2008 by Vidas Matelis

When I initially migrated an Analysis Services database from the 2000 to the 2005 version, I created a single cube with all measure groups in the new database. At the time I believed that was the right way to do it. Here is a quote from one of my favorite books, “The Microsoft Data Warehouse Toolkit with SQL Server 2005 and the Microsoft Business Intelligence Toolset”, page 322, chapter 7:
“The best practice in Analysis Services 2005 is to define a single cube for a database… You are still permitted to create multiple cubes in a database, but you shouldn’t. Instead, create a single cube with multiple measure groups.”
But later I found out about the Microsoft recommendation “Avoid having more than 15 measure groups in a cube”. I posted a question on the Analysis Services MSDN forum and got a reply from Greg Galloway about his experience – about a 15% performance boost after splitting, and about a 10% improvement after reducing the cube’s MDX script code.
This post is about my experience in this area, as I finally spent some time splitting SSAS cubes with many measure groups into multiple cubes. This particular performance test was done on a cube that originally had 23 measure groups and was split into 11 cubes with 1-3 measure groups each. The cube size was about 6GB (a sample data set), and the combined size of the smaller cubes did not change much after restructuring. The original big cube had about 2600 lines of MDX script; after the split, each smaller cube’s MDX script was 350-1100 lines. There is a big chunk of code related to time-intelligence functionality that I had to repeat in each cube.
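To give an idea of what gets duplicated: the repeated time-intelligence code follows the usual shell-dimension pattern, roughly like the sketch below. All dimension, hierarchy, and member names here are illustrative placeholders, not the actual objects from my cubes.

```mdx
// Hypothetical fragment of the time-intelligence block that has to be
// repeated in each cube's MDX script. Every object name is made up.
SCOPE ([Time Calculations].[Year To Date]);
    // Accumulate the current-period value from the start of the year
    THIS = AGGREGATE(
        YTD([Date].[Calendar].CURRENTMEMBER),
        [Time Calculations].[Current Period]);
END SCOPE;

SCOPE ([Time Calculations].[Prior Year]);
    // Same period, one calendar year back
    THIS = (PARALLELPERIOD([Date].[Calendar].[Calendar Year], 1,
                           [Date].[Calendar].CURRENTMEMBER),
            [Time Calculations].[Current Period]);
END SCOPE;
```

With a single big cube this block exists once; after the split, a copy of it lives in every cube’s MDX script, which is where the maintenance cost comes from.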
Here are notes from my tests:
- After splitting the cubes, I consistently saw about a 20-30% improvement on most queries against the larger measure groups.
- The smaller measure groups benefited the most after being split off from the big cube: on those cubes I saw performance improve by as much as 100-300%.
- I used linked measure groups to access measures from a different cube. That worked great and is very easy to set up. For queries that use calculations based on linked measure groups I saw inconsistent performance behaviour: some queries were 20-30% faster after the split, but a few others were 10-20% slower.
- There is a known bug related to linked measure groups described in this KB article: http://support.microsoft.com/kb/940160/. I ran into problems that seemed related to this bug, but in my case excluding the cubes from the project and then adding them back in a different order fixed the problem.
- After splitting the cubes, I appreciated that the related MDX scripts got smaller and it was easier to find the code associated with a specific measure group.
- One thing I do not like about the split is that I had to repeat the same time-intelligence code in each cube. That code is going to be more difficult to maintain from now on.
So, this restructuring was definitely worth doing; if you have cubes with many measure groups, consider splitting them.
For performance testing I used the beta tool MDX Studio (by Mosha Pasumansky) and found it to be very helpful.
Posted in SSAS | 17 Comments »
January 30th, 2008 at 4:19 am
Hi Vidas,
This is something I’ve wondered about before. Although I’ve never worked with a cube that has as many as 15 measure groups, and consequently have never seen any particular performance improvement from splitting a cube into multiple cubes, what I have seen is that the overall amount of code in an MDX script does have an impact on query performance. Like Greg, I’ve stripped out code that wasn’t needed for a particular query and then seen that query run faster, but of course if all that code is for one measure group you can’t permanently get rid of it. The only tip I can offer is to Keep It Simple, as always…
January 30th, 2008 at 4:34 pm
Vidas-
My main finding, which I elaborated on in my post to the forum, was that reducing the number of cube dimensions seemed to be the more important thing to focus on. Thus, if two measure groups don’t share many dimensions, then it may be a good idea to split them into multiple cubes.
Could you comment on that concept as it relates to the cubes you were working with? Did you have lots of cube dimensions that were used in only one measure group? Or were they mostly shared across all measure groups?
January 30th, 2008 at 5:48 pm
In my original cube I had 38 dimensions. After the split, each cube had about 15-25 dimensions. And each measure group, or set of measure groups, had a few dimensions that were unique just to it.
When I had all measure groups in one cube, there was no way for me to cut down the MDX script or get rid of dimensions – everything was used in one way or another. And some of my calculations used measures from many measure groups, or I needed the option to build reports from many measure groups. This was solved with linked measure groups.
February 12th, 2008 at 5:40 pm
I try to group measure groups into cubes based on dimension commonality. As for the larger measure groups (20GB+), I tend to keep these in their own cubes.
If you have a cube with 30 measure groups, and you need to add a new dimension to just one of those measure groups, you’ll need to do a full reprocess of the other 29 measure groups as well. It makes sense to have more cubes just for the sake of reprocessing.
April 17th, 2008 at 3:27 am
Hi Vidas
I’ve been working over the last few months on testing different physical ways of organizing the measure groups of huge cubes.
I think that an important side effect of splitting measure groups into more cubes that share certain dimensionality is the use of linked measure groups.
We’ve seen little query performance difference (<5%) between using linked and physical measure groups.
But I’ve found “LOTS” of processing issues when using linked measure groups. Here are the top ones:
– The data cache is not updated correctly after incrementally processing the underlying physical cube partitions and the linked cube, and after clearing the data cache.
– When adding partitions (without processing them) and executing ProcessAdd on a sibling partition, SSAS automatically fully processes the first one, because the measure group is referenced by a linked measure group in another cube.
I think that any decision to split cubes must be made in conjunction with these side effects of using linked measure groups.
April 22nd, 2008 at 9:53 pm
Leandro,
You make very good points in your comments. I have not seen the data cache problems myself, but that is probably because I do not use incremental partition processing. In my design I chose to partition measure groups by date and always do a full process on the affected partitions; I have a big enough time slot for that. I have also read that incremental cube processing makes cubes slower, and that you need to do a full cube reprocess from time to time to restore cube performance. Of course, depending on your business requirements you might not have the luxury of doing full partition reprocessing.
I also always process the real measure groups first, and then I issue a command to process the linked measure groups (which is very fast).
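As a sketch of that processing order, an XMLA batch along these lines processes the physical measure group first and then the cube that hosts the linked copy. The database, cube, and measure group IDs below are made-up examples, not the real ones from my project.

```xml
<!-- Hypothetical XMLA batch: physical measure group first, linked copy second.
     All object IDs are illustrative. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>MyDW</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Fact Sales</MeasureGroupID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>MyDW</DatabaseID>
      <CubeID>Inventory</CubeID>
      <MeasureGroupID>Fact Sales Linked</MeasureGroupID>
    </Object>
    <!-- The linked measure group holds no data of its own, so this step is fast -->
    <Type>ProcessDefault</Type>
  </Process>
</Batch>
```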
And thanks for sharing your findings about these side effects. I’ll keep this in mind for other implementations!
July 2nd, 2008 at 12:16 pm
Hi. I stumbled on this article by accident but am compelled to comment. Breaking things into separate cubes is generally a very bad idea. The nominal performance gains aren’t worth it.
July 2nd, 2008 at 12:24 pm
Tom,
I made this post 6 months ago, and I am even more convinced now that splitting one big cube with many measure groups into multiple cubes was a good idea for our database design.
Why do you think it is a bad idea?
July 2nd, 2008 at 2:39 pm
Hi Vidas,
Because it’s counter-productive to the prime mission, that being the integration of multiple data sources into one.
Of course in the real world, if performance isn’t cutting it we are often required to implement workarounds. But at the very least, I’d see if hardware/enterprise ed. would address performance issues before creating silos.
(Thanks for all the many useful articles btw.)
October 3rd, 2008 at 8:11 am
What is the impact of creating separate databases? I have cubes that originate from one very large database (from disparate sources) and span many different subject areas. There are only a handful of common dimensions, which are linked from one of the cubes. Is there a benefit or penalty to deploying these cubes to their own databases versus one common database? From an administrative perspective it is helpful to keep these cubes in their own solutions. Thanks for your most informative blog…
October 3rd, 2008 at 7:39 pm
Tim,
Splitting or not splitting measure groups into separate cubes or even databases depends a lot on your business requirements. But I cannot imagine a case where I would want to split my data warehouse into separate databases.
I split my cubes and I am very happy about it. Tom Chester posted here that this is not a good idea, and in many cases he is right: having everything in a single cube gives users the flexibility to cross-analyze data from all measure groups. Splitting makes sense only when you know that some measure groups just don’t mean anything together.
I think that having separate cubes already gives me enough flexibility to deal with (change, process) each cube separately if I want to.
Pros for splitting into databases:
– You can split your project into sub-projects. But you could do the same with separate cubes too.
– Backup/restore operations could be faster (known issue with SSAS 2005 databases > 20GB)
– …
Cons for splitting into databases:
– Slower performance when linking measures from different databases
– You’ll have to maintain extra copies of dimensions – this could be a big deal.
– If you have links between databases, you’ll have problems when the same dimensions are not in sync.
– …
Bottom line – I would not consider splitting into separate databases. Separating cubes gives you enough flexibility to do development/maintenance.
October 7th, 2008 at 7:38 am
Thanks for the info, Vidas. I have a follow-up question that I have not been able to find info on anywhere. Regarding linked objects, what is the effect of processing on the originating source and the consuming source? Do you need to process a linked object in the database/cube that is consuming the object, or is processing the originating source sufficient? If you do, do you need to worry about wiping out aggregations and indexes? I’m speaking specifically about linked dimensions, in an attempt to optimize processing times. I generally call a ProcessDefault at the conclusion of processing tasks to rebuild aggregations and indexes; do I need to explicitly ProcessAdd or ProcessUpdate linked dimensions? Thanks in advance.
December 16th, 2009 at 6:16 pm
I appreciate all the pros and cons you have helped put together. It has been over a year since the last post – has anyone changed their mind over the last year? To Split or Not to Split (a cube).
December 17th, 2009 at 4:14 pm
Russell,
From my current experience, I would try to avoid linked objects as much as I can; they are quite hard to maintain. So if there is anything that you would like to see in a single report, then you should build a single cube. Also, my understanding is that there are some improvements in SSAS 2008, so having many measure groups in a single cube will have less impact than in the 2005 release.
So my design rule would be: measure groups of similar granularity should go into a single cube, and I would have to have a good reason not to do so. Measure groups that will never be queried together can go into separate cubes. Business requirements should be above everything here, so if you need measure groups in a single place, then I would do just that and worry less about performance.
June 17th, 2010 at 5:43 pm
Hello Vidas,
I am working on cubes. There are a lot of cubes present, but among them there are 3 cubes whose fact tables contain many millions of records. Currently, it takes 1.5 hours to process all the cubes. I want to optimize this so that processing runs faster. Please let me know if there is any way to do this.
Thanks in Advance.
Aniket
March 30th, 2011 at 4:12 pm
“So if there is anything that you would like to see in a single report, then you should build single cube”.
Yes, I do need a report like that, but… I have 9 measure groups with approximately 21 measures each, and each measure group has one partition per month going back to 2007… In this particular case do you still think that it would be better to have a single cube (for development’s sake I feel that we shouldn’t)? Or multiple cubes plus one cube with all the linked measure groups (like a virtual cube)?
Thank you for your advice…
Regards
Adriana
March 30th, 2011 at 9:25 pm
Adriana,
I would put your 9 measure groups into a single cube; I don’t see any problems with that, especially if most of the dimensions are the same across all measure groups.
Don’t worry about the number of measures or partitions – this should not affect your design considerations.
Linked cubes have lots of problems. For example, maintaining them is a pain: changes are not propagated automatically between the “master” measure group and its linked copies.