SQL Server 2008R2 Self Service BI – Gemini – how it performs with bigger tables
August 18th, 2009 by Vidas Matelis
Gemini was released about a week ago and I have been playing with it almost every evening. I am still trying to understand what it is, how it works, what its limitations are, how to use DAX, etc. And I can say that I like what I see so far. Kasper de Jonge and Chris Webb already posted their initial Gemini reviews. In this post I will share my experience with Gemini so far.
First of all, I want to point out that for Gemini tests all you need is Excel 2010 and the Gemini add-in file (it is only about 30MB). You do not need SQL Server 2008R2. I am pointing this out because as I was setting up my testing environment I installed SQL Server 2008R2 and only later realized that it is not required. It is quite amazing that this 30MB add-in file contains such powerful software – the Gemini front-end and the in-memory Analysis Services server. Of course, for an Enterprise-level installation you will need SQL Server 2008R2 and SharePoint 2010, but such a setup is not part of my current tests.
After my initial tests with a few thousand records I finally decided to run tests on bigger tables. I took the AdventureWorksDW database and expanded the FactInternetSales table to 22mln records by simply duplicating the included records. The table grew to 3.2GB. My goal was to see how well Gemini can handle millions of rows. I ran my tests on my laptop with 3GB of RAM and an Intel Duo 2.5GHz CPU. On the same laptop I have my SQL Server 2008 database with the expanded data, Excel 2010 and the Gemini add-in. I limited SQL Server memory to 500MB.
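For anyone who wants to set up a similar test, here is a rough T-SQL sketch of how the fact table could be inflated and how the SQL Server memory cap can be applied. The 22mln target, the 3.2GB size and the 500MB cap come from the post; the work-table name FactInternetSalesBig and the doubling loop are my own assumptions about one possible approach, since the post only says the records were duplicated.

```sql
-- Sketch only: build an inflated, constraint-free copy of the fact table.
-- FactInternetSalesBig is a hypothetical name; the post does not describe how
-- the duplication was actually done, only that existing records were duplicated.
USE AdventureWorksDW;
GO

-- Structural copy without the primary key, so duplicate rows are allowed.
SELECT * INTO dbo.FactInternetSalesBig FROM dbo.FactInternetSales;

-- Keep doubling until the table holds roughly 22 million rows.
-- (Assumes no identity column; otherwise list the columns explicitly.
--  Doubling will overshoot the target somewhat; trim afterwards if needed.)
WHILE (SELECT COUNT_BIG(*) FROM dbo.FactInternetSalesBig) < 22000000
BEGIN
    INSERT INTO dbo.FactInternetSalesBig
    SELECT * FROM dbo.FactInternetSalesBig;
END

-- Check the resulting table size (the post reports about 3.2GB).
EXEC sp_spaceused 'dbo.FactInternetSalesBig';

-- Cap SQL Server at 500MB so it does not compete with Excel/Gemini for RAM.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 500;
RECONFIGURE;
```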
My first tests to load 22mln records into the Gemini add-in failed. After loading over 21mln records I got an error message:
Memory error: Allocation failure : Not enough storage is available to process this command. .
Server: The operation has been cancelled.
Server: The operation has been cancelled.
I tried to stop some services and close other applications and then repeated the same table load into Gemini, but at the very end I always got the same error message. Finally I started to reduce the number of records in my FactInternetSales table and that helped. I found that on my laptop I was able to load just about 20mln rows. Considering that this laptop also runs SQL Server, I can live with this limitation.
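If you need to do the same trimming, a minimal sketch (again using the hypothetical FactInternetSalesBig work table, and assuming the surplus rows are interchangeable duplicates) could look like this:

```sql
-- Sketch only: trim the inflated fact table back down to roughly 20 million rows.
-- DELETE TOP (n) removes an arbitrary n rows, which is acceptable here because
-- the extra rows are just duplicates of existing ones.
DECLARE @excess BIGINT;

SELECT @excess = COUNT_BIG(*) - 20000000
FROM dbo.FactInternetSalesBig;

IF @excess > 0
    DELETE TOP (@excess) FROM dbo.FactInternetSalesBig;
```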
It took about 5 minutes to load 20mln rows into Gemini, and as I repeated my FactInternetSales table load multiple times, I saw a consistent Gemini data load speed – about 4mln records per minute. Of course, load time depends on how many fields there are in the table, machine power, etc. If your SQL Server is on the network, your load speed will depend on the network’s speed too. But this gives you an idea – when working with Gemini and bigger data sets it will take time to get data into Gemini. This comes as no surprise, but I am still impressed with the speed. During the load I kept an eye on Task Manager and noticed that Excel was using 80-90% CPU time and SQL Server 10-20%. Total CPU usage during the data load into Gemini was almost always 100%.
Data loading was the most time-consuming operation. As soon as I had 20mln rows in Gemini I wanted to see if sorting and filtering really work as fast as I had seen in earlier demos. I had always doubted that Gemini could work with millions of rows that fast. But I was not disappointed – filtering on any field or sorting completed in sub-seconds. I tried multiple filters, then sorting, then filters again, but Gemini responded so fast that I had to track the displayed record count in the bottom-left corner just to confirm that the operation was actually executed. I was very impressed. This will please many Gemini users and I expect to hear many wows during demos.
Then I loaded a few dimensions and manually added relationships to the fact table. After that I created a pivot table and started to browse the data. Again, I was blown away by Gemini’s speed – millions of records on my laptop, and a response to any query comes back in milliseconds. How do they do that? I still cannot believe I am getting this power from a 30MB add-in.
After that I hit the “Save” button. It took about 40 seconds for Excel to save the new file. This file contains all of the data (!) and metadata from all tables, plus the pivot definitions. The saved Excel file size was 72MB. Just to remind you – I loaded a 3.2GB SQL Server table into Gemini, which compressed and saved that data in a 72MB Excel file. I probably got such great compression because my table contained a lot of duplicate data. I noticed that saving the Excel file after any change with this amount of data was always on the slow side – it took about 20-40 seconds.
Opening the Excel file took just a few seconds, and the pivot table took a few more seconds to return results for a changed query. Opening the Gemini add-in with 20mln rows saved took just a few seconds too.
Overall I am very impressed with Gemini and how well it handled my 20mln-row fact table on my laptop.
Posted in Gemini | 18 Comments »
August 19th, 2009 at 3:50 am
Great test, I was thinking about doing the same – thanks for saving me time :)
Those numbers on your client are pretty impressive. I wonder how much faster (or slower) it will be when published to the Gemini server, and whether the server does all the computing then.
August 19th, 2009 at 7:05 am
Hi Kasper,
My understanding is that when you deploy to a SharePoint server, your in-memory SSAS database is deployed onto one of the assigned SSAS servers and any Excel Services requests are then handled by that SSAS server. So performance there will depend on how many such SSAS databases you have on that server and how actively they are used. I am not sure myself what will happen when such a workbook deployed to SharePoint is downloaded and opened again on the client – will the pivot work from the remote SSAS server or from a local one? My best guess would be the local server, but tests will show…
August 19th, 2009 at 9:25 am
How much memory was Excel using to hold that Gemini database in memory? (What about after you close all workbooks?) Wondering if it’s going to be on the order of the size of the xlsx on disk, or if there will be lots of other structures in memory.
August 19th, 2009 at 4:42 pm
>> I took the AdventureWorksDW database and expanded the FactInternetSales table to 22mln records by simply duplicating the included records
From what I know of COP databases, I think they only store each distinct value in a column – which means it would be interesting to generate new, random data rather than duplicate existing records, and then see what performance was like.
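A hedged sketch of what that randomisation could look like in T-SQL, again against the hypothetical FactInternetSalesBig work table from the earlier sketches – the column choices and value ranges are arbitrary, and CHECKSUM(NEWID()) is used because plain RAND() evaluates only once per statement:

```sql
-- Sketch only: replace duplicated measure values with random ones so the columns
-- contain many distinct values instead of repeats of the original rows.
UPDATE dbo.FactInternetSalesBig
SET SalesAmount   = ABS(CHECKSUM(NEWID())) % 10000 / 100.0,  -- random 0.00 - 99.99
    OrderQuantity = ABS(CHECKSUM(NEWID())) % 10 + 1;         -- random 1 - 10
```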
August 19th, 2009 at 4:54 pm
Greg, Excel was using about 550MB of memory – that is more than the 72MB file size. Here are my memory usage notes:
– During the 20mln record load, all 3GB of available laptop memory was used.
– After restarting the laptop, the OS and other background applications were using 1.23GB.
– I started Excel 2010 and laptop memory usage grew to 1.30GB.
– I opened the Excel workbook with my data and laptop memory usage grew to 1.55GB.
– Playing with the pivot and the Gemini add-in increased memory usage to 1.75GB.
August 19th, 2009 at 6:16 pm
Chris,
I already “randomised” the data and you were right, I am getting different results. I’ll post more info later.
August 20th, 2009 at 11:14 pm
Are you using the 32-bit version of Excel or 64-bit? The out-of-memory error might have been due to using the x86 version of Excel rather than actually running out of physical memory…
regards
ash
August 21st, 2009 at 6:49 am
Ashvini,
My laptop is on 32-bit Vista and Excel is also the 32-bit version. So in such a setup Excel could use up to 2GB of RAM, is that right? Would there be a different error message when Excel runs out of memory, as opposed to the machine running out of physical memory?
August 21st, 2009 at 7:15 pm
On a 32-bit box the maximum virtual address space an application can use is indeed usually 2GB (unless the /3GB boot option is used).
Gemini doesn’t use AWE, so virtual address space will be a hard limit (coming from the Windows OS) in this case – even if physical memory is, say, 8GB, only 2(3)GB can be used by the app.
I’d recommend a 64-bit box with 64-bit Excel and add-in for playing with lots of data.
Regards,
Adrian
September 4th, 2009 at 10:59 am
What’s the role of the SSAS engine? Is it the end of the cube? I don’t understand – how do you make hierarchies? Is Gemini based only on a relational database?
September 4th, 2009 at 12:02 pm
Patrice,
Gemini code is based on the SSAS engine in in-memory storage mode. Gemini builds a cube in memory from the existing data. There are no hierarchies and no attribute relationships. You can cross-join any attributes, and the proper relationships based on the values will be calculated. You will load mostly relational data into Gemini and then, based on that data, build a pivot table for quick analysis.
September 4th, 2009 at 10:45 pm
Speaking from SSAS experience – calculations tend to slow down the cube query performance.
Have you tried the new DAX stuff to see how it performs – both at creation time, and then at pivot table browsing time?
How about creating ‘layered calculations’ – like the solve order stuff we used to do?
September 8th, 2009 at 6:10 pm
Harsh,
There are two places where you can define calculations – one in the Gemini window, and another in the pivot. In the Gemini window all calculations are fully materialized. I tried adding simple calculations (like joining strings) and a bit more complicated ones (like finding a related field in another table) in the Gemini window, and it takes 5-15 seconds for them to come back. After that I did not see any slowdowns. Simple calculations in the pivot also worked fast. I have not yet tried any more complicated calculations or the “solve order stuff”.
I can add a bit more info about performance, as I have been playing with this more. When the pivot table has more dimensions and more measures, it takes longer for the pivot to show results. So as more cells are calculated, performance is clearly affected. It is quite common for my somewhat bigger reports to take 10-20 seconds to come back. I still think the performance is good.
Also, in the previous comments there was a suggestion to use 64-bit Excel. I actually found in the Gemini documentation that Microsoft recommends using 64-bit Excel if you want to load more than a million records.
October 20th, 2010 at 2:48 am
I have an existing cube and I want to add some more dimensions to it.
But when I am adding a new dimension I am getting a blank space, and I want to remove that blank space in place of the default member. How do I do that?