Report Portal

UNION Operation in DAX Queries

Reposted from Jason Thomas blog with the author's permission.

In one of my previous posts - Row Selection Using Slicers in PowerPivot - Part 1, I had demonstrated the use of what Marco Russo defined as Reverse Linked Tables and Linkback tables. A particularly eye-catching thing in my post was the use of Microsoft Query instead of DAX Query to obtain the required reverse-linked table and one of the reasons I gave was that it was difficult to do an UNION operation using DAX queries (yes, you heard it right. I said difficult and not impossible). Well, since difficult is such a subjective word, I decided to jot down the technique, maybe it might seem easy for you guys!

Union operation with DAX queries / PowerPivot

For the purpose of this demonstration, I am using two tables having identical structures. There are just two columns in the tables - country and state and we need to do an UNION operation on these two tables.

Source

The result should be 7 rows as shown below

Expected Result

Follow the steps below for the solution:-

1) Import the two tables to PowerPivot (you can also use this technique on a SSAS tabular model). I will be using DAX Studio for writing my queries and displaying the results (though you might as well as use this in SSMS or in the DMX query editor for SSRS depending on your purpose).

2) The first thing to understand here is that DAX, as a query language, can not add rows usually, and UNION requires that we get more rows (since it is the combined result of both the tables). However, there is an operator in DAX which generally generates more rows than its source tables – CROSSJOIN (except when any one of the participating tables has only one row). So let’s first crossjoin the two tables and see the results.

CrossJoin

Now you can see that we get 12 rows, however no single column gives the result that we need. Somehow, we need to get a logic to filter the 5 rows and also a logic to combine the right results in one calculated column.

3) To identify the individual rows, let us add a rank column to each of the two tables and then crossjoin them.

EVALUATE
CROSSJOIN (
ADDCOLUMNS (Table1,"Rnk1",RANKX ( Table1, Table1[State],, 1, DENSE )    ),
ADDCOLUMNS (Table2,"Rnk2",RANKX ( Table2, Table2[State],, 1, DENSE )    )
  )

Adding rank identifiers to the table rows

4) When I looked at this resultset initially, I felt that I could just filter all rows having Rank1 = 1 and Rank2 = 2 and then add a calculated column each for Country and State such that if Rank1 = 1, then the value comes from Table2 else it comes from Table1.

EVALUATE
ADDCOLUMNS (
FILTER (
CROSSJOIN (
ADDCOLUMNS (Table1,"Rnk1",RANKX ( Table1, Table1[State],, 1, DENSE )        ),
ADDCOLUMNS (Table2,"Rnk2",RANKX ( Table2, Table2[State],, 1, DENSE )        )
      ),
      [Rnk1] = 1 || [Rnk2] = 2    ),
"UnionCntry", IF ( [Rnk1] = 1, Table2[Country], Table1[Country] ),
"UnionState", IF ( [Rnk1] = 1, Table2[State], Table1[State] )
  )

Incorrect logic

Even though the UnionCntry and UnionState columns might appear right, they are not, as one row is missing (in this case, the row with IN country and KL state). This happens because there will always be one row which has Rnk1 = 1 and Rnk2 = 2, and hence there is an overlap. So we need to think of a different technique to filter the rows.

5) Since we are short of one row, we need to include one more extra row having the complementary values of ranks for the overlapping row (in this case, the row with Rnk1=2 and Rnk2=1). This can be done by using the query given below.

EVALUATE
FILTER(
CROSSJOIN (
ADDCOLUMNS (Table1,"Rnk1",RANKX ( Table1, Table1[State],, 1, DENSE )    ),
ADDCOLUMNS (Table2,"Rnk2",RANKX ( Table2, Table2[State],, 1, DENSE )    )
  ), ([Rnk1]=1 || [Rnk2]=2) || ([Rnk1]=2 && [Rnk2]=1))

Filtering the right number of rows

6) Now all we have to do is to get the logic for creating the calculated columns. We can do it with the help of a SWITCH statement by (a) assigning the extra row (Rnk1=2 and Rnk2=1) to Table1 and (b) making sure that the overlapping row (Rnk1=1 and Rnk2=2) is assigned to Table2. This can be done by the following query

EVALUATE
ADDCOLUMNS(
FILTER(
CROSSJOIN (
ADDCOLUMNS (Table1,"Rnk1",RANKX ( Table1, Table1[State],, 1, DENSE )    ),
ADDCOLUMNS (Table2,"Rnk2",RANKX ( Table2, Table2[State],, 1, DENSE )    )
  ), ([Rnk1]=1 || [Rnk2]=2) || ([Rnk1]=2 && [Rnk2]=1))
, "UnionCountry", SWITCH(TRUE,
                    ([Rnk2]=2 && [Rnk1]<>1) || ([Rnk2]=1 && [Rnk1]=1)
                    ,Table1[Country], Table2[Country])
, "UnionState", SWITCH(TRUE,
                    ([Rnk2]=2 && [Rnk1]<>1) || ([Rnk2]=1 && [Rnk1]=1)
                    ,Table1[State], Table2[State]))

Correct result of UNION ALL operation

Now you can see that the UnionCountry and UnionState column gives us the required results for a UNION ALL operation. This approach can be easily extended in case we have more than two tables to join. The only scenario where this approach will not work is when any of the participating tables has just one row (as the cross join will not give us more rows. Eg, cross-joining two tables with 3 and 1 rows will give only 3 rows whereas we need 4 rows for the UNION ALL). However, this can be resolved by adding a temporary table with two rows (so the cross join will be 3 * 1 * 2 = 6 rows) and then using this logic.

7) To do the UNION operation instead of UNION ALL, we should remove all duplicate rows. For that, we can use SUMMARIZE function.

EVALUATE
SUMMARIZE(
ADDCOLUMNS(
FILTER(
CROSSJOIN (
ADDCOLUMNS (Table1,"Rnk1",RANKX ( Table1, Table1[State],, 1, DENSE )    ),
ADDCOLUMNS (Table2,"Rnk2",RANKX ( Table2, Table2[State],, 1, DENSE )    )
  ), ([Rnk1]=1 || [Rnk2]=2) || ([Rnk1]=2 && [Rnk2]=1))
, "UnionCountry", SWITCH(TRUE,
                    ([Rnk2]=2 && [Rnk1]<>1) || ([Rnk2]=1 && [Rnk1]=1),
Table1[Country], Table2[Country])
, "UnionState", SWITCH(TRUE,
                    ([Rnk2]=2 && [Rnk1]<>1) || ([Rnk2]=1 && [Rnk1]=1),
Table1[State], Table2[State]))
, [UnionCountry], [UnionState])

Correct result of UNION operation

In this scenario, there is no difference in the result as there are no duplicate rows. Let me know what you think of this!


jason-thomas

Jason has been working with Microsoft BI tools since he joined the IT industry in 2006. Since then he has worked across SSAS, SSRS and SSIS for a large number of clients like Unilever, Imperial Tobacco, IATA, Schneider Electric, etc. He is currently working at MindTree Ltd, Bangalore (www.mindtree.com) and his personal blog site can be found at http://www.sqljason.com His blogs can also be found at the popular technical site BeyondRelational.com at http://beyondrelational.com/blogs/jason


Tags: dax, tabular

 

2007-2015 VidasSoft Systems Inc.