Calculate Rule Importance in MS Association Rules
Reposted from Thomas Ivarsson blog with the author's permission. Read Full Article
Thanks to Jamie MacLennan and Bogdan Crivat at Predixion that have helped me with answers to my questions.
Regarding the relation between rule probability and rule importance Jamie explained this in the following way. If people always by milk at the grocery store together with other products you will have a high probablity of milk to occur with other products. All products that are bought together with milk will have a 100 precent probability. This rule is not that interesting and will have a lower importance than probability.
The starting point for calculating the rule importance is a correlation matrix with Touring Tire Tube and Touring Tire transactions. The data source for the market basket model is the dbo.vAssocSeqLineItems in the AdventureWorksDW relational database. I order to follow how the MS Association rules algorithm works I have added 1 to the result for all cells except the total row and the total column.
|
Touring Tire |
Not Touring Tire |
Total |
Touring Tire Tube |
759 |
640 |
1 399 |
Not Touring Tire Tube |
124 |
19 736 |
19 860 |
Total |
883 |
20 736 |
21 259 |
Some of the queries are these:
select * from dbo.vAssocSeqLineItems
where Model = 'Touring Tire' ---881 records
select * from dbo.vAssocSeqLineItems
where Model = 'Touring Tire Tube' --1397 records
Select * from dbo.vAssocSeqLineItems v
Join dbo.vAssocSeqLineItems v1
On v.OrderNumber = v1.OrderNumber
Where v.Model = 'Touring Tire' and v1.Model = 'Touring Tire Tube' --758 records
--Credit to Bogdan for this one.
SELECT DISTINCT OrderNumber FROM dbo.vAssocSeqLineItems
EXCEPT
(
SELECT DISTINCT OrderNumber FROM
(
SELECT OrderNumber FROM dbo.vAssocSeqLineItems WHERE Model='Touring Tire Tube'
UNION
SELECT OrderNumber FROM dbo.vAssocSeqLineItems WHERE Model='Touring Tire'
) AS T
) --19 735 Records
With the correlation matrix cells completed you can go to next step and calculate the importance of the rules. The figures in the BIDS viewer are:
1,43722003 Touring Tire -> Touring Tire Tube
1,93898162 Touring Tire Tube -> Touring Tire
Let us start with the Importance of the rule Touring Tire Tube -> Touring Tire and check the numbers in the table above. Note that +1 has been added to each cell except for the totals that includes this addition.
The calculation is done like this in several steps:
1. (Touring Tire, Touring Tire Tube) / (Touring Tire Tube Total) = 759 / 1 399
2. (Touring Tire, Not Touring Tire Tube) / (Not Touring Tire Tube Total) = 124 / 19 860
3. Divide the ratios in 1 and 2 = 86, 89
4. Add Log10 (in Excel) to point 3 above = Log10(86,89) = 1,9390. I am not an expert in mathematics but log10 means that you decrease the scale.
The same calculation for Touring Tire -> Touring Tire Tube is as follows:
1. (Touring Tire, Touring Tire Tube) / (Touring Tire Total) = 759 / 883
2. (Touring Tire Tube, Not Touring Tire) / (Not Touring Tire Total) = 640 / 20 376
3. Divide the ratios in 1 and 2 = 27,37
4. Add Log10(in Excel) to the result of point 3 = Log10(27,37) = 1,437
I hope that you with these examples can follow the calculations of MS Association Rule importance directly from a data source.
Happy Association!
Thomas Ivarsson has been working with the MS BI platform since SQL Server 7 in 1999. Presently he is working in the telecom industry in Sweden, with a data warehouse based on SQL Server 2005. From 1999 to 2007 he worked as a consultant also on the three SQL Server BI platforms. During the latest years he has spent most of time on SSAS, Reporting Services, ProClarity and Performance Point. He also has several years experience of the ETL process with DTS and SSIS. During 2008 and 2009 he has been working with introducing data mining in his daily business to see patterns in a service network behaviours. His blog can be found here: http://thomasianalytics.spaces.live.com |
Tags: data mining