Using frequent itemset mining to build association rules?

I am new to this area and its terminology, so please feel free to correct me if I go wrong somewhere. I have two datasets like this:

Dataset 1:

A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E

The way I interpret this is that, at some point in time, (A,B,C,E) occurred together, and so did (A,C), (A,C,D,E), etc.

Dataset 2:

5A 1B 5C  0 2E
4A  0 5C  0  0
2A  0 1C 4D 4E
3A  0 4C  0 3E

The way I interpret this is that, at some point in time, 5 occurrences of A, 1 occurrence of B, 5 occurrences of C and 2 occurrences of E happened together, and so on.
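A minimal sketch of the two readings above, in Python. The helper names (`parse_presence`, `parse_counts`) are made up for illustration, and the code assumes single-letter item names with "0" marking an absent item. It also shows that dropping the counts from Dataset 2 gives exactly the rows of Dataset 1.

```python
# Rows copied from Dataset 1 and Dataset 2; "0" marks an absent item.
DATASET_1 = """\
A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E
"""

DATASET_2 = """\
5A 1B 5C 0 2E
4A 0 5C 0 0
2A 0 1C 4D 4E
3A 0 4C 0 3E
"""


def parse_presence(text):
    """Turn each row into the set of items present in it."""
    return [{tok for tok in line.split() if tok != "0"}
            for line in text.splitlines() if line.strip()]


def parse_counts(text):
    """Turn each row into {item: count}, e.g. "5A" -> {"A": 5}."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        row = {}
        for tok in line.split():
            if tok != "0":
                # Assumption: the item name is the last character of the token.
                row[tok[-1]] = int(tok[:-1])
        rows.append(row)
    return rows


transactions = parse_presence(DATASET_1)
counts = parse_counts(DATASET_2)

# Dropping the counts from Dataset 2 gives the presence-only view.
binarized = [set(row) for row in counts]
print(binarized == transactions)
```

Frequent itemset mining in its basic form only looks at which items are present in a row, so the presence-only view is the usual input.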

I am trying to find which items occur together and, if possible, the cause and effect behind this. I do not understand how to use both datasets for this (or whether one is enough). A good tutorial would be welcome, but my primary question is which dataset to use and how to proceed in (i) building frequent itemsets and (ii) building association rules between them.

Can someone point me to practical tutorials/examples (preferably in Python) or at least briefly explain how to approach this problem?

asked Aug 13 '11 00:08

Legend

1 Answer

Some theoretical facts about association rules:

Association rule mining is a type of undirected data mining: it finds patterns in data where the target is not specified beforehand. Whether the patterns make sense is left to human interpretation.
The goal of association rules is to detect relationships or associations between specific values of categorical variables in large data sets.
A rule can be interpreted as, for example, "70% of the customers who buy wine and cheese also buy grapes".
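The "70%" in a rule like that is usually called its confidence, and the share of all transactions containing every item in the rule is its support. A minimal sketch of both measures, assuming Python 3 and using the four rows of Dataset 1 from the question as transactions (the function names are only for illustration):

```python
# The four transactions from Dataset 1 in the question.
transactions = [
    {"A", "B", "C", "E"},
    {"A", "C"},
    {"A", "C", "D", "E"},
    {"A", "C", "E"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"A", "C"}, transactions))            # 1.0
print(confidence({"A", "C"}, {"E"}, transactions))  # 0.75
```

Here the rule (A, C) -> E reads as "75% of the rows that contain A and C also contain E". A high confidence only says that the items co-occur; on its own it does not establish cause and effect.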

To find association rules, you can use the Apriori algorithm. Many Python implementations already exist, although most of them are not efficient enough for practical use:

source1: http://code.google.com/p/autoflash/source/browse/trunk/python/apriori.py?r=31
source2: http://www.nullege.com/codes/show/src%40l%40i%40libbyr-HEAD%40test_freq_item_algos.py/5/apriori/python
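For orientation, the core of Apriori fits in a few lines. The following is a minimal, unoptimized sketch in Python 3, not the code from either link above: it grows itemsets one item at a time and only counts candidates whose smaller subsets are all frequent.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those that reach min_support.
        kept = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                kept[c] = s
        return kept

    items = {item for t in transactions for item in t}
    current = frequent({frozenset([i]) for i in items})
    result = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result


# The four transactions from Dataset 1 in the question.
dataset_1 = [
    {"A", "B", "C", "E"},
    {"A", "C"},
    {"A", "C", "D", "E"},
    {"A", "C", "E"},
]
itemsets = apriori(dataset_1, min_support=0.5)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("%5.3f %s" % (s, " ".join(sorted(itemset))))
```

With `min_support=0.5`, B and D drop out after the first pass, so no candidate containing them is ever counted. That pruning is what keeps the search manageable on larger data.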

Alternatively, use the Orange data mining library, which has good support for association rules.

Usage example:

'''
Save the first example as item.basket, one transaction per line:

A, B, C, E
A, C
A, C, D, E
A, C, E

Open IPython in the same directory as the saved file, or use the os module:
>>> import os
>>> os.chdir("c:/orange")
'''
# Note: this is the legacy Orange 2.x API (Python 2).
import orange

items = orange.ExampleTable("item")
# Play with the support argument to filter out rules.
rules = orange.AssociationRulesSparseInducer(items, support=0.1)

for r in rules:
    print("%5.3f %5.3f %s" % (r.support, r.confidence, r))
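If that Orange module is not available, roughly the same kind of listing (support, confidence, rule) can be sketched in plain Python 3. This is a brute-force illustration over the four example transactions, not how Orange implements it. The names `rules`, `min_support` and `min_confidence` are chosen here for the example, and enumerating every item combination only works for a handful of items.

```python
from itertools import combinations

# The four transactions from the item.basket example.
transactions = [
    frozenset("ABCE"),
    frozenset("AC"),
    frozenset("ACDE"),
    frozenset("ACE"),
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(min_support=0.1, min_confidence=0.0):
    """Return (support, confidence, lhs, rhs) for every rule passing both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue
            # Split the itemset into every non-empty left-hand side and its remainder.
            for r in range(1, size):
                for lhs in combinations(combo, r):
                    lhs = frozenset(lhs)
                    conf = s / support(lhs)
                    if conf >= min_confidence:
                        found.append((s, conf, lhs, itemset - lhs))
    return found


for s, conf, lhs, rhs in rules(min_support=0.5, min_confidence=0.7):
    print("%5.3f %5.3f %s -> %s" % (s, conf, " ".join(sorted(lhs)), " ".join(sorted(rhs))))
```

Raising `min_confidence` to 1.0 keeps only the rules that hold in every transaction containing the left-hand side, for example E -> A C.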

To learn more about association rules and frequent itemset mining, here is my selection of books:

"Introduction to Data Mining" by Tan, Steinbach and Kumar: the best book for beginners
"Data Mining and Knowledge Discovery Handbook": for advanced users
"Mining of Massive Datasets": tips on real-life use and on building efficient solutions; a free book, http://i.stanford.edu/~ullman/mmds.html
Of course, there are also many fantastic scientific papers to read: for example, search Microsoft Academic for "frequent pattern mining"

There is no shortcut.

answered Oct 24 '22 22:10

timgluz
