Frequent Itemsets & Association Rules - Apriori Algorithm

I'm trying to understand the fundamentals of the Apriori (Basket) Algorithm for use in data mining.

It's best I explain the complication I'm having with an example:

Here is a transactional dataset:

t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes

The minsup for the above is 0.5 or 50%.

Taking from the above, my number of transactions is clearly 7, meaning that for an itemset to be "frequent" it must appear in at least 4 of the 7 transactions (0.5 × 7 = 3.5, rounded up to 4). As such, this was my frequent itemset 1:

F1:

Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
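
As a sanity check on those counts, here is a minimal Python sketch that recomputes F1 from the dataset above. The names `transactions`, `min_count` and `f1` are just illustrative, and rounding the threshold up with `ceil` is an assumption about how minsup is meant to be applied:

```python
from collections import Counter
from math import ceil

# The seven transactions from the question, one set of items each.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

minsup = 0.5
# 0.5 * 7 = 3.5, so an itemset needs to appear in at least 4 transactions.
min_count = ceil(minsup * len(transactions))

# Count how many transactions contain each single item.
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that reach the minimum count.
f1 = {item: n for item, n in item_counts.items() if n >= min_count}

print(min_count)
print(sorted(f1.items()))
```

Clothes (3) and Boots (1) fall below the threshold, which is why they do not appear in F1.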

I then created my candidates for the second refinement (C2) and narrowed them down to:

F2:

{Milk, Beer} = 4
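
The C2 step can be sketched roughly as follows, assuming candidates are simply every pair of F1 items (the variable names are hypothetical, and this is a plain brute-force count rather than an optimised Apriori implementation):

```python
from itertools import combinations

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # 50% of 7 transactions, rounded up

# Items that survived the first pass (F1).
f1_items = ["Beer", "Cheese", "Chicken", "Milk"]

# C2: every unordered pair of frequent single items.
c2 = [frozenset(pair) for pair in combinations(f1_items, 2)]

# Support count of a candidate = number of transactions containing all its items.
c2_counts = {c: sum(1 for t in transactions if c <= t) for c in c2}

# F2: candidates that reach the minimum count.
f2 = {c: n for c, n in c2_counts.items() if n >= min_count}

for itemset, n in f2.items():
    print(sorted(itemset), n)
```

Of the six candidate pairs, {Chicken, Beer} and {Chicken, Cheese} come closest with a count of 3, so only {Milk, Beer} survives.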

This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? To me, the entries in F1 aren't "sets".

I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:

Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
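
Those two figures can be reproduced with a short sketch, taking confidence(X -> Y) as count(X and Y together) divided by count(X). The helper names `support_count` and `confidence` are made up for this example:

```python
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = count(X union Y) / count(X)."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


print(confidence({"Milk"}, {"Beer"}, transactions))  # 4 / 4
print(confidence({"Beer"}, {"Milk"}, transactions))  # 4 / 5
```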

It seems superfluous to put F1's itemsets in here, as they will all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the F1 itemsets are indeed "frequent".

asked Jan 06 '13 15:01

Myles Gray

1 Answer

Itemsets of size 1 are considered frequent if their support is high enough. But here you also have to consider the minimal threshold (the minimum itemset size you were asked for). If the minimal threshold in your example is 2, then F1 will not be considered. But if the minimal threshold is 1, then you have to include it.

You can take a look here and here for more ideas and examples.

Hope that helps.

answered Oct 30 '22 00:10

mamdouh alramadan

Tags: algorithm, data-mining, apriori
