I'm trying to understand the fundamentals of the Apriori (basket) algorithm for use in data mining.
It's best I explain the complication I'm having with an example:
Here is a transactional dataset:
t1: Milk, Chicken, Beer
t2: Chicken, Cheese
t3: Cheese, Boots
t4: Cheese, Chicken, Beer
t5: Chicken, Beer, Clothes, Cheese, Milk
t6: Clothes, Beer, Milk
t7: Beer, Milk, Clothes
The minsup for the above is 0.5 or 50%.
There are 7 transactions, so for an itemset to be "frequent" it must appear in at least 4 of the 7 (0.5 × 7 = 3.5, rounded up to 4). As such, this was my set of frequent 1-itemsets:
F1:
Milk = 4
Chicken = 4
Beer = 5
Cheese = 4
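As a sanity check, the F1 counts above can be reproduced with a short Python sketch. The variable names (transactions, min_count, f1) are illustrative only, and the threshold is assumed to mean "support count of at least ceil(minsup × number of transactions)".

```python
from collections import Counter
from math import ceil

# The seven transactions from the question, one set per basket.
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

minsup = 0.5
# Assumption: an itemset is frequent when count >= ceil(minsup * N).
min_count = ceil(minsup * len(transactions))  # ceil(3.5) = 4

# Count how many transactions contain each single item.
item_counts = Counter(item for t in transactions for item in t)

# Keep only the items that reach the minimum count.
f1 = {item: n for item, n in item_counts.items() if n >= min_count}

print(min_count)
print(sorted(f1.items()))
```

Clothes (3) and Boots (1) fall below the threshold, which is why they do not appear in F1.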
I then created my candidates for the second refinement (C2) and narrowed it down to:
F2:
{Milk, Beer} = 4
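The step from F1 to F2 can be sketched the same way: build every pair of frequent single items as a candidate (C2), count each candidate, and keep those that reach the minimum count. This is a minimal illustration on the dataset above, not a full Apriori implementation, and the names c2, c2_counts and f2 are made up for the example.

```python
from itertools import combinations

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # minsup 0.5 over 7 transactions, rounded up

# Frequent single items found in the previous step.
f1_items = ["Beer", "Cheese", "Chicken", "Milk"]

# C2: every unordered pair of frequent items is a candidate.
c2 = [frozenset(pair) for pair in combinations(f1_items, 2)]

# Support count of a candidate = number of transactions containing it.
c2_counts = {c: sum(1 for t in transactions if c <= t) for c in c2}

# F2: candidates that reach the minimum count.
f2 = {c: n for c, n in c2_counts.items() if n >= min_count}

for candidate, n in sorted(c2_counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(candidate), n)
```

Of the six candidates, only {Beer, Milk} reaches a count of 4, which matches the F2 given above.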
This is where I get confused: if I am asked to display all frequent itemsets, do I write down all of F1 and F2, or just F2? The entries in F1 don't seem like "sets" to me.
I am then asked to create association rules for the frequent itemsets I have just defined and calculate their "confidence" figures. I get this:
Milk -> Beer = 100% confidence
Beer -> Milk = 80% confidence
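These two figures follow from the usual definition confidence(X -> Y) = count(X and Y together) / count(X). A small sketch, with hypothetical helper names support_count and confidence, reproduces them on the dataset above:

```python
transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = count(X union Y) / count(X)."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)

print(confidence({"Milk"}, {"Beer"}, transactions))  # 4 / 4
print(confidence({"Beer"}, {"Milk"}, transactions))  # 4 / 5
```

Milk appears in 4 transactions and Beer in 5, while the pair appears in 4, which gives 4/4 and 4/5 respectively.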
It seems superfluous to put F1's itemsets in here, as they would all have a confidence of 100% regardless and don't actually "associate" anything, which is why I am now questioning whether the itemsets in F1 are indeed "frequent".
The Apriori algorithm uses frequent itemsets to generate association rules. It is based on the principle that every subset of a frequent itemset must also be frequent. A frequent itemset is an itemset whose support is at least a given threshold (the minimum support).
Frequent itemset mining is a method for market basket analysis. It aims at finding regularities in the shopping behavior of customers of supermarkets, mail-order companies, online shops, etc. More specifically, it finds sets of products that are frequently bought together.
Frequent patterns are itemsets, subsequences, or substructures that appear in a data set with frequency no less than a user-specified threshold. For example, a set of items, such as milk and bread, that appear frequently together in a transaction data set, is a frequent itemset.
Frequent itemsets are those itemsets whose support is at least the user-specified minimum support. This means that if the itemset {A, B} is frequent, then {A} and {B} must each be frequent on their own as well.
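This subset property can be checked directly on the dataset from the question. The sketch below is only an illustration (the helpers is_frequent and proper_subsets are made up for it, and a minimum count of 4 is assumed): every proper subset of the frequent pair {Milk, Beer} is frequent, and, read the other way round, a superset of the infrequent item Clothes cannot be frequent.

```python
from itertools import combinations

transactions = [
    {"Milk", "Chicken", "Beer"},
    {"Chicken", "Cheese"},
    {"Cheese", "Boots"},
    {"Cheese", "Chicken", "Beer"},
    {"Chicken", "Beer", "Clothes", "Cheese", "Milk"},
    {"Clothes", "Beer", "Milk"},
    {"Beer", "Milk", "Clothes"},
]
min_count = 4  # assumed: minsup 0.5 over 7 transactions, rounded up

def is_frequent(itemset):
    """True if itemset occurs in at least min_count transactions."""
    return sum(1 for t in transactions if set(itemset) <= t) >= min_count

def proper_subsets(itemset):
    """All non-empty subsets that are strictly smaller than itemset."""
    items = sorted(itemset)
    return [set(c) for r in range(1, len(items)) for c in combinations(items, r)]

# A frequent itemset: all of its proper subsets must be frequent too.
subsets_ok = all(is_frequent(s) for s in proper_subsets({"Milk", "Beer"}))
print(is_frequent({"Milk", "Beer"}), subsets_ok)

# Contrapositive: Clothes is infrequent, so {Clothes, Beer} cannot be frequent.
print(is_frequent({"Clothes"}), is_frequent({"Clothes", "Beer"}))
```

This is also why Apriori only builds size-2 candidates from items that survived the size-1 pass: any pair containing an infrequent item can be discarded without counting it.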
Itemsets of size 1 are considered frequent if their support meets the minimum support. But you also have to consider the minimum itemset size you are asked for: if the minimum size in your example is 2, then F1 will not be reported, but if the minimum size is 1, then you have to include it.
You can take a look here and here for more ideas and examples.
Hope that I helped.