
Minimum support and minimum confidence in Data Mining

I would like to know whether minimum support and minimum confidence can be determined automatically when mining association rules. If so, any hints or pointers to resources would be great.

user3036757 asked Aug 15 '14 06:08




1 Answer

Yes, there exist some methods to automatically determine the minsup and minconf thresholds.

But first, let me tell you a little bit about how to choose the minsup and minconf parameters. Choosing them depends on your data.

For the minimum support, I use 80% on some datasets and 0.05% on others; it all depends on the dataset. Usually, I start with a high value and then decrease it until I find a value that generates enough patterns.
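As a rough illustration of this "start high, then lower" approach, here is a minimal Python sketch. The helper names (`frequent_itemsets`, `lower_minsup_until`), the halving factor, and the toy transactions are assumptions made for the example, and the itemset counter is a brute-force enumeration that is only practical for tiny data, not a real Apriori or FP-Growth implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute-force: return {itemset: relative support} for itemsets with support >= minsup."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        # Count every non-empty subset of the transaction (exponential, toy data only).
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= minsup}

def lower_minsup_until(transactions, target, start=0.8, factor=0.5, floor=0.0005):
    """Start at a high minsup and shrink it until at least `target` patterns are found."""
    minsup = start
    while True:
        patterns = frequent_itemsets(transactions, minsup)
        if len(patterns) >= target or minsup <= floor:
            return minsup, patterns
        minsup = max(minsup * factor, floor)

# Hypothetical toy dataset.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
minsup, patterns = lower_minsup_until(transactions, target=5)
print(minsup, sorted(patterns))
```

On real data you would swap the brute-force counter for a proper frequent itemset miner and keep only the outer loop.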

For the minimum confidence, it is a little easier because it represents the confidence that you want in the rules. I usually use something like 60%, because I'm not interested in a rule that holds less than 60% of the time. But it also depends on the data.
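To make the meaning of a 60% threshold concrete, the sketch below computes the confidence of a rule X → Y as count(X ∪ Y) / count(X) and checks it against minconf. The function names and the toy transactions are assumptions for illustration only.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(X u Y) / count(X)."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0  # The antecedent never occurs, so the rule is undefined; treat as 0.
    return count(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical toy dataset.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
minconf = 0.6
for x, y in [(["bread"], ["milk"]), (["milk"], ["butter"]), (["butter"], ["bread"])]:
    conf = confidence(transactions, x, y)
    print(x, "->", y, conf, "kept" if conf >= minconf else "rejected")
```

Here bread → milk holds in 3 of the 4 transactions that contain bread, so it passes a 60% threshold, while milk → butter holds in only 1 of 4 and is rejected.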

In terms of performance, a higher minsup yields fewer patterns and a faster algorithm. A higher minconf also yields fewer patterns, but the algorithm may not be faster, because many algorithms don't use minconf to prune the search space. So setting these parameters also depends on how many rules you want.

If you don't want to use the minsup parameter, you can use a top-k association rule mining algorithm. In this case, you specify k=1000, for instance, and the algorithm discovers the 1000 most frequent rules that satisfy a given minimum confidence. I have designed one such algorithm, named TopKRules, for association rule mining. You can download the source code from the SPMF open-source data mining library, which offers many implementations of association rule and pattern mining algorithms.
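To show what a top-k query returns, here is a brute-force Python sketch. It is not the TopKRules algorithm and does not use SPMF; it simply enumerates every rule on a tiny dataset, keeps those meeting minconf, and returns the k with the highest support. The function name `top_k_rules` and the toy data are assumptions for the example.

```python
import heapq
from itertools import combinations

def top_k_rules(transactions, k, minconf):
    """Brute-force: the k highest-support rules with confidence >= minconf."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    rules = []
    for itemset, c in counts.items():
        # Split each itemset into every non-empty antecedent / consequent pair.
        for size in range(1, len(itemset)):
            for antecedent in combinations(itemset, size):
                consequent = tuple(i for i in itemset if i not in antecedent)
                conf = c / counts[antecedent]
                if conf >= minconf:
                    rules.append((c / n, conf, antecedent, consequent))
    # Rank by support first, then confidence (tuple ordering).
    return heapq.nlargest(k, rules)

# Hypothetical toy dataset.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
    ["bread", "milk"],
]
top = top_k_rules(transactions, k=2, minconf=0.6)
for sup, conf, x, y in top:
    print(x, "->", y, "support", sup, "confidence", conf)
```

The point of the design is that the user picks how many rules they want (k) rather than guessing a support threshold; a real top-k algorithm reaches the same result without enumerating everything.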

Another solution to set the minsup threshold automatically is to use a mathematical function to set it in terms of how much data you have. You can see my blog post here as an example of how to do it.
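As a sketch of what such a function could look like, the example below uses a decaying exponential so that small datasets get a high relative minsup and large datasets get a low one. The functional form and the constants `a`, `b`, and `c` are assumptions chosen for illustration, not values taken from the blog post, and would need tuning for real data.

```python
import math

def auto_minsup(n_transactions, a=0.0004, b=0.1, c=0.2):
    """Hypothetical rule: relative minsup = exp(-a*n - b) + c, capped at 1.0.

    a controls how fast the threshold drops as the data grows,
    c is the floor that the threshold approaches for very large datasets.
    """
    return min(1.0, math.exp(-a * n_transactions - b) + c)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(auto_minsup(n), 4))
```

Any monotonically decreasing function with a sensible floor would serve the same purpose; the exponential is just one convenient shape.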

Some other works have attempted to find a solution to setting minsup and minconf. You may find them on Google Scholar.

Phil answered Jan 01 '23 18:01
