I want to use some algorithms to mine my log data.
I found a pattern mining framework on: http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php
I have tried several algorithms, the BIDE+ algorithm performs the best.
The BIDE+ algorithm is for mining frequent closed sequential patterns from a sequence database.
Can someone explain the definition about "closed" sequential patterns and open ones?
A closed sequential pattern is a sequential pattern such that it is not strictly included in another pattern having the same support.
b c d e is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.
As mentioned earlier, generally speaking, data mining tasks and patterns can be classified into three main categories: prediction, association, and clustering.
The sequential rules are found in sequences while association rules are found in records (transactions) containing items that are not ordered. In other words, the order between items such as time is not considered in association rule mining.
Glad that you are using my SPMF software.
The support of a sequential pattern is the number of sequences that contains the sequential pattern.
A frequent sequential pattern is a pattern that appears in at least "minsup" sequences of a sequence database, where minsup is a parameter set by the user.
A frequent closed sequential pattern is a frequent sequential pattern such that it is not included in another sequential pattern having exactly the same support.
Algorithms such as PrefixSpan finds frequent sequential patterns. Algorithms such as BIDE+ finds frequent closed sequential patterns. BIDE+ is usually much faster than PrefixSpan because it uses pruning techniques to avoid generating all sequential patterns. Moreover, the set of closed patterns is usually much smaller than the set of sequential patterns so BIDE+ is also more memory efficient.
Another important thing to know is that closed sequential patterns are a compact and lossless representation of all sequential patterns. This means that the set of closed sequential patterns is usually much smaller but it is lossless, which means that it allows recovering the full set of sequential patterns (no information is loss), which is very convenient.
I can give you a simple example.
Let's consider 4 sequences:
a b c d e a b d b e a b c d e
Let's say that minsup = 2.
b c
is a frequent sequential patterns because it appears in two sequences (it has a support of 2). b c
is not a closed sequential patterns because it is contained in a larger sequential pattern b c d
having the same support.
b c d
has a support of 2. It is also not a closed sequential pattern because it is contained in a larger sequential pattern b c d e
having the same support. b c d e
is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.
By the way, you can also check my survey about sequential pattern mining. It gives a good introduction about this topic and the different algorithms.
Check out this chapter on Frequent Itemsets and Frequent item sets Mining & Association Rules
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With