Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference between Closed and open Sequential Pattern Mining Algorithms

I want to use some algorithms to mine my log data.

I found a pattern mining framework on: http://www.philippe-fournier-viger.com/spmf/index.php?link=algorithms.php

I have tried several algorithms, the BIDE+ algorithm performs the best.

The BIDE+ algorithm is for mining frequent closed sequential patterns from a sequence database.

Can someone explain the definition about "closed" sequential patterns and open ones?

like image 431
leon Avatar asked Apr 22 '13 10:04

leon


People also ask

What is a closed sequential pattern?

A closed sequential pattern is a sequential pattern such that it is not strictly included in another pattern having the same support.

Which one is the closed sequential pattern mining method?

b c d e is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.

What are the different types of pattern mining?

As mentioned earlier, generally speaking, data mining tasks and patterns can be classified into three main categories: prediction, association, and clustering.

What are the difference between association rule and sequential pattern discovery in data mining?

The sequential rules are found in sequences while association rules are found in records (transactions) containing items that are not ordered. In other words, the order between items such as time is not considered in association rule mining.


2 Answers

Glad that you are using my SPMF software.

The support of a sequential pattern is the number of sequences that contains the sequential pattern.

A frequent sequential pattern is a pattern that appears in at least "minsup" sequences of a sequence database, where minsup is a parameter set by the user.

A frequent closed sequential pattern is a frequent sequential pattern such that it is not included in another sequential pattern having exactly the same support.

Algorithms such as PrefixSpan finds frequent sequential patterns. Algorithms such as BIDE+ finds frequent closed sequential patterns. BIDE+ is usually much faster than PrefixSpan because it uses pruning techniques to avoid generating all sequential patterns. Moreover, the set of closed patterns is usually much smaller than the set of sequential patterns so BIDE+ is also more memory efficient.

Another important thing to know is that closed sequential patterns are a compact and lossless representation of all sequential patterns. This means that the set of closed sequential patterns is usually much smaller but it is lossless, which means that it allows recovering the full set of sequential patterns (no information is loss), which is very convenient.

I can give you a simple example.

Let's consider 4 sequences:

a  b  c  d  e a  b  d b  e  a   b  c  d  e 

Let's say that minsup = 2.

b c is a frequent sequential patterns because it appears in two sequences (it has a support of 2). b c is not a closed sequential patterns because it is contained in a larger sequential pattern b c d having the same support.

b c d has a support of 2. It is also not a closed sequential pattern because it is contained in a larger sequential pattern b c d e having the same support. b c d e is a closed sequential pattern because there it is not included in any other sequential pattern having the same support.

By the way, you can also check my survey about sequential pattern mining. It gives a good introduction about this topic and the different algorithms.

like image 157
Phil Avatar answered Oct 20 '22 00:10

Phil


Check out this chapter on Frequent Itemsets and Frequent item sets Mining & Association Rules

like image 40
babalu Avatar answered Oct 20 '22 01:10

babalu