What do the items in one bracket represent in sequential pattern mining?

Tags:

data-mining

I have seen many databases for sequential pattern mining, and the sequences in these databases look like this:

<(af)(d)(e)(a)>

<(e)(abf)(bde)>

What does a set of items in one bracket, like (af), (abf) or (bde), represent? Does it mean that the items are related to one another, or something else?

On what basis are items grouped into one such element? I am using a weblog file as my dataset.

akshay reddy asked Oct 05 '22 12:10


1 Answer

The input of a sequential pattern mining algorithm is a sequence database. A sequence is an ordered list of itemsets.

Here is an example of a sequence:

<(e)(abf)(bde)>

This sequence should be interpreted as follows:

First, the item "e" occurred. It was followed by "a", "b" and "f", which occurred simultaneously. These items were then followed by "b", "d" and "e", which also occurred simultaneously.
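To make this reading concrete, here is a minimal Python sketch that parses the bracket notation into an ordered list of itemsets. The function name parse_sequence is hypothetical, and it assumes single-character items as in the examples from the question.

```python
import re

def parse_sequence(text):
    """Parse a sequence such as <(e)(abf)(bde)> into a list of itemsets.

    Assumes single-character items, as in the examples in the question.
    """
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("a sequence must be enclosed in < and >")
    # Each (...) group is one itemset; the groups keep their left-to-right order.
    groups = re.findall(r"\(([^()]*)\)", body[1:-1])
    # A frozenset is used because the order of items inside the brackets
    # carries no meaning; whitespace between items is ignored.
    return [frozenset(ch for ch in group if not ch.isspace()) for group in groups]

for step, itemset in enumerate(parse_sequence("<(e)(abf)(bde)>"), start=1):
    print(step, sorted(itemset))
# 1 ['e']
# 2 ['a', 'b', 'f']
# 3 ['b', 'd', 'e']
```

The list preserves the temporal order between itemsets, while each itemset is an unordered set of items that happened at the same time.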

So, to answer your question: the items between brackets are assumed to be unordered, that is, to occur at the same time. A group of items between brackets is called an "itemset".
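For the weblog case in the question, one possible way to apply this (an assumption, not something prescribed by the answer) is to build one sequence per user and put pages requested at the same timestamp into one itemset. The (user, timestamp, page) tuple layout and the name build_sequences are hypothetical.

```python
from collections import defaultdict

def build_sequences(events):
    """Group (user, timestamp, item) events into one sequence per user.

    Hypothetical layout: each event is a (user, timestamp, item) tuple.
    Items sharing a user and a timestamp form one itemset; itemsets are
    ordered by timestamp. Repeated items at one timestamp are merged.
    """
    by_user = defaultdict(lambda: defaultdict(set))
    for user, timestamp, item in events:
        by_user[user][timestamp].add(item)
    return {
        user: [sorted(by_time[t]) for t in sorted(by_time)]
        for user, by_time in by_user.items()
    }

# Hypothetical log entries: (user, timestamp in seconds, page).
log = [
    ("u1", 10, "e"),
    ("u1", 20, "a"), ("u1", 20, "f"), ("u1", 20, "b"),
    ("u1", 35, "d"), ("u1", 35, "b"), ("u1", 35, "e"),
]
print(build_sequences(log))
# {'u1': [['e'], ['a', 'b', 'f'], ['b', 'd', 'e']]}
```

Real log timestamps rarely coincide exactly, so you may have to decide what counts as "the same time" (for example, by rounding timestamps to a time window). That choice depends on your data rather than on the mining algorithm.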

Note that it is also assumed that no item appears more than once in an itemset, so an itemset such as (a a b) would be illegal.
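If you build sequences yourself, a quick check for this rule can catch bad input before mining. This is a small sketch; the name invalid_itemsets is hypothetical, and it assumes a sequence is stored as a list of itemsets, each a list of item names.

```python
def invalid_itemsets(sequence):
    """Return the positions of itemsets in which some item is repeated.

    `sequence` is a list of itemsets, each given as a list of item names.
    """
    return [
        position
        for position, itemset in enumerate(sequence)
        if len(set(itemset)) != len(itemset)
    ]

print(invalid_itemsets([["e"], ["a", "a", "b"], ["b", "d", "e"]]))  # [1]
```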

Moreover, most sequential pattern mining algorithms (e.g. PrefixSpan) assume that the items in an itemset are lexically ordered. If they are not, these algorithms may return incorrect results, because they rely on optimizations that depend on this assumption.
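A simple way to satisfy this assumption is to sort the items inside every itemset before passing the data to an algorithm, while leaving the order of the itemsets untouched. The helper names canonicalize and to_text below are hypothetical.

```python
def canonicalize(sequence):
    """Sort the items inside each itemset lexically.

    The order of the itemsets themselves is left unchanged, because that
    order is the temporal order of the sequence.
    """
    return [sorted(itemset) for itemset in sequence]

def to_text(sequence):
    """Render a sequence in the <(af)(d)(e)(a)> notation."""
    return "<" + "".join("(" + "".join(itemset) + ")" for itemset in sequence) + ">"

raw = [["f", "a"], ["d"], ["e"], ["a"]]
print(to_text(canonicalize(raw)))  # <(af)(d)(e)(a)>
```

If your items are integer IDs rather than letters, sort them numerically instead of as strings, since "10" sorts before "2" as text.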

If you want to try some sequential pattern mining algorithms, you can have a look at the SPMF software: http://www.philippe-fournier-viger.com/spmf/ . It provides a graphical user interface and many examples (I'm the project founder).
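SPMF's sequential pattern mining algorithms read sequence databases from text files in which, as far as I understand its documented format, items are positive integers, -1 ends an itemset and -2 ends a sequence. Please check the SPMF documentation for the algorithm you use before relying on this. The sketch below (to_spmf is a hypothetical name) maps letter items to integers and writes one line per sequence.

```python
def to_spmf(sequences):
    """Encode letter-item sequences in an SPMF-style integer format.

    Assumed format: positive integer items, -1 after each itemset,
    -2 after each sequence. Returns the encoded lines and the
    item -> integer mapping that was used.
    """
    alphabet = sorted({item for seq in sequences for itemset in seq for item in itemset})
    code = {item: index for index, item in enumerate(alphabet, start=1)}
    lines = []
    for seq in sequences:
        tokens = []
        for itemset in seq:
            # Items within an itemset are written in increasing order.
            tokens.extend(str(code[item]) for item in sorted(itemset, key=code.get))
            tokens.append("-1")
        tokens.append("-2")
        lines.append(" ".join(tokens))
    return lines, code

db = [
    [["a", "f"], ["d"], ["e"], ["a"]],
    [["e"], ["a", "b", "f"], ["b", "d", "e"]],
]
lines, code = to_spmf(db)
print("\n".join(lines))
# 1 5 -1 3 -1 4 -1 1 -1 -2
# 4 -1 1 2 5 -1 2 3 4 -1 -2
```

Keep the returned mapping so that you can translate the mined patterns back into page names afterwards.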

Hope this answers your question.

Phil answered Oct 10 '22 02:10