I have seen many databases for sequential pattern mining, and the sequences in these databases look like this:
<(af)(d)(e)(a)>
<(e)(abf)(bde)>
What does the set of items in one bracket, such as (af), (abf) or (bde), represent? Does it mean that the items are related to one another, or something else?
On what basis are items grouped into one such element? I am using a weblog file as my dataset.
The input of a sequential pattern mining algorithm is a sequence database. A sequence is an ordered list of itemsets.
Here is an example of a sequence:
<(e)(abf)(bde)>
This sequence should be interpreted as follows:
First, the item "e" occurred. It was then followed by "a", "b" and "f" simultaneously. These items were then followed by "b", "d" and "e" simultaneously.
So the answer is that items between brackets are assumed to be unordered, or occurring at the same time. The items between one pair of brackets are called an "itemset".
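To make this interpretation concrete, here is a minimal Python sketch that reads the bracket notation into a list of itemsets. The function name `parse_sequence` is made up for this illustration, and it assumes every item is a single character, as in the examples above. Each itemset is stored as a `frozenset`, since the items inside it have no order, while the list keeps the order of the itemsets.

```python
import re


def parse_sequence(text):
    """Parse a string such as '<(e)(abf)(bde)>' into a list of itemsets.

    Assumption for this sketch: every item is a single character.
    """
    # Each parenthesised group is one itemset.
    groups = re.findall(r"\(([^()]*)\)", text)
    # A frozenset has no internal order, which matches the meaning
    # "these items occurred at the same time".
    return [frozenset(group.replace(" ", "")) for group in groups]


sequence = parse_sequence("<(e)(abf)(bde)>")

# The order of the itemsets matters, the order inside an itemset does not:
same = parse_sequence("<(e)(fba)(edb)>") == sequence
different = parse_sequence("<(abf)(e)(bde)>") == sequence
print(same, different)  # True False
```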
Note that it is also assumed that no item can appear more than once in an itemset. So it would be illegal to have an itemset such as (a a b).
Moreover, you should also know that most sequential pattern mining algorithms (e.g. PrefixSpan) assume that the items in an itemset are lexically ordered. If the items are not lexically ordered in an itemset, the algorithms may not produce correct results, because they use optimizations that rely on this assumption.
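Both assumptions (no repeated item, lexical order inside an itemset) can be checked before the data is handed to an algorithm. The following is only a sketch in Python; the helper names `normalize_itemset` and `normalize_sequence` are invented for this example and are not part of any library.

```python
def normalize_itemset(items):
    """Return the items as a lexically sorted tuple, rejecting duplicates."""
    items = list(items)
    if len(set(items)) != len(items):
        raise ValueError(f"duplicate item in itemset: {items}")
    return tuple(sorted(items))


def normalize_sequence(sequence):
    """Normalize every itemset; the order of the itemsets is left untouched."""
    return [normalize_itemset(itemset) for itemset in sequence]


# The second and third itemsets are deliberately given out of order.
raw = [["e"], ["f", "a", "b"], ["d", "b", "e"]]
normalized = normalize_sequence(raw)
print(normalized)  # [('e',), ('a', 'b', 'f'), ('b', 'd', 'e')]
```

An itemset such as `["a", "a", "b"]` makes `normalize_itemset` raise a `ValueError` instead of silently dropping the duplicate, so that a problem in the input data gets noticed.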
If you want to try some sequential pattern mining algorithms, you can have a look at the SPMF software: http://www.philippe-fournier-viger.com/spmf/ which provides a graphical user interface and many examples (I'm the project founder).
Hope this answers your question.