Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm itemset matching pattern

Tags:

algorithm

I've a set of elements (potentially big) with an order relation:

[a,b,c,d,e,f] 

and a set of frequent patterns (potentially big) with ids:

[a]:1,[b]:2,[c]:3,[a,b]:4,[b,c]:5,[a,b,c]:6

I have a sequence of ordered sets:

[a,b], [e], [c], [e,f], [a,b,c]

I want to match every set in the sequence with the ids of the corresponding patterns:

[a,b]:{1,2,4}, [e]:{}, [c]:{3}, [a,b,c]:{1,2,3,4,5,6}

My goal is to limit the number of passes over the sequence so I want to build a data structure I can use during the scan. I'm thinking of a prefix tree:

──null
   ├──a : 1
   |  |
   |  └──b : 4
   |     |
   |     └──c : { 5, 6 }
   |
   ├──b : 2
   |  |
   |  └──c : 5
   |
   └──c : 3

I scan a set in the sequence and pass it trough the tree multiple times recursively (set, set.tail, set.tail.tail...), every time I reach a node I add the corresponding ids to an array.

Do I miss any peculiar case in my reasoning (just realized I have to put multiple ids for nodes of depth>2 if I don't want to miss [a,c] if [a,b,c] exist in the set) ? Is there any more sophisticated data structure I can use to improve the processing time ?

Edit : In fact at depth n, I need 2^(n-2) ids with my method (considering my tree is dense). I'm not sure it's a valid way to do it...

Edit2 : another approach merging bitmaps of each single element in the sequence to build each pattern (as used in SPADE algorithm).

a  : [1,0,0,0,1]
b  : [0,1,0,0,1]
ab : [0,0,0,0,1]

with some array manipulations, I should be able to match this with the elements of my initial array.

like image 858
Alex Avatar asked Mar 02 '26 06:03

Alex


1 Answers

If you are building a prefix tree (aka trie), all nodes are unique, so the prefix tree for the set {a,b,c} in alphabetic order with continuity look like this:

──null
   ├──a : 1
   |  |
   |  └──b : 4
   |     |
   |     └──c : 6
   |
   ├──b : 2
   |  |
   |  └──c : 5
   |
   └──c : 3

And it maps to the prefix set { a, b, c, ab, bc, abc }.

The tree space complexity is SUM k for k = 1..N ~ O(N^2).

Node.java

class Node
{
    public String str;
    public ArrayList<String> child;

    public Node (String str)
    {
        this.str = str;
        this.child = new ArrayList<String>();
    }
}

MyTree.java

class MyTree
{
    Node head;

    ..

    public void build_tree(String [] symbol)
    {
        this.head = new Node("");
        build(symbol,head,0,symbol.length-1);
    }

    // build the prefix tree through DFS
    private void build(String [] symbol, Node parent, int start, int end)
    {
        Node ch = null;
        for (int i=start; i<=end; i++)
        {
            ch = new Node(symbol[i]);
            parent.child.add(ch);

            if (end - start > 0)
            {
                build(symbol,ch,i,end);
            }
        }
    }
}
like image 83
Khaled.K Avatar answered Mar 03 '26 20:03

Khaled.K



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!