algorithm issue - find the least common subset

Question

a's are objects with multiple "categories", b's, for instance a1 has three cateories b1,b2,b3. The problem is to, reduce the number of categories (which can grow rather large), into groups that always occurs together. A "largest common subset" thing.

So for instance, given the following data set:

a1{ b1,b2,b3 } 
a2{ b2,b3 }
a3{ b1,b4 }

We can find that b2 and b3 always comes together..

b23 = {b2,b3}

..and we can reduce the category set to this:

a1{ b1, b23 }
a2{ b23 }
a3{ b1,b4 }

So, my issue is to find some algorithm to solve this problem.

I have started to look at the Longest Common Sequence problem, and it might be a solution. i.e. something like repeatedly grouping categories like this b' = LCS(set_of_As) until all categories has been traversed. However, this is not complete. I have to limit the input domain in some way to make this possible.

Do I miss something obvious? Any hints of a problem domain you can point me to? Does anyone recognize any other approach to such a problem.

Rafael Baptista · Accepted Answer

Transform your sets to have sets of b's that include a's:

b1 { a1, a3 }
b2 { a1, a2 }
b3 { a1, a2 }
b4 { a3 }

Make sure the contents of the new b sets are sorted.

Sort your b sets by their contents.

Any two adjacent sets with the same elements are b's that occur in the same a sets.

algorithm issue - find the least common subset

Tags:

algorithm

Petter Nordlander

1 Answers

Rafael Baptista

Recent Activity

Donate For Us

algorithm issue - find the least common subset

Tags:

algorithm

Petter Nordlander

1 Answers

Rafael Baptista

Related questions

Recent Activity

Donate For Us