Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

algorithm issue - find the least common subset

Tags:

algorithm

a's are objects with multiple "categories", b's, for instance a1 has three cateories b1,b2,b3. The problem is to, reduce the number of categories (which can grow rather large), into groups that always occurs together. A "largest common subset" thing.

So for instance, given the following data set:

a1{ b1,b2,b3 } 
a2{ b2,b3 }
a3{ b1,b4 }

We can find that b2 and b3 always comes together..

b23 = {b2,b3}

..and we can reduce the category set to this:

a1{ b1, b23 }
a2{ b23 }
a3{ b1,b4 }

So, my issue is to find some algorithm to solve this problem.

I have started to look at the Longest Common Sequence problem, and it might be a solution. i.e. something like repeatedly grouping categories like this b' = LCS(set_of_As) until all categories has been traversed. However, this is not complete. I have to limit the input domain in some way to make this possible.

Do I miss something obvious? Any hints of a problem domain you can point me to? Does anyone recognize any other approach to such a problem.

like image 706
Petter Nordlander Avatar asked Oct 23 '12 18:10

Petter Nordlander


1 Answers

Transform your sets to have sets of b's that include a's:

b1 { a1, a3 }
b2 { a1, a2 }
b3 { a1, a2 }
b4 { a3 }

Make sure the contents of the new b sets are sorted.

Sort your b sets by their contents.

Any two adjacent sets with the same elements are b's that occur in the same a sets.

like image 190
Rafael Baptista Avatar answered Oct 13 '22 01:10

Rafael Baptista