Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for finding smallest collection of components

I'm looking for an algorithm to solve the following problem. I have a number of subsets (1-n) of a given set (a-h). I want to find the smallest collection of subsets that will allow me to construct, by combination, all of the given subsets. This collection can contain subsets that do not exist in 1-n yet.

  a b c d e f g h
1 1
2 1   1
3   1     1   1
4 1       1
5   1         1
6 1     1   1   1
7 1       1 1   1
8 1   1       1
9 1         1   1

Below are two possible collections, the smallest of which contains seven subsets. I have denoted new subsets with an x.

1 1
x   1
x     1
x       1
x         1
x           1
x             1
x               1

1 1
x   1         
x     1
x       1        
x         1    
x           1   1
x             1

I believe this must be a known problem, but I'm not very familiar with algorithms. Any help is very much appreciated, as is a suggestion for a better topic title.

Thanks!

Update

Graph coloring gets me a long way, thanks. However, in my case subsets are allowed to overlap. For example:

  a b c d
1 1 1 1  
2 1 1 1 
3 1 1 1
4     1 1
5 1 1 1 1

Graph coloring gives me this solution:

x 1 1
x     1
x       1     

But this one is valid as well, and is smaller:

1 1 1 1  
4     1 1
like image 785
user3170702 Avatar asked Jan 07 '14 20:01

user3170702


2 Answers

This problem is known as Set Basis, and it is NP-complete (Larry J. Stockmeyer: The set basis problem is NP-complete. Technical Report RC-5431, IBM, 1975). Its formulation as a graph problem is Bipartite Dimension. Since it is very hard to solve in general, it might be useful to look if there are any helpful properties of your data (e.g., are the sets small? Is the solution small? Can all sets occur?)

I cannot think of an easy ILP formulation. Instead, you could try to reduce the problem to Clique Cover, which is better studied, using either the reduction from Kou&Wong or the one from Nor et al.. I have coauthered a paper discussing algorithms for Clique Cover, and written a Clique cover solver with both an exact solver and two heuristics.

like image 199
Falk Hüffner Avatar answered Sep 20 '22 04:09

Falk Hüffner


This problem was shown in one the video's of Coursera's Discrete Optimization lectures. IIRC, it's called the set cover problem.

IIRC, it's NP-complete or NP-hard, so look into the typical algorithms (exact algo's for small datasets, metaheuristics for medium/big datasets) and typical frameworks (OptaPlanner, ...)

like image 28
Geoffrey De Smet Avatar answered Sep 20 '22 04:09

Geoffrey De Smet