Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Minimum set cover [PHP]

Tags:

algorithm

php

set

Minimum Set Cover is a question where you must find the minimum number of sets needed to cover every element.
For example, imagine that we have a set of X=array(1,2,3,4,5,6) and 5 another set S, where

S[1]=array(1, 4) 
S[2] =array(2, 5) 
S[3] =array(3, 6)
S[4] =array(1, 2, 3) 
S[5] =array(4, 5, 6)

The problem is to find minimum number of sets of S which cover every element of X. So obviously the minimum set cover in our case will be S[4] and S[5] because they cover all the elements.
Does anybody have an idea how to implement this code in PHP. Note, that this is NP-complete so there is no fast algorithm to solve it. Any solution in PHP will be welcomed. And BTW it is not a homework, I need to use this algorithm in my web application in order to generate suggestion list.
Thanks in advance.

Update 1
There are many applications of Set Covering Problem. Some of the interesting ones are:

  1. Construction of Optimal logic circuits
  2. Air-crew Scheduling
  3. Assembly line balancing
  4. Information retrieval
  5. Art Gallery problem
  6. Genome Sequencing
  7. Red-Blue SetCover problem

Update 2
For example, here you can see the working version of the problem I mentioned. Here, even it shows visually the sets. But I need the pure PHP code for that, if somebody has it please be kind to provide us with the working example in PHP. Thanks

Update 3
Finally, I have solved the problem in PHP. My solution based on the algorithm proposed on a very famous book called Introduction to Algorithms, section The set-covering problem. Here how my solution looks like:

$MainSet=array(1, 2, 3, 4, 5, 6, 7);
$SubSet=array(array(1,4), array(2, 5), array(3, 6), array(1, 2, 3), array(4, 5, 6));

$UncoveredElements=$MainSet;
$CoveringSets=array();
while (count($UncoveredElements)>0){
    $S=SubSetS($UncoveredElements, $SubSet);
    if (is_array($S)){
        $UncoveredElements=array_diff($UncoveredElements, $S);
        $CoveringSets[]=$S;
    }else
        break; //finish work
}
echo "Sets that cover MainSet are:";
var_dump($CoveringSets);

//Subset S is chosen that covers as many uncovered elements as possible
function SubSetS($UncoveredElements, $SubSetArr){
    $max=0; $s=array();
    foreach($SubSetArr as $SubSet){
        $intersectArr=array_intersect($UncoveredElements, $SubSet);
        $weight=count($intersectArr);
        if ($weight>$max){
            $max=$weight;
            $s=$SubSet;
        }
    }
    if ($max>0)
        return $s;
    else
        return 0;
}

Any comments and ideas about my solution? For me, it solves my problem, that is what I wanted. But if you suggest any optimization, correction to the code, please fill free.
BTW, thanks a lot for participants of the question for their valuable responses.

Final Update
This Greedy approach for a Set Covering problem does not always guarantee an optimal solution. Consider the following example:
Given: Elements of Main Set = {1, 2, 3, 4, 5, 6}
Now, consider 4 subsets as follows: Subset 1 = {1, 2}, Subset 2 = {3, 4}, Subset 3 = {5, 6}, Subset 4 = {1, 3, 5}.
The Greedy algorithm for the above collection of Subsets gives the Minimum Set Cover as: Minimum Set Cover contains the Subsets= { 1, 2, 3, 4}.
Thus, though the minimum collection of subsets that cover all the elements of the Main set are the first three subsets, we get the solution containing all the 4 Subsets.

like image 835
Bakhtiyor Avatar asked Jun 28 '10 08:06

Bakhtiyor


People also ask

How do you find the minimum set cover?

Assume that every item in X appears in some set, i.e., ⋃j Sj = X. A set cover of X with S is a set I ⊆ {1, 2,...,m} such that ⋃j∈I Sj = X. The solution for MIN-SET-COVER problem is a set cover I of minimum size. That is, a minimum set cover is the smallest set of sets {Si1 ,Si2 ,...,Sik } that covers X.

Is set cover NP hard?

The set-cover is NP and NP-Hard. Therefore, the set-cover is NP-Complete.

What is set Cover and vertex cover?

A vertex cover of an undirected graph is a subset of its vertices such that for every edge (u, v) of the graph, either 'u' or 'v' is in the vertex cover. Although the name is Vertex Cover, the set covers all edges of the given graph.

What is set covering problem with example?

Second, many problems in different industries can be formulated as set covering problems. For example, scheduling machines to perform certain jobs can be thought of as covering the jobs. Picking the optimal location for a cell tower so that it covers the maximum number of customers is another set covering application.


1 Answers

Bakhtiyor, what you say you need for this problem is a little bit contradictory. You said first that you want a minimum set cover, and pointed out yourself that the problem is NP-hard. The algorithm that you posted is the greedy set cover algorithm. This algorithm finds a pretty small set cover, but not necessarily the minimum set cover. Two people posted an algorithm that searches more thoroughly and does find a minimum set cover, and you said in one comment that you don't need an optimal solution.

Do you need an optimal solution or not? Because, finding the minimum set cover is a very different algorithm problem from finding a fairly small set cover. An algorithm for the former has to be written very carefully so that doesn't take ages for a large input. (By I large input I mean, say, 40 subsets.) An algorithm for the latter is easy and you did a good job with your own code. The one thing that I would change is that before your your loop statement "foreach($SubSetArr as $SubSet)", I would randomize the list of subsets. That gives the algorithm some chance of being optimal for many inputs. (But, there are examples where the minimum set cover does not include the largest subset, so for some inputs this particular greedy algorithm will have no chance of finding the minimum, regardless of order.)

If you really do want the minimum set cover, and not just a pretty good set cover, then you should state the largest input that you want the code to solve completely. That is a crucial detail that affects how fancy the algorithm needs to be for your task.


To respond to what some other people have said: First, it is certainly not necessary to search over all collections of subsets if you want the optimal solution. Second, you shouldn't be looking for "the" algorithm for the problem. The problem has many algorithms, some faster than others, some more complicated than others.

Here is one way that you can start with an algorithm that searches over all collections, and accelerate it to make something much better. I won't supply code since I don't think that the questioner wants such a fancy solution, but I can describe the ideas. You can arrange the search as a branched search: Either the best set cover contains the largest subset A or it does not. (It works well to start with the largest set.) If it does, you can remove the elements of A from the list of elements, remove the elements of A from the elements of the other subsets, and solve the remaining subset cover problem. If it does not, you can still remove A from the list of subsets and solve the remaining subset cover problem.

So far, this is just a way to arrange a search over everything. But, there are several important ways to take shortcuts in this recursive algorithm. As you find solutions, you can keep a record of the best one that you have found so far. This sets a threshold that you have to beat. At each stage you can total the sizes of the largest threshold-1 subsets remaining, and see if they have enough elements to cover the remaining elements. If not, you can give up; you won't beat the current threshold.

In addition, if in this recursive algorithm any element is only covered by one subset, you can throw in that subset for sure whether or not it is the largest one. (In fact, it's a good idea to add this step even to the greedy algorithm that Bakhtiyor coded.)

These two accelerations can each prune away huge branches from the search tree, and they work even better in combination.

There is also a clever data structure for this type of problem due to Don Knuth called "Dancing Links". He proposed it for the exact cover problem, which is a little bit different from this one, but you can adapt it to minimum subset cover and all of the ideas above. Dancing links is a mesh of intersecting linked lists that allows you check subsets in and out of the recursive search without having to copy the entire input.

like image 79
Greg Kuperberg Avatar answered Oct 22 '22 07:10

Greg Kuperberg