Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Variation on set cover problem in R / C++

Given a universe of elements U = {1, 2, 3,...,n} and a number of sets in this universe {S1, S2,...,Sm}, what is the smallest set we can create that will cover at least one element in each of the m sets?

For example, given the following elements U = {1,2,3,4} and sets S = {{4,3,1},{3,1},{4}}, the following sets will cover at least one element from each set: {1,4} or {3,4} so the minimum sized set required here is 2.

Any thoughts on how this can be scaled up to solve the problem for m=100 or m=1000 sets? Or thoughts on how to code this up in R or C++?

The sample data, from above, using R's library(sets).

s1 <- set(4, 3, 1)
s2 <- set(3, 1)
s3 <- set(4)
s <- set(s1, s2, s3)

Cheers

like image 692
jedfrancis Avatar asked Jul 19 '11 16:07

jedfrancis


People also ask

What is set covering problem with example?

Second, many problems in different industries can be formulated as set covering problems. For example, scheduling machines to perform certain jobs can be thought of as covering the jobs. Picking the optimal location for a cell tower so that it covers the maximum number of customers is another set covering application.

Is set cover NP hard?

The set-cover is NP and NP-Hard. Therefore, the set-cover is NP-Complete.

What is the time complexity of set cover?

So, the complexity of this approach should be O(2^n). However, Wikipedia says 'the complexity of Set Cover Problem is m^n where m is the size of the universe and n is the number of sets in the collection'.

What are set covering constraints?

A worker may or may not perform a task, depending on skills. In addition, each worker can be hired for a cost that also depends on his qualifications. The problem consists of selecting a set of workers to perform all the tasks, while minimizing the cost. This is known as a set-covering problem.


2 Answers

This is the hitting set problem, which is basically set cover with the roles of elements and sets interchanged. Letting A = {4, 3, 1} and B = {3, 1} and C = {4}, the element-set containment relation is

  A B C
1 + + -
2 - - -
3 + + -
4 + - +

so you basically want to solve the problem of covering {A, B, C} with sets 1 = {A, B} and 2 = {} and 3 = {A, B} and 4 = {A, C}.

Probably the easiest way to solve nontrivial instances of set cover in practice is to find an integer programming package with an interface to R or C++. Your example would be rendered as the following integer program, in LP format.

Minimize
    obj: x1 + x2 + x3 + x4
Subject To
    A: x1 + x3 + x4 >= 1
    B: x1 + x3 >= 1
    C: x4 >= 1
Binary
    x1 x2 x3 x4
End
like image 195
bar Avatar answered Sep 20 '22 21:09

bar


At first I misunderstood the complexity of the problem and came up with a function that finds a set that covers the m sets - but I then realized that it isn't necessarily the smallest one:

cover <- function(sets, elements = NULL) {
  if (is.null(elements)) {
    # Build the union of all sets
    su <- integer() 
    for(si in sets) su <- union(su, si)
  } else {
    su <- elements
  }

  s <- su
  for(i in seq_along(s)) {
    # create set candidate with one element removed
    sc <- s[-i] 

    ok <- TRUE
    for(si in sets) {
      if (!any(match(si, sc, nomatch=0L))) {
        ok <- FALSE
        break
      }
    }

    if (ok) {
      s <- sc
    }
  }

  # The resulting set
  s
}

sets <- list(s1=c(1,3,4), s2=c(1,3), s3=c(4))
> cover(sets) # [1] 3 4

Then we can time it:

n <- 100  # number of elements
m <- 1000 # number of sets
sets <- lapply(seq_len(m), function(i) sample.int(n, runif(1, 1, n)))
system.time( s <- cover(sets) ) # 0.53 seconds

Not too bad, but still not the smallest one.

The obvious solution: generate all permutations of elements and pass is to the cover function and keep the smallest result. This will take close to "forever".

Another approach is to generate a limited number of random permutations - this way you get an approximation of the smallest set.

ns <- 10 # number of samples
elements <- seq_len(n)
smin <- sets
for(i in seq_len(ns)) {
   s <- cover(sets, sample(elements))
   if (length(s) < length(smin)) {
     smin <- s
   }
}
length(smin) # approximate smallest length 
like image 31
Tommy Avatar answered Sep 20 '22 21:09

Tommy