Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficiently working with sets in R

Tags:

performance

r

Background:

I am dealing with a combinatorial problem in R. For a given list of sets I need to generate all pairs per set without producing duplicates.

Example:

initial_list_of_sets <- list()
initial_list_of_sets[[1]] <- c(1,2,3)
initial_list_of_sets[[2]] <- c(2,3,4)
initial_list_of_sets[[3]] <- c(3,2)
initial_list_of_sets[[4]] <- c(5,6,7)
get_pairs(initial_list_of_sets) 
# should return (1 2),(1 3),(2 3),(2 4),(3 4),(5 6),(5 7),(6 7)

Please note that (3 2) is not included in the results, as it is mathematically equal to (2 3).

My (working but inefficient) approach so far:

# checks if sets contain a_set
contains <- function(sets, a_set){
  for (existing in sets) {
    if (setequal(existing, a_set)) {
      return(TRUE)
    }
  }
  return(FALSE)
}

get_pairs <- function(from_sets){
  all_pairs <- list()
  for (a_set in from_sets) {
    # generate all pairs for current set
    pairs <- combn(x = a_set, m = 2, simplify = FALSE)
    for (pair in pairs) {
      # only add new pairs if they are not yet included in all_pairs
      if (!contains(all_pairs, pair)) {
        all_pairs <- c(all_pairs, list(pair))
      }
    }
  }
  return(all_pairs)
}

My question:

As I am dealing with mathematical sets I can't use the %in% operator instead of my contains function, because then (2 3) and (3 2) would be different pairs. However it seems very inefficient to iterate over all existing sets in contains. Is there a better way to implement this function?

like image 502
Fabian Braun Avatar asked May 21 '26 19:05

Fabian Braun


1 Answers

Perhaps you can rewrite your get_pairs function as something like the following:

myFun <- function(inlist) {
  unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
}

Here's a quick time comparison.

n <- 100
set.seed(1)

x <- sample(2:8, n, TRUE)
initial_list_of_sets <- lapply(x, function(y) sample(100, y))

system.time(get_pairs(initial_list_of_sets))
#    user  system elapsed 
#   1.964   0.000   1.959 
system.time(myFun(initial_list_of_sets))
#    user  system elapsed 
#   0.012   0.000   0.014 

If needed, you can split the matrix by rows to get your list.

Eg:

myFun <- function(inlist) {
  temp <- unique(do.call(rbind, lapply(inlist, function(x) t(combn(sort(x), 2)))))
  lapply(1:nrow(temp), function(x) temp[x, ])
}
like image 57
A5C1D2H2I1M1N2O1R2T1 Avatar answered May 23 '26 11:05

A5C1D2H2I1M1N2O1R2T1



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!