
R - Efficient way to test whether a pair of vectors is disjoint

I want to know whether two vectors have any elements in common. I don't care what the elements are, how many common elements there are, or what positions they occupy within either vector. I just need a simple, efficient function EIC(vec1, vec2) that returns TRUE if some element appears in both vec1 and vec2, and FALSE if no element is common to both. We can also assume that neither vec1 nor vec2 contains NA, but either may have duplicated values.

I've thought of five ways to do this, but they all seem inefficient:

EIC.1 <- function(vec1, vec2) length(intersect(vec1, vec2)) > 0
# I want a function that will stop when it finds the first 
# common element between the vectors, and return TRUE. The
# intersect function will continue on and check whether there are
# any other common elements.

EIC.2 <- function(vec1, vec2) any(vec1 %in% vec2)

EIC.3 <- function(vec1, vec2) any(!is.na(match(vec1, vec2)))
# the match function goes to the trouble of finding the position
# of all matches; I don't need the position but just want to know
# if any exist

EIC.4 <- function(vec1, vec2) {
      uvec1 <- unique(vec1)
      uvec2 <- unique(vec2)
      length(unique(c(uvec1, uvec2))) < length(uvec1) + length(uvec2)
}

EIC.5 <- function(vec1, vec2) !!anyDuplicated(c(unique(vec1), unique(vec2)))
# per https://stackoverflow.com/questions/5263498/how-to-test-whether-a-vector-contains-repetitive-elements#comment5931428_5263593
# I suspect this is the most efficient of the five, because
# anyDuplicated will stop looking when it comes to the first one,
# but I'm not sure about using !! to coerce to boolean type
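
On the `!!` point: `anyDuplicated()` returns 0 when nothing repeats and otherwise the index of the first repeated element, so `!!` works because any nonzero number coerces to TRUE. A minimal sketch of a more explicit spelling follows; the name `EIC.5b` is made up here for illustration and is not one of the five original candidates.

```r
# anyDuplicated() returns 0L when no element repeats, otherwise the
# index of the first repeated element; comparing with 0L gives a logical.
EIC.5b <- function(vec1, vec2) {
  anyDuplicated(c(unique(vec1), unique(vec2))) > 0L
}

EIC.5b(c(9, 8, 75, 62), c(20, 75, 341, 987, 8))   # TRUE  (8 and 75 are shared)
EIC.5b(c(20, 75, 341, 987, 8), c(12, 62, 12))     # FALSE (nothing shared)
```

Both spellings should behave the same; the comparison form only makes the intent easier to read.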

I will be using very long vectors (without any NAs, as previously mentioned) and will be running this function millions of times, which is why I am looking for something efficient. Here is some test data:

v1 <- c(9, 8, 75, 62)
v2 <- c(20, 75, 341, 987, 8)
v3 <- c(154, 62, 62, 143, 154, 95)
v4 <- c(12, 62, 12)

EIC <- EIC.1

EIC(v1, v2)
EIC(v1, v3)
EIC(v1, v4)
EIC(v2, v3)
EIC(v2, v4)
EIC(v3, v4)

Correct results are TRUE, TRUE, TRUE, FALSE, FALSE, TRUE.
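
To check every candidate against these expected results in one go, something like the following sketch could be used. The definitions and test vectors are repeated so the block runs on its own; the names `pairs`, `expected`, `candidates` and `check` are helpers introduced only for this example.

```r
# The five candidates from above, repeated so this block is self-contained.
EIC.1 <- function(vec1, vec2) length(intersect(vec1, vec2)) > 0
EIC.2 <- function(vec1, vec2) any(vec1 %in% vec2)
EIC.3 <- function(vec1, vec2) any(!is.na(match(vec1, vec2)))
EIC.4 <- function(vec1, vec2) {
  uvec1 <- unique(vec1)
  uvec2 <- unique(vec2)
  length(unique(c(uvec1, uvec2))) < length(uvec1) + length(uvec2)
}
EIC.5 <- function(vec1, vec2) !!anyDuplicated(c(unique(vec1), unique(vec2)))

v1 <- c(9, 8, 75, 62)
v2 <- c(20, 75, 341, 987, 8)
v3 <- c(154, 62, 62, 143, 154, 95)
v4 <- c(12, 62, 12)

# The six pairs in the same order as the calls above, with their answers.
pairs <- list(list(v1, v2), list(v1, v3), list(v1, v4),
              list(v2, v3), list(v2, v4), list(v3, v4))
expected <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE)

candidates <- list(EIC.1 = EIC.1, EIC.2 = EIC.2, EIC.3 = EIC.3,
                   EIC.4 = EIC.4, EIC.5 = EIC.5)

# For each candidate, TRUE means it reproduced all six expected results.
check <- vapply(candidates, function(f) {
  got <- vapply(pairs, function(p) f(p[[1]], p[[2]]), logical(1))
  identical(got, expected)
}, logical(1))

print(check)
```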

asked Oct 23 '18 04:10 by Montgomery Clift



1 Answer

Not really an answer, just some comments:

  • EIC.1, EIC.2 and EIC.3 all call match() at some point.
  • EIC.1 has some extra overhead, so it is a bit slower. Here is the definition of intersect():

intersect <- function (x, y) 
{
    y <- as.vector(y)
    unique(y[match(as.vector(x), y, 0L)])
}

  • EIC.2 uses %in%, and as such is very close to EIC.3. Here is the definition of %in%:

`%in%` <- function (x, table) match(x, table, nomatch = 0L) > 0L

You could shave off a bit of time in some cases with this:

EIC.all <- function(vec1, vec2) !all(is.na(match(vec1, vec2)))

because the negation ! is performed on a scalar instead of a vector of size length(vec1).
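
A quick way to confirm that the two forms agree, and to get a rough feel for the difference, is sketched below. The vectors `a` and `b`, the seed and the repetition count are arbitrary choices made for this example, and any timing difference depends on the machine and R version.

```r
EIC.3   <- function(vec1, vec2) any(!is.na(match(vec1, vec2)))
EIC.all <- function(vec1, vec2) !all(is.na(match(vec1, vec2)))

set.seed(1)                     # arbitrary seed, only for reproducibility
a <- sample.int(1e6, 1e5)       # made-up test vectors
b <- sample.int(1e6, 1e5)

# Both forms should give the same answer on the same input.
identical(EIC.3(a, b), EIC.all(a, b))

# Rough timing over repeated calls; no particular numbers are claimed here.
system.time(for (i in 1:20) EIC.3(a, b))
system.time(for (i in 1:20) EIC.all(a, b))
```

Both versions still run match() over all of vec1, so any gain comes only from skipping the vector-wide negation.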

What you need is a C/C++ function that does exactly what the internal match function does, but stops at the first match. You could have a look at the match5 C function: https://github.com/wch/r-source/blob/d1f8ef492464fd68320be9581bde4b09eadc03d6/src/main/unique.c#L1332
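
Short of writing C, the early-exit idea can be approximated in plain R by scanning vec1 in blocks and returning as soon as one block contains a match. This is only a sketch: `EIC.chunked` and its `chunk_size` parameter are hypothetical names, and the default block size is an untuned guess. Each `%in%` call has to process vec2 again, so this is likely to pay off only when vec1 is much longer than vec2 and a common element tends to appear early.

```r
# Hypothetical helper: check vec1 against vec2 one block at a time and
# stop at the first block that contains a common element.
EIC.chunked <- function(vec1, vec2, chunk_size = 10000L) {
  n <- length(vec1)
  if (n == 0L || length(vec2) == 0L) return(FALSE)
  for (s in seq.int(1L, n, by = chunk_size)) {
    e <- min(s + chunk_size - 1L, n)          # last index of this block
    if (any(vec1[s:e] %in% vec2)) return(TRUE)
  }
  FALSE                                       # no block contained a match
}

# A tiny block size is used here only to exercise the loop.
EIC.chunked(c(9, 8, 75, 62), c(20, 75, 341, 987, 8), chunk_size = 2L)   # TRUE
EIC.chunked(c(20, 75, 341, 987, 8), c(12, 62, 12), chunk_size = 2L)     # FALSE
```

When no common element exists, every block is scanned and this version does strictly more work than a single call to match(), so it is worth benchmarking on realistic data before adopting it.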

answered Sep 20 '22 16:09 by Karl Forner