
How do I find equal columns in R?

Tags: r, equivalence

Given the following:

a <- c(1,2,3)
b <- c(1,2,3)
c <- c(4,5,6)
A <- cbind(a,b,c)

I want to find which columns of A are equal to, for example, my vector a.

My first attempt was:

> which(a==A)
[1] 1 2 3 4 5 6

That did not do what I wanted. (To be honest, I don't even understand what it did.) My second attempt was:

a==A
        a    b     c
[1,] TRUE TRUE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE FALSE

This is definitely a step in the right direction, but the result is expanded into a matrix. I would have preferred something like just one of the rows. How do I compare a vector to the columns of a matrix, and how do I find the columns that are equal to that vector?

jonalv asked Oct 19 '12 08:10



1 Answer

Use identical. That is R's "scalar" comparison operator; it returns a single logical value, not a vector.

apply(A, 2, identical, a)
#    a     b     c 
# TRUE  TRUE FALSE 
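
To connect this back to the two attempts in the question, here is a minimal sketch (the variable names eq, linear_idx, col_all_equal and matching_cols are just illustrative). It assumes the same a, b, c and A as in the question. The comparison a == A recycles a down each column, and which() on a matrix reports linear, column-major positions, which is why the first attempt printed 1 to 6.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(4, 5, 6)
A <- cbind(a, b, c)

# a == A recycles a down each column and returns a 3x3 logical matrix.
eq <- a == A

# which() on a matrix returns linear (column-major) indices of the TRUE cells:
# cells 1..6 are exactly the six cells of columns 1 and 2.
linear_idx <- which(eq)

# Collapse the logical matrix to one value per column.
col_all_equal <- apply(eq, 2, all)

# Column positions whose contents are identical to a.
matching_cols <- which(apply(A, 2, identical, a))
print(matching_cols)
```

Wrapping the apply call in which() gives the column positions (here columns a and b) rather than a logical vector, which seems closest to what the question asks for.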

If A is a data frame in your real case, you're better off using sapply or vapply, because apply coerces its input to a matrix.

d <- c("a", "b", "c")
B <- data.frame(a, b, c, d)

apply(B, 2, identical, a) # incorrect!
#     a     b     c     d 
# FALSE FALSE FALSE FALSE 

sapply(B, identical, a) # correct
#    a     b     c     d 
# TRUE  TRUE FALSE FALSE
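
Only sapply is shown above, so here is a rough sketch of the vapply variant, assuming the same a, b, c, d and B. The extra logical(1) argument declares that each call must return exactly one logical value, so vapply raises an error if a column ever produces something else.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(4, 5, 6)
d <- c("a", "b", "c")
B <- data.frame(a, b, c, d)

# FUN.VALUE = logical(1): every column must yield a single TRUE/FALSE.
# The trailing a is passed on to identical() as its second argument.
res <- vapply(B, identical, logical(1), a)
print(res)
```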

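But note that in R versions before 4.0.0, data.frame coerces character inputs to factors unless you ask otherwise (from R 4.0.0 onward the default is stringsAsFactors = FALSE, so the first call below already returns TRUE for d):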

sapply(B, identical, d) # incorrect
#     a     b     c     d 
# FALSE FALSE FALSE FALSE 

C <- data.frame(a, b, c, d, stringsAsFactors = FALSE)
sapply(C, identical, d) # correct
#     a     b     c     d 
# FALSE FALSE FALSE  TRUE 
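
If the data frame already contains factor columns and cannot be rebuilt, one possible workaround is to convert factors back to character before comparing. This is only a sketch: identical_to is a hypothetical helper, not a base R function, and stringsAsFactors = TRUE is set explicitly so the factor case is reproduced on any R version.

```r
a <- c(1, 2, 3)
d <- c("a", "b", "c")

# Force the factor conversion explicitly so the example behaves the same
# regardless of the R version's default.
B <- data.frame(a, d, stringsAsFactors = TRUE)

# Hypothetical helper: turn factor columns back into character vectors,
# then fall through to identical().
identical_to <- function(col, target) {
  if (is.factor(col)) col <- as.character(col)
  identical(col, target)
}

res <- sapply(B, identical_to, d)
print(res)
```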

identical is also considerably faster than combining all with ==:

library(microbenchmark)

a <- 1:1000
b <- c(1:999, 1001)

microbenchmark(
  all(a == b), 
  identical(a, b))
# Unit: microseconds
#              expr   min    lq median     uq    max
# 1     all(a == b) 8.053 8.149 8.2195 8.3295 17.355
# 2 identical(a, b) 1.082 1.182 1.2675 1.3435  3.635
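
One caveat worth keeping in mind when swapping all + == for identical: identical also compares types and attributes, so an integer vector and a double vector holding the same values are not identical. The sketch below uses made-up names (x_int, x_dbl) to show this. In the benchmark above, a is integer and b is double, so identical can decide on the type mismatch alone, which likely contributes to the gap in that particular measurement.

```r
x_int <- 1:3         # integer vector
x_dbl <- c(1, 2, 3)  # double vector with the same values

# == compares values after coercion, identical() also compares the type.
by_value     <- all(x_int == x_dbl)
by_identity  <- identical(x_int, x_dbl)
after_coerce <- identical(as.numeric(x_int), x_dbl)

# The two vectors used in the benchmark have different types as well.
bench_types <- c(typeof(1:1000), typeof(c(1:999, 1001)))

print(c(by_value, by_identity, after_coerce))
print(bench_types)
```

If the columns you compare might differ only in storage type, coercing both sides first (for example with as.numeric) keeps the single TRUE/FALSE result while avoiding a surprising FALSE.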
hadley answered Sep 30 '22 05:09