Given the following:
a <- c(1,2,3)
b <- c(1,2,3)
c <- c(4,5,6)
A <- cbind(a,b,c)
I want to find which columns in A are equal to, for example, my vector a.
My first attempt would be:
> which(a==A)
[1] 1 2 3 4 5 6
That did not do it. (To be honest, I don't even understand what that did.) My second attempt was:
a==A
        a    b     c
[1,] TRUE TRUE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE FALSE
which is definitely a step in the right direction, but the result is expanded into a matrix. What I would have preferred is something like just one of the rows. How do I compare a vector to the columns of a matrix, and how do I find the columns that are equal to the vector?
Use identical. That is R's "scalar" comparison operator; it returns a single logical value, not a vector.
apply(A, 2, identical, a)
# a b c
# TRUE TRUE FALSE
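As for what which(a == A) in the question actually did, here is a small sketch. It is only an illustration: eq and col_match are names made up for this example, and it assumes length(a) equals nrow(A), so that recycling a lines up with every column.

```r
a <- c(1, 2, 3)
A <- cbind(a = a, b = c(1, 2, 3), c = c(4, 5, 6))

# a is recycled down each column, so this is an element-wise comparison
# that yields a logical matrix with the same shape as A
eq <- a == A

# which() on a matrix returns column-major linear indices of the TRUE cells:
# positions 1-3 belong to column a, positions 4-6 to column b
which(eq)

# collapse each column to one logical value: TRUE only if every row matched
col_match <- colSums(eq) == nrow(A)

# indices (and names) of the matching columns
which(col_match)
```

Note that this route goes through ==, so unlike identical it does not distinguish integer from double storage.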
If A is a data frame in your real case, you're better off using sapply or vapply, because apply coerces its input to a matrix.
d <- c("a", "b", "c")
B <- data.frame(a, b, c, d)
apply(B, 2, identical, a) # incorrect!
# a b c d
# FALSE FALSE FALSE FALSE
sapply(B, identical, a) # correct
# a b c d
# TRUE TRUE FALSE FALSE
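For the vapply variant mentioned above, a minimal sketch might look like this. The data frame is rebuilt here so the snippet stands alone, and matches is just an illustrative name. Declaring FUN.VALUE = logical(1) makes vapply raise an error if any call returns something other than a single logical value.

```r
a <- c(1, 2, 3)
B <- data.frame(a = a, b = c(1, 2, 3), c = c(4, 5, 6), d = c("a", "b", "c"))

# one identical() call per column; the result is a named logical vector
matches <- vapply(B, identical, FUN.VALUE = logical(1), a)

# names of the columns that are identical to a
names(B)[matches]
```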
But note that in R versions before 4.0.0, data.frame coerces character inputs to factors unless you ask otherwise (from 4.0.0 on, stringsAsFactors defaults to FALSE):
sapply(B, identical, d) # incorrect
# a b c d
# FALSE FALSE FALSE FALSE
C <- data.frame(a, b, c, d, stringsAsFactors = FALSE)
sapply(C, identical, d) # correct
# a b c d
# FALSE FALSE FALSE TRUE
identical is also considerably faster than using all + ==:
library(microbenchmark)
a <- 1:1000
b <- c(1:999, 1001)
microbenchmark(
all(a == b),
identical(a, b))
# Unit: microseconds
#              expr   min    lq median     uq    max
# 1     all(a == b) 8.053 8.149 8.2195 8.3295 17.355
# 2 identical(a, b) 1.082 1.182 1.2675 1.3435  3.635
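One caveat worth keeping in mind: identical is strict about storage type, while == is not. In the benchmark above, a is an integer vector and b is a double vector (because 1001 is a double), so the two calls are not doing quite the same work. A small sketch of the difference, with x_int and x_dbl as illustrative names:

```r
x_int <- 1:3          # integer storage
x_dbl <- c(1, 2, 3)   # double storage

# same values, different types: identical() says FALSE
identical(x_int, x_dbl)

# element-wise == compares values after coercion, so every element matches
all(x_int == x_dbl)

# all.equal() also treats integer and double as comparable numerics
isTRUE(all.equal(x_int, x_dbl))

# converting to a common type first makes identical() agree
identical(as.numeric(x_int), x_dbl)
```

If your matrix columns and your vector may differ in type, converting both to the same type before calling identical is probably the simplest fix.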