How do I find equal columns in R?

Tags: r, equivalence

Given the following:

a <- c(1,2,3)
b <- c(1,2,3)
c <- c(4,5,6)
A <- cbind(a,b,c)

I want to find which columns in A are equal to, for example, my vector a.

My first attempt was:

> which(a==A)
[1] 1 2 3 4 5 6

Which did not do that. (To be honest, I don't even understand what that did.) My second attempt was:

> a==A
        a    b     c
[1,] TRUE TRUE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE FALSE

which is definitely a step in the right direction, but it is expanded into a matrix. What I would have preferred is something like just one of the rows. How do I compare a vector to columns, and how do I find the columns in a matrix that are equal to a vector?

asked Oct 19 '12 08:10

jonalv

1 Answer

Use identical. That is R's "scalar" comparison operator; it returns a single logical value, not a vector.

apply(A, 2, identical, a)
#    a     b     c 
# TRUE  TRUE FALSE
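
To get the positions or names of the matching columns rather than a logical vector, the result can be passed to which. The following is a minimal sketch that reuses a, b, c and A from the question; the names is_equal and first_try are made up for illustration. It also shows what which(a == A) did in the question: a == A compares element by element, and which then reports positions in the flattened, column-major matrix.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(4, 5, 6)
A <- cbind(a, b, c)

# One TRUE/FALSE per column of A
is_equal <- apply(A, 2, identical, a)

# Positions (with names) of the columns that are identical to a
which(is_equal)
# a b 
# 1 2 

# Just the column names
names(which(is_equal))
# [1] "a" "b"

# which() on the full comparison matrix returns linear (column-major) indices:
# columns a and b occupy elements 1 to 6 of the 3 x 3 matrix
first_try <- which(a == A)
first_try
# [1] 1 2 3 4 5 6
```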

If A is a data frame in your real case, you're better off using sapply or vapply because apply coerces its input to a matrix.

d <- c("a", "b", "c")
B <- data.frame(a, b, c, d)

apply(B, 2, identical, a) # incorrect!
#     a     b     c     d 
# FALSE FALSE FALSE FALSE 

sapply(B, identical, a) # correct
#    a     b     c     d 
# TRUE  TRUE FALSE FALSE
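
vapply is mentioned above but not shown. As a minimal sketch, assuming the same a, b, c and d: vapply works like sapply, but the third argument (logical(1) here) declares the type and length each call must return, so an unexpected result raises an error instead of silently changing the shape of the output. The name res is made up for illustration.

```r
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(4, 5, 6)
d <- c("a", "b", "c")
B <- data.frame(a, b, c, d)

# logical(1) says: every call to identical() must return one logical value
res <- vapply(B, identical, logical(1), a)
res
#     a     b     c     d 
#  TRUE  TRUE FALSE FALSE 
```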

But note that in R versions before 4.0.0, data.frame coerces character inputs to factors unless you ask otherwise (since R 4.0.0, stringsAsFactors defaults to FALSE):

sapply(B, identical, d) # incorrect
#     a     b     c     d 
# FALSE FALSE FALSE FALSE 

C <- data.frame(a, b, c, d, stringsAsFactors = FALSE)
sapply(C, identical, d) # correct
#     a     b     c     d 
# FALSE FALSE FALSE  TRUE

identical is also considerably faster than using all + ==:

library(microbenchmark)

a <- 1:1000
b <- c(1:999, 1001)

microbenchmark(
  all(a == b), 
  identical(a, b))
# Unit: microseconds
#              expr   min    lq median     uq    max
# 1     all(a == b) 8.053 8.149 8.2195 8.3295 17.355
# 2 identical(a, b) 1.082 1.182 1.2675 1.3435  3.635
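
One caveat, shown as a small sketch (the names x_int and x_dbl are made up for illustration): identical also compares storage type and attributes, so an integer vector and a double vector holding the same values are not identical, even though all + == reports them as equal. In the benchmark above, a is integer and b is double, so identical can return FALSE on the type difference alone.

```r
x_int <- 1:3          # integer storage
x_dbl <- c(1, 2, 3)   # double storage

all(x_int == x_dbl)        # TRUE: the values compare equal
identical(x_int, x_dbl)    # FALSE: the storage types differ

# Converting to a common type first makes the comparison about values only
identical(as.numeric(x_int), x_dbl)   # TRUE

# isTRUE(all.equal(...)) compares numeric values with a small tolerance
isTRUE(all.equal(x_int, x_dbl))       # TRUE
```

If the columns may differ only in type (integer versus double), converting both sides with as.numeric before calling identical is one way to keep the strict comparison while ignoring storage type.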

answered Sep 30 '22 05:09

hadley
