
Vectorize comparison of a row vector with every row of a dataframe in R?

Tags: r, vector

Suppose I have a data frame read in from the following file, Foo.csv:

A,B,C
1,2,3
2,2,4
1,7,3
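To experiment without creating Foo.csv on disk, the same data frame can be built from an inline string. This is a minimal sketch that relies on the text argument, which read.csv() passes through to read.table():

```r
# Build the example data frame from an inline string instead of Foo.csv
foo <- read.csv(text = "A,B,C\n1,2,3\n2,2,4\n1,7,3")

str(foo)
```

The resulting foo should behave the same as the one read from the file in the snippets that follow.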

I would like to count the number of matching elements between the first row and each subsequent row. For example, the first row matches the second row in one position (column B) and the third row in two positions (columns A and C). Here is some code that achieves the desired effect:

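foo = read.csv("Foo.csv")

# numMatches[i] = number of columns in which row i equals row 1 (row 1 stays 0)
numMatches = rep(0, nrow(foo))
for (i in seq_len(nrow(foo))[-1]) {   # rows 2..n; empty if foo has one row
   numMatches[i] = sum(foo[i,] == foo[1,])
}
print(numMatches)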

My question is: can this be vectorized to eliminate the loop and possibly reduce the running time? My first attempt is below, but it fails with an error because == between two data frames is only defined when both have the same dimensions, and foo[1,] has only one row.

colSums(foo == foo[1,])
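One way to keep the data-frame comparison is to replicate the first row until the shapes agree, then sum across rows (rowSums rather than colSums, since a count per row is wanted). This is a sketch of an alternative, not the asker's or answerers' method; the names first_rep and num_matches are illustrative only:

```r
foo <- read.csv(text = "A,B,C\n1,2,3\n2,2,4\n1,7,3")

# Repeat row 1 nrow(foo) times so both operands have identical dimensions;
# drop = FALSE keeps a data frame even if foo has a single column
first_rep <- foo[rep(1, nrow(foo)), , drop = FALSE]

# Equal-sized data frames compare element-wise, giving a logical matrix;
# rowSums() counts the TRUE values in each row
num_matches <- rowSums(foo == first_rep)

# Row 1 trivially matches itself in every column; zero it to mirror the loop
num_matches[1] <- 0
print(unname(num_matches))
```

The cost is materializing a full copy of row 1, which is cheap for narrow data but worth keeping in mind for very large frames.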
asked Dec 05 '22 08:12 by merlin2011


2 Answers

> rowSums(sapply(foo, function(x) c(0,x[1] == x[2:nrow(foo)])))
[1] 0 1 2
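To unpack the one-liner above: a data frame is a list of columns, so the function is applied once per column. It compares each later element with the first one and prepends 0 for row 1, producing a rows-by-columns 0/1 matrix whose row sums are the match counts. The sketch below wraps the same idea in a hypothetical helper, count_matches_first. It uses vapply() with an explicit result shape and x[-1] instead of 2:nrow(foo), so a one-row data frame does not trip over 2:1 counting down:

```r
foo <- read.csv(text = "A,B,C\n1,2,3\n2,2,4\n1,7,3")

# Hypothetical helper wrapping the sapply() idea from the answer above
count_matches_first <- function(df) {
  n <- nrow(df)
  # One result per column: 0 for row 1, then 1/0 for "row i equals row 1"
  per_col <- vapply(df, function(x) c(0, x[-1] == x[1]), numeric(n))
  # vapply() returns an n x ncol matrix, or a plain vector when n == 1;
  # matrix() restores the n-row shape in both cases before summing rows
  rowSums(matrix(per_col, nrow = n))
}

print(count_matches_first(foo))
```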
answered Apr 30 '23 22:04 by TheComeOnMan


Or, using the automatic recycling in matrix comparisons:

bar <- as.matrix(foo)
# drop = FALSE keeps bar[-1, ] a matrix even when foo has only two rows
c(0, rowSums(t(t(bar[-1, , drop = FALSE]) == bar[1, ])))
# [1] 0 1 2

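t() is there twice because R recycles the shorter operand down the columns (column-major order) rather than across rows. The inner t() turns each remaining row into a column so that bar[1, ] lines up with it element by element, and the outer t() flips the result back so rowSums() counts matches per original row.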
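A small sketch makes the recycling direction visible and shows two other row-wise comparisons that should give the same logical matrix. sweep() and rep(each =) are standard base R functions, but using them here is an alternative to the answer's t() trick, not part of it; the matrices m and bar are made-up illustrations:

```r
m <- matrix(1:6, nrow = 2)  # rows are (1, 3, 5) and (2, 4, 6)
v <- c(1, 3, 5)             # vector to compare against every row of m

# m == v recycles v down the columns (column-major order), so v is paired
# with the wrong elements; only m[1, 1] happens to match
naive <- m == v

# Three equivalent row-wise comparisons
by_transpose <- t(t(m) == v)                # the approach in the answer above
by_sweep     <- sweep(m, 2, v, `==`)        # compare v[j] with column j
by_rep       <- m == rep(v, each = nrow(m)) # expand v to match m's layout

print(by_transpose)

# Applied to the question's data (typed in directly as a numeric matrix)
bar <- matrix(c(1, 2, 1, 2, 2, 7, 3, 4, 3), nrow = 3,
              dimnames = list(NULL, c("A", "B", "C")))
counts <- c(0, rowSums(sweep(bar[-1, , drop = FALSE], 2, bar[1, ], `==`)))
print(counts)
```

sweep() states the intent ("apply along columns") without the double transpose, while the rep(each =) form avoids any function call overhead beyond the comparison itself.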

answered Apr 30 '23 22:04 by nacnudus