Vectorize comparison of a row vector with every row of a dataframe in R?

Question

Suppose I have a data frame read in from the following file, Foo.csv:

A,B,C
1,2,3
2,2,4
1,7,3

I would like to count the number of matching elements between the first row and each subsequent row. For example, the first row matches the second row in one position and the third row in two positions. Here is some code that achieves the desired effect.

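foo <- read.csv("Foo.csv")

# One counter per row; row 1 is the reference, so its entry stays 0.
numDiffs <- rep(0, nrow(foo))
for (i in 2:nrow(foo)) {
   # Count the columns in which row i equals row 1.
   numDiffs[i] <- sum(foo[i, ] == foo[1, ])
}
print(numDiffs)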

My question is, can this be vectorized to eliminate the loop and possibly reduce the running time? My first attempt is below, but it raises an error because == is not defined between data frames of different sizes.

colSums(foo == foo[1,])

TheComeOnMan · Accepted Answer

> rowSums(sapply(foo, function(x) c(0,x[1] == x[2:nrow(foo)])))
[1] 0 1 2
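
A step-by-step reading of the one-liner above may help. The sketch below rebuilds foo inline (an assumption, so that it runs without Foo.csv), writes x[-1] where the original has x[2:nrow(foo)], and uses the illustrative names matches and numMatches. It is meant as an unpacked version of the same idea, not a different method.

```r
# Rebuild the example data inline so the snippet runs without Foo.csv.
foo <- data.frame(A = c(1, 2, 1), B = c(2, 2, 7), C = c(3, 4, 3))

# sapply() visits each column of foo in turn. Within a column, the first
# element is compared with all the remaining ones. The leading 0 is a
# placeholder for row 1, which is not compared with itself.
matches <- sapply(foo, function(x) c(0, x[1] == x[-1]))

# matches has one row per row of foo and one column per column of foo,
# so summing across each row counts the matching positions.
numMatches <- rowSums(matches)
print(numMatches)
```

Because the comparison runs once per column rather than once per row, this is likely to pay off most when the data frame has many rows and few columns.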

nacnudus · Answer

Or using the automatic recycling of matrix comparisons:

bar <- as.matrix(foo)
c(0, rowSums(t(t(bar[-1, ]) == bar[1, ])))
# [1] 0 1 2

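t() appears twice because R recycles the shorter vector down the columns rather than across the rows: the first t() turns each original row into a column so that it lines up with bar[1, ], and the second restores the original orientation.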
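
To see why the transposes matter, it can help to compare the result with and without them. The sketch below builds the matrix inline (an assumption, so that it runs without Foo.csv) and also shows sweep() from base R as one possible alternative way to express the row-wise comparison. The names first, rest, wrong, right and viaSweep are illustrative only.

```r
# Same values as Foo.csv; matrix() fills column by column.
bar <- matrix(c(1, 2, 1,
                2, 2, 7,
                3, 4, 3),
              nrow = 3, dimnames = list(NULL, c("A", "B", "C")))

first <- bar[1, ]                  # reference row
rest  <- bar[-1, , drop = FALSE]   # rows 2 and 3

# Without transposing, `first` is recycled down the columns of `rest`,
# so its elements do not line up with columns A, B, C.
wrong <- rest == first

# After t(), each original row is a column, so recycling lines up;
# the outer t() restores the original orientation.
right <- t(t(rest) == first)

# sweep() applies the comparison along the column margin directly.
viaSweep <- sweep(rest, 2, first, FUN = "==")

print(c(0, rowSums(wrong)))      # misaligned counts
print(c(0, rowSums(right)))      # counts from the double transpose
print(c(0, rowSums(viaSweep)))   # counts from sweep()
```

The sweep() form avoids the two transposes at the cost of a slightly longer call; which one reads better is largely a matter of taste.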

Tags: r, vector

Asked by merlin2011
