
How to compare two dataframes? [closed]

Tags: dataframe, r

I have two data frames, each with two columns (for example, x and y). I need to compare them and see whether any of the values in x, in y, or in both x and y are similar in the two data frames.

sathya asked Jun 11 '12 11:06


2 Answers

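Use the all.equal function. It does not sort the data frames; it simply checks each cell in one data frame against the same cell in the other. It returns TRUE when the two match and otherwise a character vector describing the differences, so wrap it in isTRUE() when you need a single logical value. You can also use the identical() function for an exact comparison.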
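As a rough illustration of the two functions mentioned above, here is a minimal sketch. The data frames df_a, df_b and df_c are made up for this example and are not from the question.

```r
# Hypothetical example data, not taken from the question
df_a <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
df_b <- data.frame(x = c(1, 2, 3), y = c(4, 5, 7))          # one cell differs
df_c <- data.frame(x = c(1, 2, 3) + 1e-10, y = c(4, 5, 6))  # tiny numeric noise

# identical() demands an exact match
identical(df_a, df_a)
identical(df_a, df_b)

# all.equal() returns TRUE or a character vector describing the differences,
# so isTRUE() is needed to get a plain TRUE/FALSE
res <- all.equal(df_a, df_b)
print(res)
isTRUE(res)

# all.equal() tolerates very small numeric differences; identical() does not
isTRUE(all.equal(df_a, df_c))
identical(df_a, df_c)
```

Which of the two is appropriate probably depends on whether small floating-point differences should count as a mismatch.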

Max C answered Oct 31 '22 12:10


Without an example I cannot be certain I understand what you want. However, I think you want something like this. If so, there are almost certainly better ways to do the same thing.

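# Two example matrices with the same dimensions and column names
a <- matrix(c(1,2,
              3,4,
              5,6,
              7,8), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))

b <- matrix(c(1,2,
              9,4,
              9,6,
              7,9), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))

# cc starts as all NA and will hold the values shared by a and b
cc <- matrix(NA, nrow=4, ncol=2, dimnames = list(NULL, c("x","y")))

# Copy a[i,j] into cc wherever the corresponding cells of a and b match
for(i in seq_len(nrow(a))) {
  for(j in seq_len(ncol(a))) {
    if(a[i,j] == b[i,j]) cc[i,j] <- a[i,j]
  }
}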

cc
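The answer says there are probably better ways to do this. One possibility, offered only as a sketch, is to drop the loop and use logical indexing: start from a copy of a and blank out the cells that differ from b. It assumes both matrices have the same dimensions and contain no NA values.

```r
a <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

b <- matrix(c(1, 2,
              9, 4,
              9, 6,
              7, 9), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

# Start from a copy of a, then set every cell that differs from b to NA
cc <- a
cc[a != b] <- NA

cc
```

The result should match what the nested loop produces: matching cells keep their value and all other cells are NA.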

EDIT: January 8, 2013

The following line will tell you which cells differ between the two matrices:

which(a != b, arr.ind=TRUE)

#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   4   2
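To see the differing values themselves rather than only their positions, the index matrix returned by which can be used to pull the cells out of both matrices. This is a minimal sketch; the name diffs and the layout of the summary table are my own choices for illustration.

```r
a <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

b <- matrix(c(1, 2,
              9, 4,
              9, 6,
              7, 9), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

# Row/column positions of the cells that differ
idx <- which(a != b, arr.ind = TRUE)

# A two-column index matrix selects one cell per row of idx
diffs <- data.frame(row    = idx[, "row"],
                    column = colnames(a)[idx[, "col"]],
                    a      = a[idx],
                    b      = b[idx])

diffs
```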

If the two matrices, a and b, are identical then:

which(a != b)

# integer(0)

which(a != b, arr.ind=TRUE)

# row col

EDIT: January 9, 2013

The following code demonstrates the effect that row names can have on identical, all.equal, and which when one of the two data frames is created by subsetting a third data frame. If the row names differ between the two data frames being compared, identical returns FALSE and all.equal reports the difference instead of returning TRUE, even when every cell matches. However, which can still be used to compare the columns x and y cell by cell. If the row names of both data frames are set to NULL, both identical and all.equal return TRUE.

df1 <- read.table(text = "
     group  x  y 
       1   10 20
       1   10 20
       1   10 20
       1   10 20
       2    1  2
       2    3  4
       2    5  6
       2    7  8
", sep = "", header = TRUE)

df2 <- read.table(text = "
     group  x  y 
       2    1  2
       2    3  4
       2    5  6
       2    7  8
", sep = "", header = TRUE)

# df3 is a subset of df1

df3 <- df1[df1$group==2,]

# rownames differ between df2 and df3 and
# therefore neither 'all.equal' nor 'identical' return TRUE
# even though the i,j cells of df2 and df3 are the same.
# Note that 'which' indicates no i,j cells differ between df2 and df3 

df2
df3

all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)

# set row names to NULL in both data sets and
# now both 'all.equal' and 'identical' return TRUE.
# Note that 'which' still indicates no i,j cells differ between df2 and df3

rownames(df2) <- NULL
rownames(df3) <- NULL

df2
df3

all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)
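
If the original data frames should keep their row names, the reset can be done on copies inside a small function. The following is a sketch under that assumption; same_values is a hypothetical helper name, and the data are rebuilt with data.frame so the snippet runs on its own.

```r
df1 <- data.frame(group = rep(c(1L, 2L), each = 4),
                  x = c(10L, 10L, 10L, 10L, 1L, 3L, 5L, 7L),
                  y = c(20L, 20L, 20L, 20L, 2L, 4L, 6L, 8L))

df2 <- data.frame(group = rep(2L, 4),
                  x = c(1L, 3L, 5L, 7L),
                  y = c(2L, 4L, 6L, 8L))

# df3 is a subset of df1, so it keeps the original row names 5 to 8
df3 <- df1[df1$group == 2L, ]

# Hypothetical helper: compare cell values while ignoring row names.
# The arguments are local copies, so the callers' data frames are unchanged.
same_values <- function(d1, d2) {
  rownames(d1) <- NULL
  rownames(d2) <- NULL
  isTRUE(all.equal(d1, d2))
}

isTRUE(all.equal(df2, df3))  # row names differ
same_values(df2, df3)        # row names ignored
rownames(df3)                # still the original row names
```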

Mark Miller answered Oct 31 '22 12:10