I have two dataframes each having two columns (for example, x and y). I need to compare the two dataframes and see whether any of the values in x or y or both x and y are similar in the two dataframes.
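Use the all.equal() function. It does not sort the data frames; it simply checks each cell in one data frame against the same cell in the other. It returns TRUE when the two match, and otherwise a character vector describing the differences.
You can also use the identical() function, which returns a single TRUE or FALSE and requires an exact match.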
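As a minimal sketch of how the two functions behave, the example below uses two small made-up data frames (d1 and d2 are illustrative names, not from the question). Because all.equal() returns a description rather than FALSE when the inputs differ, it is usually wrapped in isTRUE().

```r
# Illustrative data frames (hypothetical, not from the question)
d1 <- data.frame(x = c(1, 2), y = c(3, 4))
d2 <- data.frame(x = c(1, 2), y = c(3, 5))

# all.equal() returns TRUE or a character vector describing differences,
# so wrap it in isTRUE() to get a plain logical value
same_all_equal <- isTRUE(all.equal(d1, d1))
diff_all_equal <- isTRUE(all.equal(d1, d2))

# identical() always returns a single TRUE or FALSE
diff_identical <- identical(d1, d2)

# all.equal() tolerates tiny numeric differences; identical() does not
close_all_equal <- isTRUE(all.equal(1, 1 + 1e-10))
close_identical <- identical(1, 1 + 1e-10)
```

Here same_all_equal and close_all_equal are TRUE, while the other three results are FALSE.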
Without an example I cannot be certain I understand what you want. However, I think you want something like this. If so, there are almost certainly better ways to do the same thing.
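# Two example matrices with columns x and y
a <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))
b <- matrix(c(1, 2,
              9, 4,
              9, 6,
              7, 9), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))
# cc starts as all NA, with the same shape and column names as a
cc <- matrix(NA_real_, nrow = nrow(a), ncol = ncol(a), dimnames = dimnames(a))
# Copy a value into cc only where a and b agree; other cells stay NA
for (i in seq_len(nrow(a))) {
  for (j in seq_len(ncol(a))) {
    if (a[i, j] == b[i, j]) cc[i, j] <- a[i, j]
  }
}
cc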
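One of the "better ways" mentioned above might be a vectorized version of the same idea. This is only a sketch under the same assumptions as the loop (two numeric matrices of equal shape with no missing values): it copies a and blanks out the cells where a and b disagree.

```r
# Same example matrices as above
a <- matrix(c(1, 2,
              3, 4,
              5, 6,
              7, 8), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))
b <- matrix(c(1, 2,
              9, 4,
              9, 6,
              7, 9), nrow = 4, byrow = TRUE, dimnames = list(NULL, c("x", "y")))

# Start from a copy of a, then set every cell that differs from b to NA
cc <- a
cc[a != b] <- NA
cc
# rows 2 and 3 are NA in column x, row 4 is NA in column y
```

The logical matrix a != b is used directly as an index, so no explicit loop over rows and columns is needed.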
EDIT: January 8, 2013
The following line will tell you which cells differ between the two matrices:
which(a != b, arr.ind=TRUE)
# row col
# [1,] 2 1
# [2,] 3 1
# [3,] 4 2
If the two matrices, a and b, are identical then:
which(a != b)
# integer(0)
which(a != b, arr.ind=TRUE)
# row col
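The positions returned by which(..., arr.ind=TRUE) can also be used to pull out the differing values themselves. The sketch below assumes the same a and b as above; idx and diffs are illustrative names.

```r
# Same example matrices as above
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))
b <- matrix(c(1, 2, 9, 4, 9, 6, 7, 9), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))

# Row/column positions of the cells that differ
idx <- which(a != b, arr.ind = TRUE)

# A two-column index matrix selects one cell per row of idx,
# so a[idx] and b[idx] give the mismatched values side by side
diffs <- data.frame(idx, a = a[idx], b = b[idx])
diffs
```

Each row of diffs gives a position (row, col) together with the value found there in a and in b.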
EDIT: January 9, 2012
The following code shows how row names affect identical, all.equal and which when one of the two data frames is created by subsetting a third data frame. A subset keeps the row names of its parent, so if the row names differ between the two data frames being compared, neither identical nor all.equal will return TRUE, even when every cell matches. which looks only at the cell values, so it can still be used to compare the columns x and y between the two data frames. Once the row names of both data frames are set to NULL, both identical and all.equal return TRUE.
df1 <- read.table(text = "
group x y
1 10 20
1 10 20
1 10 20
1 10 20
2 1 2
2 3 4
2 5 6
2 7 8
", sep = "", header = TRUE)
df2 <- read.table(text = "
group x y
2 1 2
2 3 4
2 5 6
2 7 8
", sep = "", header = TRUE)
# df3 is a subset of df1
df3 <- df1[df1$group==2,]
# row names differ between df2 and df3, so
# neither 'all.equal' nor 'identical' returns TRUE
# even though the i,j cells of df2 and df3 are the same.
# Note that 'which' indicates no i,j cells differ between df2 and df3
df2
df3
all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)
# set row names to NULL in both data frames;
# now both 'all.equal' and 'identical' return TRUE.
# Note that 'which' still indicates no i,j cells differ between df2 and df3
rownames(df2) <- NULL
rownames(df3) <- NULL
df2
df3
all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)
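If you would rather not modify the original data frames, the row-name reset can be wrapped in a small helper that works on copies. This is only a sketch: same_cells is a hypothetical name, and the data frames below are a shortened, made-up version of the example above.

```r
# Hypothetical helper: compare two data frames while ignoring row names.
# R copies function arguments on modification, so the callers' data frames
# are left unchanged.
same_cells <- function(d1, d2) {
  rownames(d1) <- NULL
  rownames(d2) <- NULL
  identical(d1, d2)
}

# Shortened illustrative data (not the full example above)
df1 <- data.frame(group = c(1, 1, 2, 2), x = c(10, 10, 1, 3), y = c(20, 20, 2, 4))
df2 <- data.frame(group = c(2, 2), x = c(1, 3), y = c(2, 4))
df3 <- df1[df1$group == 2, ]   # keeps row names 3 and 4 from df1

strict  <- identical(df2, df3)   # FALSE: row names differ
relaxed <- same_cells(df2, df3)  # TRUE: the cell values match
cells   <- all(df2 == df3)       # TRUE: cell-by-cell comparison
```

Note that identical() still requires matching column names and column types, so same_cells() only relaxes the row-name check.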