I have two dataframes each having two columns (for example, x and y). I need to compare the two dataframes and see whether any of the values in x or y or both x and y are similar in the two dataframes.
Use the all.equal function. It does not sort the data frames; it simply checks each cell in one data frame against the same cell in the other.
You can also use the identical() function.
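The difference between the two functions can be sketched as follows. The data frames df_a and df_b are made up for illustration and are not from the question. Note that all.equal returns either TRUE or a character vector describing the differences, so it is usually wrapped in isTRUE() when a single logical value is needed. It also compares numeric values up to a small tolerance, whereas identical requires an exact match.

```r
# Hypothetical data frames, invented for this illustration only
df_a <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
df_b <- data.frame(x = c(1, 2, 3), y = c(4, 5, 7))

# identical() always returns a single TRUE or FALSE
identical(df_a, df_a)
identical(df_a, df_b)

# all.equal() returns TRUE, or a character vector describing the differences
res <- all.equal(df_a, df_b)
print(res)

# Wrap in isTRUE() to get a plain logical suitable for if()
isTRUE(res)

# all.equal() tolerates tiny numeric differences; identical() does not
isTRUE(all.equal(1, 1 + 1e-10))
identical(1, 1 + 1e-10)
```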
Without an example I cannot be certain I understand what you want. However, I think you want something like this. If so, there are almost certainly better ways to do the same thing.
# two example matrices with columns x and y
a <- matrix(c(1,2,
              3,4,
              5,6,
              7,8), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))
b <- matrix(c(1,2,
              9,4,
              9,6,
              7,9), nrow=4, byrow=TRUE, dimnames = list(NULL, c("x","y")))
# cc starts as all NA and will hold the values that match in a and b
cc <- matrix(NA, nrow=4, ncol=2, dimnames = list(NULL, c("x","y")))
# copy each cell of a into cc when it equals the same cell of b
for(i in seq_len(nrow(a))) {
  for(j in seq_len(ncol(a))) {
    if(a[i,j]==b[i,j]) cc[i,j]=a[i,j]
  }
}
cc
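One possible vectorized alternative to the double loop is sketched here. It assumes, as in the example, that a and b have the same dimensions and contain no missing values. It starts from a copy of a and blanks out every cell that differs from b.

```r
# Same example matrices as above, repeated so the snippet runs on its own
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))
b <- matrix(c(1, 2, 9, 4, 9, 6, 7, 9), nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))

# Start from a copy of a, then set the cells that differ from b to NA
cc <- a
cc[a != b] <- NA
cc
```

The logical matrix a != b is used directly as an index, so no explicit loop over rows and columns is needed.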
EDIT: January 8, 2013
The following line will tell you which cells differ between the two matrices:
which(a != b, arr.ind=TRUE)
# row col
# [1,] 2 1
# [2,] 3 1
# [3,] 4 2
If the two matrices, a and b, are identical then:
which(a != b)
# integer(0)
which(a != b, arr.ind=TRUE)
# row col
EDIT: January 9, 2012
The following code demonstrates the effect that row names can have on identical, all.equal and which when one of the two data frames is created by subsetting a third data frame. If row names differ between the two data frames being compared, then neither identical nor all.equal will return TRUE. However, which can still be used to compare the columns x and y between the two data frames. If row names are set to NULL for each of the two data frames being compared, then both identical and all.equal will return TRUE.
df1 <- read.table(text = "
group x y
1 10 20
1 10 20
1 10 20
1 10 20
2 1 2
2 3 4
2 5 6
2 7 8
", sep = "", header = TRUE)
df2 <- read.table(text = "
group x y
2 1 2
2 3 4
2 5 6
2 7 8
", sep = "", header = TRUE)
# df3 is a subset of df1
df3 <- df1[df1$group==2,]
# rownames differ between df2 and df3 and
# therefore neither 'all.equal' nor 'identical' return TRUE
# even though the i,j cells of df2 and df3 are the same.
# Note that 'which' indicates no i,j cells differ between df2 and df3
df2
df3
all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)
# set row names to NULL in both data sets and
# now both 'all.equal' and 'identical' return TRUE.
# Note that 'which' still indicates no i,j cells differ between df2 and df3
rownames(df2) <- NULL
rownames(df3) <- NULL
df2
df3
all.equal(df2, df3)
identical(df2, df3)
which(df2 != df3)
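If resetting the row names is not desirable, one option that may work is to tell all.equal to ignore attributes through its check.attributes argument. The sketch below uses small made-up data frames (full, sub and ref are hypothetical names that mirror the df1, df3 and df2 situation above). Keep in mind that check.attributes = FALSE skips all attributes, not only row names.

```r
# Hypothetical data mirroring the subsetting situation above
full <- data.frame(group = c(1L, 1L, 2L, 2L),
                   x     = c(10L, 10L, 1L, 3L),
                   y     = c(20L, 20L, 2L, 4L))
sub  <- full[full$group == 2, ]   # keeps the original row names 3 and 4
ref  <- data.frame(group = c(2L, 2L), x = c(1L, 3L), y = c(2L, 4L))

# Row names differ, so the default comparisons do not report equality
identical(ref, sub)
isTRUE(all.equal(ref, sub))

# Ignoring attributes compares only the column contents
isTRUE(all.equal(ref, sub, check.attributes = FALSE))
```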