I want to compare two rows of a data frame for identity. I thought the identical() function would be appropriate for this task, but it doesn't work as expected. Here is a minimal example:
> x=factor(c("x","x"),levels=c("x","y"))
> y=factor(c("y","y"),levels=c("x","y"))
> df=data.frame(x,y)
> df
  x y
1 x y
2 x y
> identical(df[1,],df[2,])
[1] FALSE
> df[1,]==df[2,]
     x    y
1 TRUE TRUE
Can anyone explain to me why identical() returns FALSE?
Thanks, Thomas
identical(df[1,],df[2,])
#[1] FALSE
all.equal(df[1,],df[2,])
#[1] "Attributes: < Component 2: Mean relative difference: 1 >"
all.equal(df[1,],df[2,],check.attributes = FALSE)
#[1] TRUE
anyDuplicated(df[1:2,])>0
#[1] TRUE
Try this function:
> all.equal(df[1,],df[2,])
[1] "Attributes: < Component 2: Mean relative difference: 1 >"
(In general, comparing factors may give 'unexpected' results.) In this case identical(), which tries to match everything including attributes, finds different row.names. You can see that from dput:
> dput(df[1,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"),
y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x",
"y"), row.names = 1L, class = "data.frame")
> dput(df[2,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"),
y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x",
"y"), row.names = 2L, class = "data.frame")
In this example a simple == works:
> df[1,]==df[2,]
     x    y
1 TRUE TRUE
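Since the dput output shows that the two rows differ only in their row.names attribute, one option is to reset the row names before calling identical(). The following is a minimal sketch, assuming the row names carry no meaning for the comparison; the names r1 and r2 are only illustrative.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  x = factor(c("x", "x"), levels = c("x", "y")),
  y = factor(c("y", "y"), levels = c("x", "y"))
)

# Extract the two rows as one-row data frames
r1 <- df[1, ]
r2 <- df[2, ]

# Inspect the attribute that differs between the two rows
attr(r1, "row.names")
attr(r2, "row.names")

# Assumption: row names are irrelevant here, so reset them on both rows
rownames(r1) <- NULL
rownames(r2) <- NULL

# With matching row names, identical() compares only values and factor levels
identical(r1, r2)
```

This keeps the strictness of identical() for the values and factor levels, whereas all.equal() with check.attributes = FALSE ignores all attributes, not just the row names.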