 

Identity of rows in an R data frame

Tags:

r

I want to compare two rows of a data frame for identity. I thought the identical() function would be appropriate for this task; however, it doesn't work as expected. Here is a minimal example:

> x=factor(c("x","x"),levels=c("x","y"))
> y=factor(c("y","y"),levels=c("x","y"))
> df=data.frame(x,y)
> df
  x y
1 x y
2 x y
> identical(df[1,],df[2,])
[1] FALSE
> df[1,]==df[2,]
     x    y
1 TRUE TRUE

Can anyone explain to me why identical() returns FALSE?

Thanks, Thomas

user2481662 asked Jun 13 '13 09:06



2 Answers

identical(df[1,],df[2,])
#[1] FALSE
all.equal(df[1,],df[2,])
#[1] "Attributes: < Component 2: Mean relative difference: 1 >"

all.equal(df[1,],df[2,],check.attributes = FALSE)
#[1] TRUE

anyDuplicated(df[1:2,])>0
#[1] TRUE
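The all.equal() message can be decoded with a little arithmetic. As far as I can tell from how all.equal() handles attributes, it drops names and sorts the remaining attributes by name (class, then row.names), so "Component 2" is the row.names attribute, and the "mean relative difference" of 1 is |2 - 1| / 1. A minimal sketch that recomputes this from the question's data frame (rn1, rn2 and rel_diff are just illustrative variable names):

```r
# Rebuild the data frame from the question
x <- factor(c("x", "x"), levels = c("x", "y"))
y <- factor(c("y", "y"), levels = c("x", "y"))
df <- data.frame(x, y)

# The two one-row subsets carry different row.names attributes
rn1 <- attr(df[1, ], "row.names")
rn2 <- attr(df[2, ], "row.names")
print(c(rn1, rn2))

# Relative difference of the row names, computed the way all.equal() does
# for numbers: sum(abs(target - current)) / sum(abs(target))
rel_diff <- sum(abs(rn2 - rn1)) / sum(abs(rn1))
print(rel_diff)

# Ignoring attributes, the column values agree
print(isTRUE(all.equal(df[1, ], df[2, ], check.attributes = FALSE)))
```

So identical() is not wrong here: the two subsets really do differ, but only in their row names.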
Roland answered Sep 30 '22 12:09



Try all.equal():

> all.equal(df[1,],df[2,])
[1] "Attributes: < Component 2: Mean relative difference: 1 >"

(In general, comparing factors may give 'unexpected' results.) In this case identical(), which compares everything including attributes, finds that the row.names differ, as you can see from dput():

> dput(df[1,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"), 
    y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x", 
"y"), row.names = 1L, class = "data.frame")
> dput(df[2,])
structure(list(x = structure(1L, .Label = c("x", "y"), class = "factor"), 
    y = structure(2L, .Label = c("x", "y"), class = "factor")), .Names = c("x", 
"y"), row.names = 2L, class = "data.frame")
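Since the only difference is the row.names attribute, one workaround is to reset the row names of both subsets before calling identical(). A minimal sketch; rows_identical() is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: compare rows i and j of a data frame with identical(),
# ignoring their (necessarily different) row names
rows_identical <- function(df, i, j) {
  # drop = FALSE keeps a one-column data frame from collapsing to a vector
  a <- df[i, , drop = FALSE]
  b <- df[j, , drop = FALSE]
  # Reset both to default row names, so only the values and the remaining
  # attributes (column names, factor levels, classes) are compared
  rownames(a) <- NULL
  rownames(b) <- NULL
  identical(a, b)
}

x <- factor(c("x", "x"), levels = c("x", "y"))
y <- factor(c("y", "y"), levels = c("x", "y"))
df <- data.frame(x, y)

print(rows_identical(df, 1, 2))
```

Unlike ==, this stays strict about everything else: two factor columns with the same labels but different level sets would still compare as not identical.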

In this example, a simple == works, because it compares only the cell values and not the row names:

> df[1,]==df[2,]
     x    y
1 TRUE TRUE
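Note that == on two data frames returns a logical matrix with one column per variable, so getting a single yes/no answer needs one more step, e.g. all(). A minimal sketch; wrapping the result in isTRUE() is my own choice, so that an NA comparison (a missing value in either row) counts as "not equal" instead of propagating NA:

```r
x <- factor(c("x", "x"), levels = c("x", "y"))
y <- factor(c("y", "y"), levels = c("x", "y"))
df <- data.frame(x, y)

# Element-wise comparison: a 1 x 2 logical matrix, one column per variable
cmp <- df[1, ] == df[2, ]
print(cmp)

# Collapse to a single TRUE/FALSE; isTRUE() maps an NA result to FALSE
same <- isTRUE(all(cmp))
print(same)
```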
Michele answered Sep 30 '22 14:09
