Possible bug in R all.equal

Tags:

r

I have run into some strange behavior in R's all.equal function. Basically, I create two identical-looking data frames in two different ways and then call all.equal on them (which checks the attributes as well as the data).

The code to reproduce the behavior is as follows:

var.a <- data.frame(cbind(as.integer(c(1,5,9)), as.integer(c(1,5,9))))
colnames(var.a) <- c("C1", "C2")
rownames(var.a) <- c("1","5","9")

var.b <- data.frame(matrix(NA, nrow = 10, ncol = 2))
var.b[, 1] <- 1:10
var.b[, 2] <- 1:10
colnames(var.b) <- c("C1", "C2")
var.b <- var.b[seq(1, nrow(var.b), 4), ]

all.equal(var.a, var.b)

Is this a bug, or am I just missing something? I did quite a bit of debugging of the all.equal function, and it appears the problem is the row names of the data frames (a character vector in one case and a numeric vector in the other). The output of all.equal is:

[1] "Attributes: < Component 2: Modes: character, numeric >"
[2] "Attributes: < Component 2: target is character, current is numeric >"

However,

typeof(rownames(var.a)) == typeof(rownames(var.b))

returns TRUE, which confuses me.

P.S.: The structure of the objects seems the same:

> str(var.a)
'data.frame':   3 obs. of  2 variables:
$ C1: int  1 5 9
$ C2: int  1 5 9
> str(var.b)
'data.frame':   3 obs. of  2 variables:
$ C1: int  1 5 9
$ C2: int  1 5 9

I would appreciate it if someone could shed some light on this.

Igor asked Dec 11 '22 22:12


1 Answer

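(I'm not exactly clear what bug you think you have found. The data frames were not created the same way.) The dput output below shows two differences in the structures of var.a and var.b: the mode of the elements in the columns (double in 'var.a', integer in 'var.b') and the type of the row names (character in 'var.a', integer in 'var.b'). With the as.integer() calls in the question's code, the columns are integer in both (as the str() output shows), so the row names are the difference all.equal is reporting: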

> dput(var.b)
structure(list(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L)), .Names = c("C1", 
"C2"), row.names = c(1L, 5L, 9L), class = "data.frame")
> dput(var.a)
structure(list(C1 = c(1, 5, 9), C2 = c(1, 5, 9)), .Names = c("C1", 
"C2"), row.names = c("1", "5", "9"), class = "data.frame")

> mode(attr(var.b, "row.names"))
[1] "numeric"
> storage.mode(attr(var.b, "row.names"))
[1] "integer"
> mode(attr(var.a, "row.names"))
[1] "character"
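To see why the typeof() comparison in the question still returns TRUE, the sketch below (a reconstruction, not part of the original answer) rebuilds both data frames and compares what rownames() returns with the stored "row.names" attribute. For a data frame, rownames() goes through row.names(), which converts the row names to character, so it hides the difference that dput() and attr() reveal. The "Modes:" and "target is ..., current is ..." messages from all.equal appear to be based on mode(), which reports an integer vector as "numeric"; that would explain why the integer row names show up as "numeric" in the output.

```r
# Rebuild the two data frames exactly as in the question
var.a <- data.frame(cbind(as.integer(c(1, 5, 9)), as.integer(c(1, 5, 9))))
colnames(var.a) <- c("C1", "C2")
rownames(var.a) <- c("1", "5", "9")      # stored as a character vector

var.b <- data.frame(matrix(NA, nrow = 10, ncol = 2))
var.b[, 1] <- 1:10
var.b[, 2] <- 1:10
colnames(var.b) <- c("C1", "C2")
var.b <- var.b[seq(1, nrow(var.b), 4), ]  # subsetting keeps rows 1, 5, 9 as integer row names

# For a data frame, rownames() returns character whatever is stored
typeof(rownames(var.a))           # "character"
typeof(rownames(var.b))           # "character"

# The stored "row.names" attribute keeps its own type
typeof(attr(var.a, "row.names"))  # "character"
typeof(attr(var.b, "row.names"))  # "integer"

# mode() reports integer vectors as "numeric"
mode(attr(var.b, "row.names"))    # "numeric"
```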

Added note: If you only want to check the values for equality, set the 'check.attributes' argument to FALSE:

> all.equal(var.a, var.b, check.attributes=FALSE)
[1] TRUE
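If you would rather keep attribute checking switched on, another option (not from the original answer) is to make the row-name types agree before comparing. A minimal sketch, assuming var.a and var.b are built as in the question:

```r
# Rebuild the two data frames exactly as in the question
var.a <- data.frame(cbind(as.integer(c(1, 5, 9)), as.integer(c(1, 5, 9))))
colnames(var.a) <- c("C1", "C2")
rownames(var.a) <- c("1", "5", "9")

var.b <- data.frame(matrix(NA, nrow = 10, ncol = 2))
var.b[, 1] <- 1:10
var.b[, 2] <- 1:10
colnames(var.b) <- c("C1", "C2")
var.b <- var.b[seq(1, nrow(var.b), 4), ]

# Store var.b's integer row names as character so both attributes match
row.names(var.b) <- as.character(row.names(var.b))

all.equal(var.a, var.b)  # TRUE, with attributes still checked
```

Going the other way, row.names(var.a) <- as.integer(row.names(var.a)), should work equally well; which direction to choose depends on whether you want the row names to behave as labels or as row numbers.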

IRTFM answered Jan 05 '23 10:01