
as(x, 'double') and as.double(x) are inconsistent

Tags:

r

x <- 1:10
str(x)
# int [1:10] 1 2 3 4 5 6 7 8 9 10
str(as.double(x))
# num [1:10] 1 2 3 4 5 6 7 8 9 10 
str(as(x, 'double'))
# int [1:10] 1 2 3 4 5 6 7 8 9 10

I'd be surprised if there were a bug in R with something as basic as type conversion. Is there a reason for this inconsistency?

Matthew Plourde asked Dec 04 '15 16:12


2 Answers

as is for coercing to a new class, and double technically isn't a class but rather a storage.mode.

y <- x
storage.mode(y) <- "double"
identical(x,y)
[1] FALSE
identical(as.double(x),y)
[1] TRUE

as handles the argument "double" as a special case and attempts to coerce to the class numeric. The class integer already inherits from numeric, so nothing changes.

is.numeric(x)
[1] TRUE
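To see the inheritance relationship mentioned above, here is a minimal sketch using is() and extends() from the methods package. The variable names are only illustrative, and this checks the class relationship rather than what as() itself returns.

```r
library(methods)  # provides is() and extends()

x <- 1:10

# Base R's mode-based test: TRUE for both integer and double vectors
mode_test <- is.numeric(x)

# S4-style inheritance test: does an integer vector count as "numeric"?
s4_test <- is(x, "numeric")

# The same relationship, asked of the class definitions directly
ext_test <- extends("integer", "numeric")

print(c(is.numeric = mode_test, is = s4_test, extends = ext_test))
```

If all three come back TRUE, that is consistent with as() deciding there is nothing to coerce.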

Not so fast...

While the above made sense, there is some further confusion. From ?double:

It is a historical anomaly that R has two names for its floating-point vectors, double and numeric (and formerly had real).

double is the name of the type. numeric is the name of the mode and also of the implicit class. As an S4 formal class, use "numeric".

The potential confusion is that R has used mode "numeric" to mean ‘double or integer’, which conflicts with the S4 usage. Thus is.numeric tests the mode, not the class, but as.numeric (which is identical to as.double) coerces to the class.
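The distinction between type, mode and implicit class in the passage above can be inspected directly. The following is a small sketch; the data frame layout and the names info and same_fun are just one way to present it.

```r
x <- 1:10
d <- as.double(x)

# One row per vector: low-level type, mode, and implicit class
info <- data.frame(
  typeof = c(typeof(x), typeof(d)),
  mode   = c(mode(x), mode(d)),
  class  = c(class(x), class(d)),
  row.names = c("x", "as.double(x)")
)
print(info)

# The quoted documentation says as.numeric is identical to as.double;
# this asks R whether the two names refer to the same function
same_fun <- identical(as.numeric, as.double)
print(same_fun)
```

Both rows share the mode "numeric", which is why is.numeric cannot tell them apart, while typeof and class can.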

Therefore as should really change x according to the documentation... I will investigate further.

The plot is thicker than whipped cream and cornflour soup...

Well, if you debug as, you find that the c("ANY","numeric") signature for the coerce generic, which would call as.numeric, is not used. Instead, the following method eventually gets created:

function (from, strict = TRUE) 
if (strict) {
    class(from) <- "numeric"
    from
} else from

So class<- actually gets called on x, which eventually calls R_set_class in coerce.c. I believe the following part of that function determines the behaviour:

...
else if(!strcmp("numeric", valueString)) {
    setAttrib(obj, R_ClassSymbol, R_NilValue);
    if(IS_S4_OBJECT(obj)) /* NULL class is only valid for S3 objects */
      do_unsetS4(obj, value);
    switch(TYPEOF(obj)) {
    case INTSXP: case REALSXP: break;
    default: PROTECT(obj = coerceVector(obj, REALSXP));
    nProtect++;
    }
...

Note the switch statement: it breaks out without doing coercion in the case of integers and real values.
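The effect of that switch can be probed from R without going through as(). This is a rough sketch: set_numeric_class is a hypothetical helper written for illustration, and the result may depend on the R version in use.

```r
# Hypothetical helper: mimics the body of the generated method shown earlier
set_numeric_class <- function(obj) {
  class(obj) <- "numeric"
  obj
}

int_in <- 1:3
dbl_in <- c(1.5, 2.5)

int_out <- set_numeric_class(int_in)
dbl_out <- set_numeric_class(dbl_in)

# Inspect the storage type before and after the class assignment
print(c(before = typeof(int_in), after = typeof(int_out)))
print(c(before = typeof(dbl_in), after = typeof(dbl_out)))
```

If the C code above applies to your R build, the integer input should report the same type before and after.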

Bug or not?

Whether or not this is a bug depends on your point of view. Integers are numeric in one sense, as confirmed by is.numeric(x) returning TRUE, but strictly speaking they are not of class numeric. On the other hand, since integers are promoted to double automatically in mixed arithmetic, one may view the two as conceptually the same (note that pure integer overflow yields NA with a warning rather than a double). There are two major differences: i) integers require less storage space, which may be significant for larger vectors, and ii) when interacting with external code that has stricter type discipline, conversion costs may come into play.
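The storage and promotion points can be checked with a short sketch. The vector length n is arbitrary, and exact byte counts may vary slightly by platform, so only the comparison matters here.

```r
# Storage: integers take roughly half the space of doubles
n <- 1e6
int_bytes <- as.numeric(object.size(integer(n)))
dbl_bytes <- as.numeric(object.size(numeric(n)))
print(c(integer = int_bytes, double = dbl_bytes))

# Mixed integer/double arithmetic promotes the result to double
mixed <- 1L + 0.5
print(typeof(mixed))

# Pure integer arithmetic does not promote on overflow:
# the result is an integer NA, normally accompanied by a warning
overflow <- suppressWarnings(.Machine$integer.max + 1L)
print(overflow)
print(typeof(overflow))
```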

James answered Nov 10 '22 16:11


as(x,"double"): Methods are pre-defined for coercing any object to one of the basic datatypes. For example, as(x, "numeric") uses the existing as.numeric function. These built-in methods can be listed by showMethods("coerce"). These functions manage the relations that allow coercing an object to a given class.
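As a rough illustration of the above, the listing of coerce methods can be captured and inspected. The names coerce_listing and from_chr are illustrative, and the character input is chosen because it is clearly not already numeric.

```r
library(methods)

# Capture the listing rather than printing all of it; it can be long
coerce_listing <- capture.output(showMethods("coerce"))
print(head(coerce_listing, 10))

# Coercing a value that does not already inherit from "numeric"
from_chr <- as("42", "numeric")
print(typeof(from_chr))
```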

as.double(x): as.double is a generic function. It is identical to as.numeric. Methods should return an object of base type "double". Together, double, as.double and is.double create, coerce to, or test for a double-precision vector.
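A brief sketch of creating, coercing to and testing for doubles in base R; the input values are arbitrary examples.

```r
# double() creates, as.double() coerces, is.double() tests
zeros    <- double(3)
from_chr <- as.double("3.14")
from_lgl <- as.double(TRUE)

print(zeros)
print(from_chr)
print(from_lgl)

# An integer literal is not a double; a plain numeric literal is
print(c(is.double(1L), is.double(1)))
```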

Pontios answered Nov 10 '22 17:11