
Why does as.character() return an integer on a list of dates?

Tags: string, date, r

I was surprised to observe the following behavior in R:

as.character(c(Sys.Date()))
#> [1] "2018-02-05"

as.character(list(Sys.Date()))
#> [1] "17567"

Why does this happen? Clearly the "17567" is the result of as.integer(Sys.Date()), but I do not follow the logic for why as.character(list(Sys.Date())) should wind up invoking as.integer().

(Usually strings being treated as integers can be blamed on not setting options(stringsAsFactors=FALSE), but that doesn't appear to be the case here.)

EDIT: As Josh observes, this is due to the underlying behavior of as.vector, but I do not find that any more intuitive:

as.vector(Sys.Date())
#> 17567
as.vector(Sys.Date(), "character")
#> "17567"

Why? (Yes, I believe dates are stored as integers in the lower-level internals, but this coercion to a literal integer in this circumstance, without a warning, seems surprising to me.)

Also this manifests in more subtle ways:

tbl <- tibble::as_tibble(list(col1 = list(Sys.Date(), "stuff")))
df <- as.data.frame(tbl)
df
#>    col1
#> 1 17567
#> 2 stuff

df[1, 1]
#> [[1]]
#> [1] "2018-02-05"

Note that the print method for data.frame is showing the date as an integer, when in fact it is a list column and the date is still the date.

It's not clear what is going on with the print method in this case, and why it shows such a misleading representation of the data.

EDIT:

Other examples where the Date class surprisingly falls off, exposing the underlying numeric base type:

vapply(list(Sys.Date()), I, Sys.Date())
vapply(list(Sys.Date()), lubridate::as_date, Sys.Date())

and my favorite so far:

unlist(list(Sys.Date()))

It appears that vector operations with Date (and POSIX objects) are fragile; one should focus on the mode / typeof and not class to anticipate how the vector will behave.
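To make the unlist() case concrete, here is a minimal sketch using a fixed date instead of Sys.Date() so the values are reproducible. The variable names are illustrative only. It shows the class being dropped and two ways that should get it back.

```r
# A fixed date so the numbers below do not depend on when this is run.
d <- as.Date("2018-02-05")
dates <- list(d, d + 1)

# unlist() keeps the underlying doubles but drops the "Date" class.
flat <- unlist(dates)
class(flat)
#> [1] "numeric"

# Option 1: re-attach the class from the day counts.
restored <- as.Date(flat, origin = "1970-01-01")

# Option 2: combine with c(), which dispatches to the Date method.
combined <- do.call(c, dates)
class(combined)
#> [1] "Date"
```

Either route gives a Date vector again; do.call(c, ...) avoids the round trip through bare numbers.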

cboettig asked Feb 05 '18 21:02


1 Answer

The issue ultimately has to do with the behavior of the function as.vector().

When you apply as.character() to a list, it sees an object of class "list" (not one of class "Date"). Since there is no as.character() method for lists, the default method, as.character.default, gets dispatched. It does the following:

as.character.default
# function (x, ...) 
# .Internal(as.vector(x, "character"))
# <bytecode: 0x0000000006793e88>
# <environment: namespace:base>

As you can see, it hands the object straight to the internal as.vector() code with mode "character". Running as.vector() directly on a list of Date objects confirms that this is the step that exposes the underlying number and then converts it to character.

as.vector(list(Sys.Date()), "character")
# [1] "17567"
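The dispatch difference described above can be checked directly. This is a small sketch with a fixed date (the name d is just for illustration): the class that method dispatch sees is that of the outer object, not of its elements.

```r
d <- as.Date("2018-02-05")

# A bare Date has class "Date", so as.character() uses the Date method.
class(d)
#> [1] "Date"
as.character(d)
#> [1] "2018-02-05"

# Wrapped in a list, the object being dispatched on is just a "list".
class(list(d))
#> [1] "list"
inherits(list(d), "Date")
#> [1] FALSE
```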

As Carl points out, the explanation above, even if accurate, is not really satisfying. A more complete answer requires looking at what happens under the hood, in the C code executed by the call to .Internal(as.vector(x, "character")). All of the relevant C code is in the source file coerce.c.

First up is do_asvector(), which calls ascommon(), which calls coerceVector(), which calls coerceVectorList() and then, finally, coerceToString(). coerceToString() examines the type of the element it is processing and, in our case, seeing that it is a REALSXP, switches to this code block:

case REALSXP:
    PrintDefaults();
    savedigits = R_print.digits; R_print.digits = DBL_DIG; /* MAX precision */
    for (i = 0; i < n; i++) {
//      if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
        SET_STRING_ELT(ans, i, StringFromReal(REAL(v)[i], &warn));
    }
    R_print.digits = savedigits;
    break;

And why does it use the block for REALSXP? Because that is the underlying type of R Date objects: typeof(Sys.Date()) returns "double" and mode(Sys.Date()) returns "numeric".
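A quick way to see this from the R side, sketched with a fixed date so the numbers match the ones in the question:

```r
d <- as.Date("2018-02-05")

typeof(d)   # the underlying storage type
#> [1] "double"
mode(d)
#> [1] "numeric"
class(d)    # the class attribute sitting on top of that double
#> [1] "Date"

# Stripping the class leaves the number of days since 1970-01-01,
# which is all that code looking only at the type gets to see.
days <- unclass(d)
days
#> [1] 17567
as.character(days)
#> [1] "17567"
```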


The take-home is this: in the chain of events described above, the elements of the list are never caught and treated as "Date" objects while still in the realm of R function calls and method dispatch. Instead, they get passed along as a "list" (aka VECSXP) to a series of C functions. At that point it is too late, because the C functions that process that list know nothing about the "Date" class of its elements. In particular, the function that ultimately does the conversion to character, coerceToString(), only sees the elements' underlying type, which is REALSXP (numeric/double), and processes them as if that were all they were.
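One way to sidestep this, offered as a sketch and not the only option, is to convert element by element so that each call goes through R's method dispatch and sees the element's own class. The list below mirrors the mixed column from the question; the names mixed and labels are illustrative.

```r
mixed <- list(as.Date("2018-02-05"), "stuff")

# as.character() is called once per element, so the Date element
# reaches the Date method instead of the list-level coercion.
labels <- vapply(mixed, function(el) as.character(el), character(1))
labels
#> [1] "2018-02-05" "stuff"
```

vapply() with character(1) also fails loudly if any element does not convert to exactly one string, which is usually preferable to a silent coercion.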

Josh O'Brien answered Sep 19 '22 17:09