I'm trying to use dcast() from the latest reshape2 package (1.2.1) to denormalize a data frame (or data.table) where the value.var is of type POSIXct, but in the resulting data frame the date values have lost their POSIXct class and become numeric.
Do I really have to call as.POSIXct() on every generated column if I want the values back as POSIXct, or am I missing something?
x <- c("a","b");
y <- c("c","d");
z <- as.POSIXct(c("2012-01-01 01:01:01","2012-02-02 02:02:02"));
d <- data.frame(x, y, z, stringsAsFactors=FALSE);
str(d);
library(reshape2);
e <- dcast(d, formula = x ~ y, value.var = "z");
str(e);
Result of running the above statements (note that the new columns c and d are numeric epoch seconds instead of POSIXct):
> x <- c("a","b");
> y <- c("c","d");
> z <- as.POSIXct(c("2012-01-01 01:01:01","2012-02-02 02:02:02"));
> d <- data.frame(x, y, z, stringsAsFactors=FALSE);
> str(d);
'data.frame': 2 obs. of 3 variables:
$ x: chr "a" "b"
$ y: chr "c" "d"
$ z: POSIXct, format: "2012-01-01 01:01:01" "2012-02-02 02:02:02"
> library(reshape2);
> e <- dcast(d, formula = x ~ y, value.var = "z");
> str(e);
'data.frame': 2 obs. of 3 variables:
$ x: chr "a" "b"
$ c: num 1.33e+09 NA
$ d: num NA 1.33e+09
Doing debug(dcast) and debug(as.data.frame.matrix), then stepping through the calculations launched by your dcast() call, will reveal that these lines in as.data.frame.matrix() are at fault:
if (mode(x) == "character" && stringsAsFactors) {
for (i in ic) value[[i]] <- as.factor(x[, i])
}
else {
for (i in ic) value[[i]] <- as.vector(x[, i])
}
Up to that point the object is POSIXct, whose mode is "numeric", so evaluation follows the second branch, and as.vector() converts the results to plain numeric vectors.
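A minimal base-R check of the two facts this explanation rests on: a POSIXct vector reports mode "numeric", and as.vector() strips its class. The tz = "UTC" argument is an assumption added only so the example does not depend on the local time zone.

```r
# POSIXct values are doubles (seconds since the epoch) plus a class attribute
z <- as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"), tz = "UTC")

mode(z)    # "numeric", so the non-character branch is taken
class(z)   # "POSIXct" "POSIXt"

# as.vector() drops all attributes of an atomic vector, including the class
v <- as.vector(z)
class(v)   # "numeric"
```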
If you use dcast(), it looks like you will need to post-process the results, which isn't hard once you know the origin: POSIXct values are stored as seconds since 1970-01-01 00:00:00 UTC. Something like this should do the trick:
e[-1] <- lapply(e[-1], as.POSIXct, origin = "1970-01-01")
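A slightly more general sketch of that post-processing step, assuming every column except the id column x holds epoch seconds and that you want the result displayed in UTC. The helper name restore_posixct() is hypothetical (it is not part of reshape2), and the small data frame stands in for the dcast() output shown in the question so the example runs without reshape2.

```r
# Hypothetical helper, not part of reshape2: turn numeric epoch-second
# columns back into POSIXct. POSIXct counts seconds since 1970-01-01 UTC.
restore_posixct <- function(df, cols, tz = "UTC") {
  df[cols] <- lapply(df[cols], as.POSIXct, origin = "1970-01-01", tz = tz)
  df
}

# Stand-in for the dcast() result: the value columns are plain numbers
e <- data.frame(
  x = c("a", "b"),
  c = c(1325379661, NA),
  d = c(NA, 1328148122),
  stringsAsFactors = FALSE
)

e <- restore_posixct(e, setdiff(names(e), "x"))
str(e)
```

The origin is interpreted in UTC, so tz only changes how the instants are displayed; if the source column had a local zone, passing that zone as tz keeps the display consistent with the input.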
FWIW, base R's reshape() leaves POSIXct values as they are, but will require you to edit the names of the resulting columns:
reshape(d, idvar="x", timevar="y", direction="wide")
# x z.c z.d
# 1 a 2012-01-01 01:01:01 <NA>
# 2 b <NA> 2012-02-02 02:02:02
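One way to do that renaming, as a sketch: reshape() prefixes each widened column with the value column's name and a "." (its default separator), so stripping the leading "z." gives back the c and d names that dcast() produces. The tz = "UTC" argument is an assumption for reproducibility only.

```r
d <- data.frame(
  x = c("a", "b"),
  y = c("c", "d"),
  z = as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

w <- reshape(d, idvar = "x", timevar = "y", direction = "wide")

# Drop the "z." prefix that reshape() adds to the widened columns
names(w) <- sub("^z\\.", "", names(w))
str(w)
```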
Pre- and/or post-processing to preserve date classes when casting/widening a dataset can be very cumbersome.
In that respect, unless the reshaping you need is complicated, pivot_wider() from the tidyr package respects date objects: no conversion happens along the way. In addition, it gives much more control over the casting/widening process, thus avoiding post-processing steps (https://tidyr.tidyverse.org/reference/pivot_wider.html).
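A minimal sketch of the same widening with pivot_wider(), assuming tidyr 1.0.0 or later (the release that introduced pivot_wider()). The call is guarded with requireNamespace() so the snippet still runs where tidyr isn't installed; names_from and values_from name the key and value columns, and the remaining column x serves as the id.

```r
d <- data.frame(
  x = c("a", "b"),
  y = c("c", "d"),
  z = as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

if (requireNamespace("tidyr", quietly = TRUE)) {
  # Columns c and d keep the POSIXct class; missing cells become NA
  w <- tidyr::pivot_wider(d, names_from = y, values_from = z)
  str(w)
}
```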