
POSIXct values become numeric in reshape2 dcast

Tags: r, reshape2

I'm trying to use dcast from the latest reshape2 package (1.2.1) to denormalize a data frame (or data.table) where the value.var is a POSIXct type, but in the resulting data frame, the date values have lost their POSIXct class and become numeric.

Do I really have to as.POSIXct() every generated column if I want the values back as POSIXct's, or am I missing something?

x <- c("a","b");
y <- c("c","d");
z <- as.POSIXct(c("2012-01-01 01:01:01","2012-02-02 02:02:02"));
d <- data.frame(x, y, z, stringsAsFactors=FALSE);
str(d);
library(reshape2);
e <- dcast(d, formula = x ~ y, value.var = "z");
str(e);

Result of running above statements (note new columns c and d are numeric epoch seconds instead of POSIXct's):

> x <- c("a","b");
> y <- c("c","d");
> z <- as.POSIXct(c("2012-01-01 01:01:01","2012-02-02 02:02:02"));
> d <- data.frame(x, y, z, stringsAsFactors=FALSE);
> str(d);
'data.frame':   2 obs. of  3 variables:
 $ x: chr  "a" "b"
 $ y: chr  "c" "d"
 $ z: POSIXct, format: "2012-01-01 01:01:01" "2012-02-02 02:02:02"
> library(reshape2);
> e <- dcast(d, formula = x ~ y, value.var = "z");
> str(e);
'data.frame':   2 obs. of  3 variables:
 $ x: chr  "a" "b"
 $ c: num  1.33e+09 NA
 $ d: num  NA 1.33e+09

gkaupas asked Sep 05 '12 21:09


2 Answers

Doing debug(dcast) and debug(as.data.frame.matrix), then stepping through the calculations launched by your dcast() call will reveal that these lines in as.data.frame.matrix() are at fault:

if (mode(x) == "character" && stringsAsFactors) {
    for (i in ic) value[[i]] <- as.factor(x[, i])
}
else {
    for (i in ic) value[[i]] <- as.vector(x[, i])
}

The up-to-then POSIXct object has mode "numeric", so evaluation follows the second branch, where as.vector() drops the class attribute and leaves plain numeric values.
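To see that effect in isolation, here is a minimal base R sketch. It does not reproduce the internals of dcast(); it only shows what mode() reports for a POSIXct vector and what as.vector() does to it. The variable name `stripped` and the explicit `tz = "UTC"` are my own choices for the illustration.

```r
# Same timestamps as in the question, with an explicit time zone
z <- as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"), tz = "UTC")

mode(z)    # "numeric": POSIXct is stored as seconds since the epoch
class(z)   # "POSIXct" "POSIXt"

# as.vector() drops all attributes, including the class and the time zone
stripped <- as.vector(z)
class(stripped)   # "numeric"
```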

If you use dcast(), it looks like you will need to post-process results, which shouldn't be too hard: the numeric values are seconds since the Unix epoch, so the correct origin is 1970-01-01 (UTC). Something like this should do the trick:

e[-1] <- lapply(e[-1], as.POSIXct, origin="1970-01-01")
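A slightly fuller sketch of that post-processing step, using the Unix epoch (1970-01-01 UTC) as the origin, since that is what POSIXct counts seconds from. To keep it runnable without reshape2, the data frame `e` below is a hand-built stand-in for the dcast() result shown in the question, and the explicit `tz = "UTC"` is an assumption; substitute whatever time zone your data uses.

```r
z <- as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"), tz = "UTC")

# Stand-in for the dcast() output: value columns are bare epoch seconds
e <- data.frame(x = c("a", "b"),
                c = c(as.numeric(z[1]), NA),
                d = c(NA, as.numeric(z[2])),
                stringsAsFactors = FALSE)

# Convert every column except the id column back to POSIXct
e[-1] <- lapply(e[-1], as.POSIXct, origin = "1970-01-01", tz = "UTC")
str(e)
```

Passing `tz` explicitly matters because the time zone attribute is lost along with the class, so without it the values print in the session's local time zone.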

FWIW, base R's reshape() leaves POSIXct values as they are but will require you to edit the names of the resulting columns...

reshape(d, idvar="x", timevar="y",  direction="wide")
#   x                 z.c                 z.d
# 1 a 2012-01-01 01:01:01                <NA>
# 2 b                <NA> 2012-02-02 02:02:02
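The column renaming can be done in one line. This is a small sketch that assumes the value column is called z, as in the question, so the generated names all start with "z."; the result name `w` is arbitrary.

```r
d <- data.frame(x = c("a", "b"),
                y = c("c", "d"),
                z = as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"),
                               tz = "UTC"),
                stringsAsFactors = FALSE)

w <- reshape(d, idvar = "x", timevar = "y", direction = "wide")

# Strip the "z." prefix that reshape() adds to the widened columns
names(w) <- sub("^z\\.", "", names(w))
str(w)
```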

Josh O'Brien answered Nov 19 '22 03:11


Pre- or post-processing to preserve date classes when casting or widening a dataset can be very cumbersome.

In that respect, unless the reshaping you need is complicated, pivot_wider() from the tidyr package preserves date-time classes such as POSIXct, with no conversion along the way. It also gives a lot more control over the widening process, which avoids post-processing steps (https://tidyr.tidyverse.org/reference/pivot_wider.html).
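For the data in the question, a minimal sketch with pivot_wider() could look like the following. It assumes tidyr (version 1.0.0 or later) is installed; the explicit `tz = "UTC"` and the result name `w` are my additions.

```r
library(tidyr)

d <- data.frame(x = c("a", "b"),
                y = c("c", "d"),
                z = as.POSIXct(c("2012-01-01 01:01:01", "2012-02-02 02:02:02"),
                               tz = "UTC"),
                stringsAsFactors = FALSE)

# One column per distinct value of y, filled with the matching z
w <- pivot_wider(d, names_from = y, values_from = z)
str(w)
```

The result is a tibble with columns x, c and d, and the value columns keep their POSIXct class.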

Madpentiste answered Nov 19 '22 02:11