Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

best practices for avoiding roundoff gotchas in date manipulation

I am doing some date/time manipulation and experiencing explicable, but unpleasant, round-tripping problems when converting date -> time -> date . I have temporarily overcome this problem by rounding at appropriate points, but I wonder if there are best practices for date handling that would be cleaner. I'm using a mix of base-R and lubridate functions.

tl;dr is there a good, simple way to convert from decimal date (YYYY.fff) to the Date class (and back) without going through POSIXt and incurring round-off (and potentially time-zone) complications??

Start with a few days from 1918, as separate year/month/day columns (not a critical part of my problem, but it's where my pipeline happens to start):

library(lubridate)
dd <- data.frame(year=1918,month=9,day=1:12)

Convert year/month/day -> date -> time:

dd <- transform(dd,
                time=decimal_date(make_date(year, month, day)))

The successive differences in the resulting time vector are not exactly 1 because of roundoff: this is understandable but leads to problems down the road.

table(diff(dd$time)*365)
## 0.999999999985448  1.00000000006844 
##                 9                 2 

Now suppose I convert back to a date: the dates are slightly before or after midnight (off by <1 second in either direction):

d2 <- lubridate::date_decimal(dd$time)
#  [1] "1918-09-01 00:00:00 UTC" "1918-09-02 00:00:00 UTC"
#  [3] "1918-09-03 00:00:00 UTC" "1918-09-03 23:59:59 UTC"
#  [5] "1918-09-04 23:59:59 UTC" "1918-09-05 23:59:59 UTC"
#  [7] "1918-09-07 00:00:00 UTC" "1918-09-08 00:00:00 UTC"
#  [9] "1918-09-09 00:00:00 UTC" "1918-09-09 23:59:59 UTC"
# [11] "1918-09-10 23:59:59 UTC" "1918-09-12 00:00:00 UTC"

If I now want dates (rather than POSIXct objects) I can use as.Date(), but to my dismay as.Date() truncates rather than rounding ...

tt <- as.Date(d2)
## [1] "1918-09-01" "1918-09-02" "1918-09-03" "1918-09-03" "1918-09-04"
## [6] "1918-09-05" "1918-09-07" "1918-09-08" "1918-09-09" "1918-09-09"
##[11] "1918-09-10" "1918-09-12"

So the differences are now 0/1/2 days:

table(diff(tt))
# 0 1 2 
# 2 7 2 

I can fix this by rounding first:

table(diff(as.Date(round(d2))))
## 1 
## 11

but I wonder if there is a better way (e.g. keeping POSIXct out of my pipeline and staying with dates ...

As suggested by this R-help desk article from 2004 by Grothendieck and Petzoldt:

When considering which class to use, always choose the least complex class that will support the application. That is, use Date if possible, otherwise use chron and otherwise use the POSIX classes. Such a strategy will greatly reduce the potential for error and increase the reliability of your application.

The extensive table in this article shows how to translate among Date, chron, and POSIXct, but doesn't include decimal time as one of the candidates ...

like image 583
Ben Bolker Avatar asked Nov 21 '17 21:11

Ben Bolker


2 Answers

It seems like it would be best to avoid converting back from decimal time if at all possible.

When converting from date to decimal date, one also needs to account for time. Since Date does not have a specific time associated with it, decimal_date inherently assumes it to be 00:00:00.

However, if we are concerned only with the date (and not the time), we could assume the time to be anything. Arguably, middle of the day (12:00:00) is as good as the beginning of the day (00:00:00). This would make the conversion back to Date more reliable as we are not at the midnight mark and a few seconds off does not affect the output. One of the ways to do this would be to add 12*60*60/(365*24*60*60) to dd$time

dd$time2 = dd$time + 12*60*60/(365*24*60*60)
data.frame(dd[1:3],
           "00:00:00" = as.Date(date_decimal(dd$time)),
           "12:00:00" = as.Date(date_decimal(dd$time2)),
           check.names = FALSE)
#   year month day        00:00:00        12:00:00
#1  1918     9   1      1918-09-01      1918-09-01
#2  1918     9   2      1918-09-02      1918-09-02
#3  1918     9   3      1918-09-03      1918-09-03
#4  1918     9   4      1918-09-03      1918-09-04
#5  1918     9   5      1918-09-04      1918-09-05
#6  1918     9   6      1918-09-05      1918-09-06
#7  1918     9   7      1918-09-07      1918-09-07
#8  1918     9   8      1918-09-08      1918-09-08
#9  1918     9   9      1918-09-09      1918-09-09
#10 1918     9  10      1918-09-09      1918-09-10
#11 1918     9  11      1918-09-10      1918-09-11
#12 1918     9  12      1918-09-12      1918-09-12

It should be noted, however, that the value of decimal time obtained in this way will be different.

like image 185
d.b Avatar answered Oct 20 '22 16:10

d.b


lubridate::decimal_date() is returning a numeric. If I understand you correctly, the question is how to convert that numeric into Date and have it round appropriately without bouncing through POSIXct.

as.Date(1L, origin = '1970-01-01') shows us that we can provide as.Date with days since some specified origin and convert immediately to the Date type. Knowing this, we can skip the year part entirely and set it as origin. Then we can convert our decimal dates to days:

as.Date((dd$time-trunc(dd$time)) * 365, origin = "1918-01-01").

So, a function like this might do the trick (at least for years without leap days):

date_decimal2 <- function(decimal_date) {
  years <- trunc(decimal_date)
  origins <- paste0(years, "-01-01")
  # c.f. https://stackoverflow.com/questions/14449166/dates-with-lapply-and-sapply
  do.call(c, mapply(as.Date.numeric, x = (decimal_date-years) * 365, origin = origins, SIMPLIFY = FALSE))
}

Side note: I admit I went down a bit of a rabbit hole with trying to move origin around deal with the pre-1970 date. I found that the further origin shifted from the target date, the more weird the results got (and not in ways that seemed to be easily explained by leap days). Since origin is flexible, I decided to target it right on top of the target values. For leap days, seconds, and whatever other weirdness time has in store for us, on your own head be it. =)

like image 36
russellpierce Avatar answered Oct 20 '22 16:10

russellpierce