Why are these two operations different?
library(lubridate)
library(magrittr)
> seconds_to_period(1:1000) %>% cumsum %>% sum
[1] 14492440
> 1:1000 %>% cumsum %>% sum
[1] 167167000
I have seen, however, that the issue lies in the fact that cumsum only adds the seconds of the period and ignores the rest:
seconds_to_period(60) + seconds_to_period(60)
[1] "2M 0S"
but
> cumsum(c(seconds_to_period(60), seconds_to_period(60)))
[1] 0 0
Why is this the default behavior? I find it rather unintuitive. Also, is there a way to get the same result as cumsum(1:1000) using lubridate's 'Period' class that doesn't involve doing something like:
c(seconds_to_period(60), seconds_to_period(60)) %>% as.numeric %>% cumsum
Since cumsum is a primitive, you can see what R is doing under the hood in https://github.com/Microsoft/microsoft-r-open/blob/master/source/src/main/cum.c (read from line 215 in particular):
PROTECT(t = coerceVector(CAR(args), REALSXP));
n = XLENGTH(t);
PROTECT(s = allocVector(REALSXP, n));
setAttrib(s, R_NamesSymbol, getAttrib(t, R_NamesSymbol));
UNPROTECT(2);
This is where the coercion from Period to numeric happens and, because of how Period is structured, only .Data is kept.
Compare:
seconds_to_period(60)@.Data
seconds_to_period(59)@.Data
Therefore, at C level, R is not calling as.numeric but doing a faster, more efficient coercion of the data. You may say it is also a less subtle one because, unlike as.numeric, it does not take into account the slots other than .Data.
Look at this:
setClass("Foo", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))
bar <- new("Foo",.Data=5, number1 = 12, number2 = 31)
cumsum(bar)
The result is 5, because only .Data is coerced to numeric.
Moreover:
setClass("Foo2", representation(.Data="numeric", number1 = "numeric", number2 = "numeric"))
bar2 <- new("Foo2", number1 = 12, number2 = 31)
cumsum(bar2)
This gives you back numeric(0), because no .Data was supplied.
And:
setClass("Foo3", representation( number1 = "numeric", number2 = "numeric"))
bar3 <- new("Foo3", number1 = 12, number2 = 31)
cumsum(bar3)
This does not work at all: without a .Data slot, R does not know internally how to coerce the object to numeric when running cumsum.
So it comes down to how R works internally with complex S4 objects.
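To make the difference between as.numeric and the C-level coercion concrete, here is a minimal sketch in base R. ToyPeriod and its minute slot are made up for illustration (this is not lubridate's Period), and the as.numeric method is only my assumption of what a Period-like class would want, namely the total number of seconds.

```r
library(methods)

# Hypothetical toy class: seconds live in .Data, minutes in a separate slot.
setClass("ToyPeriod", contains = "numeric", representation(minute = "numeric"))

# An S4 as.numeric method that accounts for every slot (total seconds).
setMethod("as.numeric", "ToyPeriod", function(x, ...) x@.Data + 60 * x@minute)

# Two toy periods: 1 minute 0 seconds, and 2 minutes 30 seconds.
p <- new("ToyPeriod", c(0, 30), minute = c(1, 2))

total <- as.numeric(p)   # uses the S4 method: 60 150
running <- cumsum(p)     # bypasses the method, sees only .Data: 0 30

print(total)
print(running)
```

So as.numeric goes through the S4 method, while cumsum only ever sees the bare .Data vector.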
You can always ask the lubridate people to add a new seconds slot and store in .Data the cumulative seconds of the whole S4 object. I guess cumsum would work that way. But right now they are using .Data to store only the seconds component. See edit(seconds_to_period):
function (x)
{
    span <- as.double(x)
    remainder <- abs(span)
    newper <- period(second = rep(0, length(x)))
    slot(newper, "day") <- remainder %/% (3600 * 24)
    remainder <- remainder %% (3600 * 24)
    slot(newper, "hour") <- remainder %/% (3600)
    remainder <- remainder %% (3600)
    slot(newper, "minute") <- remainder %/% (60)
    slot(newper, ".Data") <- remainder %% (60)
    newper * sign(span)
}
Finally, just for fun, this is my mock version of how to make cumsum work here:
setClass("Period2", representation(.Data = "numeric", period = "Period"))
seconds_to_period_2 <- function(x) {
  lapply(x, function(y) new("Period2", .Data = y, period = seconds_to_period(y)))
}
a <- seconds_to_period_2(1:60)
cumsum(a)
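An alternative to the wrapper above, again only as a sketch: instead of relying on the C-level coercion, you can give the class its own cumsum method with setMethod. ToyPeriod, toy_seconds and toy_from_seconds are hypothetical names in base R. For a real lubridate Period the analogous helpers would presumably be period_to_seconds and seconds_to_period, which I have not wired in here.

```r
library(methods)

# Hypothetical toy class: seconds in .Data, minutes in a separate slot.
setClass("ToyPeriod", contains = "numeric", representation(minute = "numeric"))

# Total length in seconds, using every slot (helper name is an assumption).
toy_seconds <- function(x) x@.Data + 60 * x@minute

# Rebuild a ToyPeriod from a plain vector of total seconds.
toy_from_seconds <- function(s) new("ToyPeriod", s %% 60, minute = s %/% 60)

# S4 method: accumulate on total seconds, then convert back to the class.
setMethod("cumsum", "ToyPeriod", function(x) {
  toy_from_seconds(cumsum(toy_seconds(x)))
})

p <- toy_from_seconds(c(60, 60, 45))  # 1M 0S, 1M 0S, 0M 45S
res <- cumsum(p)

print(res@minute)  # 1 2 2
print(res@.Data)   # 0 0 45
```

The design choice here is to do the arithmetic on one plain numeric vector and only convert back at the end, so no slot is silently dropped.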
Best!