I have encountered some very peculiar behaviour in R. I think it might even be a bug, but I'm asking here to check if someone is familiar with it or knows a solution.
What I'm trying to do is the following: I have a data frame with dates assigned to groups. I'm running a for-loop over these groups, in which I calculate the maximum of the dates in each group. I want to skip the rest of the loop (`next`) if this maximum date is `NA`. However, this does not work as intended.
Consider the following code:
```r
library(dplyr)
library(lubridate)

a <- data.frame(group = c(1,1,1,1,1, 2,2,2,2, 3),
                ds = as_datetime(dmy('01-01-2018', NA, '03-01-2018', NA, '05-01-2018',
                                     '02-01-2018', '04-01-2018', '06-01-2018', '08-01-2018',
                                     NA)))

for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max(na.rm = TRUE)
  if (is.na(max_ds)) { next }
  print(max_ds)
}
```
The expected output is:
```
# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"
```
However, the obtained output is:
```
# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"
# [1] NA
```
The crux of this mystery seems to lie in the `na.rm` argument. If it is removed, the following happens:
```r
for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max()
  if (is.na(max_ds)) { next }
  print(max_ds)
}
# [1] "2018-01-08 UTC"
```
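Here the `NA` check works exactly as expected: groups 1 and 3 contain `NA` values, so `max()` returns `NA` for them and they are skipped.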
Any ideas?
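The issue is that group 3 contains only `NA` values, and you pass them together with `na.rm = TRUE`. Once the `NA`s are removed there is nothing left to take the maximum of, and this happens: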
```r
max(NA, na.rm = TRUE)
#[1] -Inf
#Warning message:
#In max(NA, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
```
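Why `-Inf`? With `na.rm = TRUE` an all-`NA` vector becomes an empty one, and `max()` of an empty numeric vector is defined as `-Inf` (the neutral starting value for a maximum), with the warning shown above. A small base R sketch, where the vector `x` is a hypothetical stand-in for the dates of group 3:

```r
# Hypothetical stand-in for group 3: every value is missing
x <- c(NA_real_, NA_real_)

# na.rm = TRUE drops all of them, leaving an empty vector
length(x[!is.na(x)])                          # 0

# max() of an empty numeric vector is -Inf (normally with a warning)
suppressWarnings(max(numeric(0)))             # -Inf
m <- suppressWarnings(max(x, na.rm = TRUE))   # -Inf as well

# -Inf is a number, not a missing value
is.na(m)                                      # FALSE
is.finite(m)                                  # FALSE
```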
The result is obviously not `NA`. If you pass a datetime variable, the result is still not `NA`, but it is printed as `NA`:
```r
max(as.POSIXct(NA), na.rm = TRUE)
#[1] NA
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf

as.POSIXct(-Inf, origin = "1900-01-01")
#[1] NA

unclass(as.POSIXct(-Inf, origin = "1900-01-01"))
#[1] -Inf
#attr(,"tzone")
#[1] ""
```
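The same thing can be checked on a datetime vector directly. This sketch assumes a hypothetical all-`NA` `POSIXct` vector `ds` in UTC, standing in for group 3. It shows that `max()` keeps the `POSIXct` class and stores `-Inf` underneath, so `is.na()` is `FALSE` and the `next` in the question's loop never fires. How the value is displayed can depend on the R version, so the sketch inspects the stored number rather than the printed form:

```r
# Hypothetical stand-in for group 3: a datetime vector with only missing values
ds <- as.POSIXct(c(NA, NA), tz = "UTC")
m <- suppressWarnings(max(ds, na.rm = TRUE))

inherits(m, "POSIXct")   # TRUE: max() keeps the datetime class
as.numeric(m)            # -Inf: the number stored underneath
is.na(m)                 # FALSE, so `if (is.na(max_ds)) next` is not triggered
is.infinite(m)           # TRUE: this is what actually identifies the empty group
```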
You might want to test with `is.finite` instead:
```r
!is.finite(max(as.POSIXct(NA), na.rm = TRUE))
#[1] TRUE
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf
```
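Applied to the loop from the question, one option (an alternative to testing `is.finite` afterwards, not something the answer above proposes) is to skip a group when all of its dates are `NA` *before* calling `max()`, which also avoids the warning. This sketch swaps in base R subsetting and `as.POSIXct()` for the dplyr pipe and `lubridate::dmy()` so it runs without extra packages; the vector `kept` is a hypothetical helper that only records which groups were printed:

```r
# Same dates as in the question, built with base R instead of lubridate::dmy()
a <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3),
  ds = as.POSIXct(c("2018-01-01", NA, "2018-01-03", NA, "2018-01-05",
                    "2018-01-02", "2018-01-04", "2018-01-06", "2018-01-08",
                    NA), tz = "UTC")
)

kept <- integer(0)  # hypothetical bookkeeping: groups whose maximum was printed
for (i in 1:3) {
  ds_i <- a$ds[a$group == i]
  # Skip groups without any usable date before calling max(),
  # so max() never sees an empty vector and never returns -Inf
  if (all(is.na(ds_i))) next
  max_ds <- max(ds_i, na.rm = TRUE)
  print(max_ds)
  kept <- c(kept, i)
}
# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"
```

Testing `if (!is.finite(max_ds)) next` after the `max()` call, as suggested above, gives the same two lines of output but still emits the warning for group 3.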