POSIXct object is NA, but is.na() returns FALSE

I have encountered some very peculiar behaviour in R. I think it might even be a bug, but I'm asking here to check if someone is familiar with it or knows a solution.

What I'm trying to do is the following: I have a data frame with dates assigned to groups. I'm performing a for-loop over these groups, in which I calculate the maximum of the dates in this group. I want to skip the rest of the loop (next) if this maximum date is NA. However, this doesn't happen correctly.

Consider the following code:

library(dplyr)
library(lubridate)
a <- data.frame(group = c(1,1,1,1,1, 2,2,2,2, 3),
            ds = as_datetime(dmy('01-01-2018', NA, '03-01-2018', NA, '05-01-2018',
                                 '02-01-2018', '04-01-2018', '06-01-2018', '08-01-2018',
                                 NA)))

for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max(na.rm = TRUE)
  if (is.na(max_ds)) { next }
  print(max_ds)
}

The expected output is:

# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"

However, the obtained output is:

# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"
# [1] NA

The crux of this mystery seems to lie in the na.rm argument. If it is removed, the following happens:

for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max()
  if (is.na(max_ds)) { next }
  print(max_ds)
}

# [1] "2018-01-08 UTC"

Which is exactly what I would expect when the NAs are not removed: groups 1 and 3 are skipped.

Any ideas?

A. Stam asked Apr 18 '18 13:04


1 Answer

The issue is that group 3 contains only NA, so with na.rm = TRUE nothing is left for max() to work on. Then this happens:

max(NA, na.rm = TRUE)
#[1] -Inf
#Warning message:
#In max(NA, na.rm = TRUE) : no non-missing arguments to max; returning -Inf

The result is obviously not NA. If you pass a datetime (POSIXct) value, the result is still -Inf underneath, so is.na() returns FALSE, but it is printed as NA:

max(as.POSIXct(NA), na.rm = TRUE)
#[1] NA
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf
as.POSIXct(-Inf, origin = "1900-01-01")
#[1] NA
unclass(as.POSIXct(-Inf, origin = "1900-01-01"))
#[1] -Inf
#attr(,"tzone")
#[1] ""
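
To make the mismatch explicit, here is a minimal sketch in base R (the variable name x is only for illustration). It checks the predicates directly rather than relying on the printed form, since how an infinite datetime is printed may differ between R versions:

```r
# max() over an all-NA datetime with na.rm = TRUE; the warning is expected
x <- suppressWarnings(max(as.POSIXct(NA), na.rm = TRUE))

is.na(x)        # FALSE: the stored value is -Inf, not NA
is.finite(x)    # FALSE
is.infinite(x)  # TRUE
as.numeric(x)   # -Inf
```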

You might want to test with is.finite instead of is.na, since it is FALSE for both NA and -Inf:

!is.finite(max(as.POSIXct(NA), na.rm = TRUE))
#[1] TRUE
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf
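
Applied to the loop from the question, this could look roughly like the sketch below. It uses base R only so that it runs without dplyr or lubridate. The data frame is rebuilt with as.POSIXct, and the results vector and the suppressWarnings() call are additions for illustration, not part of the original code:

```r
# Same data as in the question, rebuilt with base R
a <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3),
  ds = as.POSIXct(c("2018-01-01", NA, "2018-01-03", NA, "2018-01-05",
                    "2018-01-02", "2018-01-04", "2018-01-06", "2018-01-08",
                    NA), tz = "UTC")
)

results <- character(0)  # collected only so the outcome can be inspected
for (i in unique(a$group)) {
  ds <- a$ds[a$group == i]
  # For an all-NA group this yields -Inf (with a warning, silenced here)
  max_ds <- suppressWarnings(max(ds, na.rm = TRUE))
  # is.finite() is FALSE for NA as well as for -Inf, so both cases are skipped
  if (!is.finite(max_ds)) next
  results <- c(results, format(max_ds, "%Y-%m-%d"))
  print(max_ds)
}
```

With this guard, group 3 is skipped and only the maxima of groups 1 and 2 are printed.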
Roland answered Sep 23 '22 15:09