I stumbled across a peculiar behavior in the lubridate package: dmy(NA) throws an error instead of just returning NA. This causes problems when I want to convert a column in which some elements are NA and the others are date strings that normally convert without problems.
Here is the minimal example:
library(lubridate)
df <- data.frame(ID=letters[1:5],
Datum=c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))
df_copy <- df
#Question 1: Why does dmy(NA) throw an error instead of returning NA?
df$Datum <- dmy(df$Datum)
Error in function (..., sep = " ", collapse = NULL) : invalid separator
df <- df_copy
#Question 2: What's a workaround?
#1. Idea: Only convert those elements that are not NAs
#RHS works, but assigning that to the LHS doesn't work (most likely problem:
#column "Datum" is still of class factor, while the RHS is of class POSIXct)
df[!is.na(df$Datum), "Datum"] <- dmy(df[!is.na(df$Datum), "Datum"])
Using date format %d.%m.%Y.
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = c(NA_integer_, NA_integer_, :
invalid factor level, NAs generated
df #Only NAs, apparently problem with class of column "Datum"
ID Datum
1 a <NA>
2 b <NA>
3 c <NA>
4 d <NA>
5 e <NA>
df <- df_copy
#2. Idea: Use mapply and apply dmy only to those elements that are not NA
df[, "Datum"] <- mapply(function(x) {if (is.na(x)) {
return(NA)
} else {
return(dmy(x))
}}, df$Datum)
df #Meaningless numbers returned instead of date-objects
ID Datum
1 a 631152000
2 b NA
3 c 632016000
4 d NA
5 e 633830400
To summarize, I have two questions: 1) Why does dmy(NA) not work? Based on most other functions, I would assume it is good programming practice that every transformation of NA (such as dmy()) returns NA again (just as 2 + NA does). 2) If this behavior is intended, how do I convert a data.frame column that includes NAs via the dmy() function?
The error "Error in function (..., sep = " ", collapse = NULL) : invalid separator" is caused by the lubridate:::guess_format() function. The NA is being passed as sep in a call to paste(), specifically at fmts <- unlist(mlply(with_seps, paste)). You could try improving lubridate:::guess_format() to fix this.
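The message can also be reproduced without lubridate. Here is a minimal sketch in base R, assuming only that paste() rejects a sep that is not a character string; the variable name msg is just for illustration.

```r
# paste() requires `sep` to be a single character string.
# A logical NA is not one, so the call fails before anything is pasted.
msg <- tryCatch(
  {
    paste("01", "01", "1990", sep = NA)
    NA_character_                      # only reached if paste() did NOT fail
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # the captured error message
```

If msg is not NA, the call failed inside paste() itself, which matches the traceback described above.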
Otherwise, could you just change the NA values to character strings ("NA")?
require(lubridate)
df <- data.frame(ID=letters[1:5],
Datum=c("01.01.1990", "NA", "11.01.1990", "NA", "01.02.1990")) #NAs are quoted
df_copy <- df
df$Datum <- dmy(df$Datum)
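If replacing NA with the string "NA" is not an option, idea 1 from the question can probably be made to work by building a fresh result vector instead of assigning back into the factor column. The following is only a sketch: parse_skipping_na is a hypothetical helper, not part of lubridate, and the parser argument is assumed to be any function that takes a character vector and returns something as.Date() understands (for example dmy).

```r
# Hypothetical helper: apply `parser` only to the non-NA elements of `x`
# and return a Date vector of the same length, with NA kept as NA.
parse_skipping_na <- function(x, parser) {
  x <- as.character(x)                  # a factor column becomes plain strings
  out <- rep(as.Date(NA), length(x))    # Date vector pre-filled with NA
  ok <- !is.na(x)
  if (any(ok)) {
    out[ok] <- as.Date(parser(x[ok]))   # the parser never sees an NA
  }
  out
}

# Stand-in parser using base R, so the sketch runs without lubridate.
parse_dmy_base <- function(s) as.Date(s, format = "%d.%m.%Y")

datum <- factor(c("01.01.1990", NA, "11.01.1990", NA, "01.02.1990"))
res <- parse_skipping_na(datum, parse_dmy_base)
print(res)
```

With lubridate loaded, the call would be df$Datum <- parse_skipping_na(df$Datum, dmy). Replacing the whole column this way avoids the "invalid factor level" warning, because nothing is written into the existing factor.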
Since your dates are in a reasonably straightforward format, it might be much simpler to just use as.Date and specify the appropriate format argument:
df$Date <- as.Date(df$Datum, format="%d.%m.%Y")
df
ID Datum Date
1 a 01.01.1990 1990-01-01
2 b <NA> <NA>
3 c 11.01.1990 1990-01-11
4 d <NA> <NA>
5 e 01.02.1990 1990-02-01
To see a list of the formatting codes used by as.Date, see ?strptime.
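As for the numbers returned by the mapply attempt in the question: they look like seconds since 1970-01-01, which is what is left of a POSIXct value once its class attribute has been dropped during simplification. A small sketch, assuming the values are UTC seconds (the names secs, restored and dates are illustrative):

```r
# The values printed in the question, treated as seconds since the Unix epoch.
secs <- c(631152000, NA, 632016000, NA, 633830400)

# Re-attach the date-time class; NA entries stay NA.
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Drop the time-of-day part to get plain dates.
dates <- as.Date(restored)
print(dates)
```

So the information was not lost in that attempt, only the class that tells R how to print it.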