I use readr to read in data that contains a date column in time format. I can read it in correctly using the col_types option of readr.
library(dplyr)
library(readr)
sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"
mydf <- read_csv(sample, col_types="Ti")
mydf
time id
1 2015-03-05 02:28:11 1674
2 2015-03-03 13:10:59 36749
3 2015-03-05 07:55:48 NA
4 2015-03-05 06:13:19 NA
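For reference, the compact string "Ti" is shorthand for a column specification: "T" parses the first column as a date-time and "i" parses the second as an integer. The sketch below writes the same specification explicitly with cols(). It assumes a recent readr, where literal data is wrapped in I(); older versions also accept the bare string.

```r
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

# Explicit equivalent of col_types = "Ti":
# "T" -> col_datetime(), "i" -> col_integer()
mydf <- read_csv(
  I(sample),  # I() marks the string as literal data rather than a file path
  col_types = cols(time = col_datetime(), id = col_integer())
)
str(mydf)
```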
This is nice. However, if I want to manipulate this column with dplyr, the time column loses its format.
mydf %>% mutate(time = ifelse(is.na(id), NA, time))
time id
1 1425522491 1674
2 1425388259 36749
3 NA NA
4 NA NA
Why is this happening?
I know I can work around this problem by converting the column to character first, but it would be more convenient not to convert back and forth.
mydf %>% mutate(time = as.character(time)) %>%
mutate(time = ifelse(is.na(id), NA, time))
There is another version of ifelse(), if_else(), written by @hadley and included in dplyr. It handles time variables correctly. Look at this GitHub issue as well.
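A minimal sketch of the if_else() approach applied to the question's data. The example data frame is rebuilt here with as.POSIXct() in the local time zone, as an assumption for self-containment. Because if_else() checks that its two branches have compatible types, the missing value is supplied as a POSIXct NA via as.POSIXct(NA) rather than a bare logical NA.

```r
library(dplyr)

# Example data; the time values are parsed in the local time zone
mydf <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19")),
  id = c(1674L, 36749L, NA, NA)
)

# Both branches are POSIXct, so the result keeps the date-time class
res <- mydf %>% mutate(time = if_else(is.na(id), as.POSIXct(NA), time))
res
```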
It's actually ifelse() that is causing this issue, not dplyr::mutate(). An example of the problem of attribute stripping is shown in help(ifelse):
## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
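Applied to the question's time column, the same repair looks roughly like this. It is a sketch on rebuilt example data (local time zone assumed). ifelse() takes its attributes from the logical test, so the result is a plain double vector of seconds since 1970-01-01, which is why the question shows values like 1425522491. Copying the attributes (class and tzone) back from the original column restores the date-time.

```r
# Example data, rebuilt for illustration
mydf <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19")),
  id = c(1674L, 36749L, NA, NA)
)

# ifelse() returns a plain double vector: the POSIXct class and tzone are lost
tm <- ifelse(is.na(mydf$id), NA, mydf$time)
class(tm)  # "numeric"

# Copy the attributes (class and tzone) back from the original column
attributes(tm) <- attributes(mydf$time)
mydf$time <- tm
mydf
```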
So there you have it. It's a bit of extra work if you want to use ifelse(). Here are two possible methods that will get you to your desired result without ifelse(). The first is really simple and uses is.na<-.
## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)
## resulting in
mydf
# time id
# 1 2015-03-05 02:28:11 1674
# 2 2015-03-03 13:10:59 36749
# 3 <NA> NA
# 4 <NA> NA
If you don't wish to take that route and want to stay with the dplyr approach, you can use replace() instead of ifelse().
mydf %>% mutate(time = replace(time, is.na(id), NA))
# time id
# 1 2015-03-05 02:28:11 1674
# 2 2015-03-03 13:10:59 36749
# 3 <NA> NA
# 4 <NA> NA
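replace() keeps the class because it is only a thin wrapper around subassignment: base R defines it as function(x, list, values) { x[list] <- values; x }, so the assignment dispatches to the [<- method for POSIXct. A small sketch on a bare vector (the names tm and miss are illustrative) shows that it matches an explicit subassignment:

```r
tm <- as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                   "2015-03-05 07:55:48", "2015-03-05 06:13:19"))
miss <- c(FALSE, FALSE, TRUE, TRUE)

via_replace <- replace(tm, miss, NA)

# The same operation written as an explicit subassignment
via_index <- tm
via_index[miss] <- NA

identical(via_replace, via_index)
class(via_replace)
```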
Data:
mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948,
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L,
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA,
-4L))