Why does dplyr's mutate() change the time format?

Tags: r, dplyr, readr

I use readr to read in data that includes a date-time column. I can read it in correctly using the col_types option of readr.

library(dplyr)
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

mydf <- read_csv(sample, col_types="Ti")
mydf
                 time    id
1 2015-03-05 02:28:11  1674
2 2015-03-03 13:10:59 36749
3 2015-03-05 07:55:48    NA
4 2015-03-05 06:13:19    NA

This is nice. However, if I want to manipulate this column with dplyr, the time column loses its format.

mydf %>% mutate(time = ifelse(is.na(id), NA, time))
        time    id
1 1425522491  1674
2 1425388259 36749
3         NA    NA
4         NA    NA

Why is this happening?

I know I can work around this problem by converting the column to character first, but it would be more convenient not to have to convert back and forth.

mydf %>% mutate(time = as.character(time)) %>% 
    mutate(time = ifelse(is.na(id), NA, time))
janosdivenyi asked Sep 01 '15 16:09


2 Answers

dplyr has its own version of this function, if_else(), by @hadley. It handles date-time variables correctly. See the related GitHub issue as well.
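As a rough sketch of how that could look with the question's data: the data frame below is rebuilt by hand (so readr is not needed), and as.POSIXct(NA) is my assumption for a typed missing value so that both branches share the same class. Older dplyr versions are strict about this; newer ones may accept a plain NA.

```r
library(dplyr)

# Rebuild the question's data by hand (values copied from the question).
mydf <- data.frame(
  time = as.POSIXct(
    c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
      "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
    tz = "UTC"
  ),
  id = c(1674L, 36749L, NA, NA)
)

# Keep 'time' where 'id' is present; otherwise use a POSIXct missing value.
# as.POSIXct(NA) is an assumption made here so both branches have one class.
result <- mydf %>%
  mutate(time = if_else(!is.na(id), time, as.POSIXct(NA)))

# The column should still be a date-time, not a bare number.
class(result$time)
```

The condition is written as !is.na(id) so that the existing time column is the first branch; this is a design choice in the sketch, not something the answer prescribes.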

Alexander answered Oct 20 '22 06:10


It's actually ifelse() that is causing this issue, not dplyr::mutate(). An example of the problem of attribute stripping is shown in help(ifelse) -

## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
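The same effect can be reproduced on a POSIXct vector like the one in the question. This is only a base R sketch: the two-element vectors and the names stripped and restored are made up for illustration, and the UTC time zone is an assumption.

```r
# Two of the question's timestamps, with UTC assumed as the time zone.
time <- as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59"), tz = "UTC")
id <- c(1674L, NA)

# ifelse() builds its result from the logical test vector, so the
# POSIXct class and tzone attributes of 'time' are not carried over.
stripped <- ifelse(is.na(id), NA, time)
class(stripped)  # "numeric": seconds since 1970-01-01

# One way to restore the date-time class afterwards.
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
restored
```

Note that a plain class(y) <- class(x), as in the help page, restores the class but not the tzone attribute, which is why the sketch converts with an explicit tz instead.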

So there you have it. It's a bit of extra work if you want to use ifelse(). Here are two possible methods that will get you to your desired result without ifelse(). The first is really simple and uses the replacement function is.na<-.

## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)

## resulting in
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

If you don't wish to choose that route, and want to continue with the dplyr method, you can use replace() instead of ifelse().

mydf %>% mutate(time = replace(time, is.na(id), NA))
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

Data:

mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948, 
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L, 
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA, 
-4L))
Rich Scriven answered Oct 20 '22 08:10