Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

r, write_csv is changing all times/dates to UTC

Tags:

r

posixct

readr

I have found a very annoying problem that I want to share with the community. This is a question that I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularity how plyr, dplyr, and readr handle these.

When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time.

I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I am then trying to bind to a file in UTC time, and the times don't match...it gets messy and frustrating. In essence the beautiful time stamp database I am trying to create is turning into a pile of garbage.

To remedy this problem, I converted the POSIX time column to character column using:

df.time <- as.character(df.time)

This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.

This leads me to the following series of questions:

  1. Is there a program that can join POSIX variables across time zones? For instance, if its noon MDT, its 6pm UTC. Could I join two dataframes based on these time stamps without having to convert them to the same time zone first?
  2. Is it possible to prevent write_csv from changing POSIX variables to UTC?
  3. Is there a csv write function that doesn't change POSIX variables?

EDIT: I have included some example data of what I am talking about:

> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
                   time     low    high    open   close    volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
  time = col_datetime(format = ""),
  low = col_double(),
  high = col_double(),
  open = col_double(),
  close = col_double(),
  volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
  time                  low  high  open close volume
  <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
1 2018-07-23 22:13:00  7747  7747  7747  7747  9.20 
2 2018-07-23 22:14:00  7743  7747  7747  7747  7.02 
3 2018-07-23 22:15:00  7745  7746  7746  7746  0.908
4 2018-07-23 22:16:00  7746  7746  7746  7746  4.67 
5 2018-07-23 22:17:00  7746  7746  7746  7746  2.49 
like image 880
Phil_T Avatar asked Jul 24 '18 01:07

Phil_T


People also ask

How to convert UTC to local time in R?

It is written as write.csv does by using the as.character method which heeds digits.secs and converts from R's internal UTC representation back to local time (or the "tzone" attribute) as of that historical date. Accordingly this can be slow.

How to write data to CSV file using R programming language?

So, In this article we are going to learn that how to write data to CSV File using R Programming Language. To write to csv file write.csv () function is used. Let us first create a data frame. Now let us write this data to a csv file and save it to a required location.

Why is my CSV file so slow in R?

"write.csv" - this currently affects POSIXct only. It is written as write.csv does by using the as.character method which heeds digits.secs and converts from R's internal UTC representation back to local time (or the "tzone" attribute) as of that historical date. Accordingly this can be slow.

What are the options passed to write a CSV file?

Other options passed to write.csv2 or write.csv. AlphaPart, object returned from AlphaPart function or summaryAlphaPart, object returned from summary.AlphaPart function. Character, file name with or without .csv extension, e.g., both "file" and "file.csv" are valid.


1 Answers

  1. "Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?"

    Maybe? But if so, they're almost certainly just converting to UTC under the hood and just hiding it from you. I'm unaware of any thing like this in R. (data.table being the only package I'm aware of that can join on anything other than exact equality, and it doesn't have this feature.) If I were you, I'd just convert everything to one timezone - probably UTC.

    For more reading for best practices this SQL-focused answer seems very good.

  2. "Is it possible to prevent write_csv from changing POSIX variables to UTC?"

    Not built-in. The ?write_csv documentation is pretty clear: It doesn't list any options for this and does say "POSIXct's are formatted as ISO8601."

  3. "Is there a csv write function that doesn't change POSIX variables?"

    Sure, the built-in write.csv doesn't change to UTC (I think it uses system settings), and data.table::fwrite offers quite a few options. If you want to control how your dates are saved, I think your best bet is to convert them to character in whatever format you want, and then any of the writing functions should handle them just fine. You should check out the ?data.table::fwrite documentation, it's got good info. They warn that the "write.csv" option can be quite slow.


You should include reproducible examples with your questions. Here's one for this:

t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa")
t
# [1] "2018-01-01 01:30:00 EAT"

d = data.frame(t)

library(readr)
write_csv(d, "tz_test.csv")
system("head tz_test.csv")
# 2017-12-31T22:30:00Z

library(data.table)
fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv")
system("head tz_test_dt.csv")
# t
# 2018-01-01 01:30:00

write.csv(d, "tz_test_base.csv")
system("head tz_test_base.csv")
# "","t"
# "1",2018-01-01 01:30:00
like image 104
Gregor Thomas Avatar answered Sep 29 '22 16:09

Gregor Thomas