I have found a very annoying problem that I want to share with the community. This is a question that I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularity how plyr, dplyr, and readr handle these. When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time. I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I am then trying to bind to a file in UTC time, and the times don't match...it gets messy and frustrating. In essence the beautiful time stamp database I am trying to create is turning into a pile of garbage. To remedy this problem, I converted the POSIX time column to character column using: <pre class="prettyprint"><code>df.time <- as.character(df.time) </code></pre> This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API. This leads me to the following series of questions: <ol> <li>Is there a program that can join POSIX variables across time zones? For instance, if its noon MDT, its 6pm UTC. Could I join two dataframes based on these time stamps without having to convert them to the same time zone first? </li> <li>Is it possible to prevent write_csv from changing POSIX variables to UTC?</li> <li>Is there a csv write function that doesn't change POSIX variables?</li> </ol> EDIT: I have included some example data of what I am talking about: <pre class="prettyprint"><code>> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300")) > colnames(df1) <- c("time", "low", "high", "open", "close", "volume") > df1$time <- anytime(df1$time) > df1Sort <- df1[order(df1$time),] > head(df1Sort, 5) time low high open close volume 299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168 298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668 297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707 296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157 295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921 > write_csv(df1Sort, "df1Sort.csv", col_names = TRUE) > df2 <- read_csv("df1Sort.csv", col_names = TRUE) Parsed with column specification: cols( time = col_datetime(format = ""), low = col_double(), high = col_double(), open = col_double(), close = col_double(), volume = col_double() ) > head(df2, 5) # A tibble: 5 x 6 time low high open close volume <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> 1 2018-07-23 22:13:00 7747 7747 7747 7747 9.20 2 2018-07-23 22:14:00 7743 7747 7747 7747 7.02 3 2018-07-23 22:15:00 7745 7746 7746 7746 0.908 4 2018-07-23 22:16:00 7746 7746 7746 7746 4.67 5 2018-07-23 22:17:00 7746 7746 7746 7746 2.49 </code></pre>

<ol> <li> "Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?" Maybe? But if so, they're almost certainly just converting to UTC under the hood and just hiding it from you. I'm unaware of any thing like this in R. (<code>data.table</code> being the only package I'm aware of that can join on anything other than exact equality, and it doesn't have this feature.) If I were you, I'd just convert everything to one timezone - probably UTC. For more reading for best practices this SQL-focused answer seems very good. </li> <li> "Is it possible to prevent write_csv from changing POSIX variables to UTC?" Not built-in. The <code>?write_csv</code> documentation is pretty clear: It doesn't list any options for this and does say "POSIXct's are formatted as ISO8601." </li> <li> "Is there a csv write function that doesn't change POSIX variables?" Sure, the built-in <code>write.csv</code> doesn't change to UTC (I think it uses system settings), and <code>data.table::fwrite</code> offers quite a few options. If you want to control how your dates are saved, I think your best bet is to convert them to <code>character</code> in whatever format you want, and then any of the writing functions should handle them just fine. You should check out the <code>?data.table::fwrite</code> documentation, it's got good info. They warn that the <code>"write.csv"</code> option can be quite slow. </li> </ol> <hr> You should include reproducible examples with your questions. Here's one for this: <pre class="prettyprint"><code>t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa") t # [1] "2018-01-01 01:30:00 EAT" d = data.frame(t) library(readr) write_csv(d, "tz_test.csv") system("head tz_test.csv") # 2017-12-31T22:30:00Z library(data.table) fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv") system("head tz_test_dt.csv") # t # 2018-01-01 01:30:00 write.csv(d, "tz_test_base.csv") system("head tz_test_base.csv") # "","t" # "1",2018-01-01 01:30:00 </code></pre>

r, write_csv is changing all times/dates to UTC

Tags:

r

posixct

readr

I have found a very annoying problem that I want to share with the community. This is a question that I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularity how plyr, dplyr, and readr handle these.

When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time.

I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I am then trying to bind to a file in UTC time, and the times don't match...it gets messy and frustrating. In essence the beautiful time stamp database I am trying to create is turning into a pile of garbage.

To remedy this problem, I converted the POSIX time column to character column using:

df.time <- as.character(df.time)

This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.

This leads me to the following series of questions:

Is there a program that can join POSIX variables across time zones? For instance, if its noon MDT, its 6pm UTC. Could I join two dataframes based on these time stamps without having to convert them to the same time zone first?
Is it possible to prevent write_csv from changing POSIX variables to UTC?
Is there a csv write function that doesn't change POSIX variables?

EDIT: I have included some example data of what I am talking about:

> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
                   time     low    high    open   close    volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
  time = col_datetime(format = ""),
  low = col_double(),
  high = col_double(),
  open = col_double(),
  close = col_double(),
  volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
  time                  low  high  open close volume
  <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
1 2018-07-23 22:13:00  7747  7747  7747  7747  9.20 
2 2018-07-23 22:14:00  7743  7747  7747  7747  7.02 
3 2018-07-23 22:15:00  7745  7746  7746  7746  0.908
4 2018-07-23 22:16:00  7746  7746  7746  7746  4.67 
5 2018-07-23 22:17:00  7746  7746  7746  7746  2.49

880

asked Jul 24 '18 01:07

Phil_T

1 Answers

"Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?"

Maybe? But if so, they're almost certainly just converting to UTC under the hood and just hiding it from you. I'm unaware of any thing like this in R. (data.table being the only package I'm aware of that can join on anything other than exact equality, and it doesn't have this feature.) If I were you, I'd just convert everything to one timezone - probably UTC.

For more reading for best practices this SQL-focused answer seems very good.
"Is it possible to prevent write_csv from changing POSIX variables to UTC?"

Not built-in. The ?write_csv documentation is pretty clear: It doesn't list any options for this and does say "POSIXct's are formatted as ISO8601."
"Is there a csv write function that doesn't change POSIX variables?"

Sure, the built-in write.csv doesn't change to UTC (I think it uses system settings), and data.table::fwrite offers quite a few options. If you want to control how your dates are saved, I think your best bet is to convert them to character in whatever format you want, and then any of the writing functions should handle them just fine. You should check out the ?data.table::fwrite documentation, it's got good info. They warn that the "write.csv" option can be quite slow.

You should include reproducible examples with your questions. Here's one for this:

t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa")
t
# [1] "2018-01-01 01:30:00 EAT"

d = data.frame(t)

library(readr)
write_csv(d, "tz_test.csv")
system("head tz_test.csv")
# 2017-12-31T22:30:00Z

library(data.table)
fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv")
system("head tz_test_dt.csv")
# t
# 2018-01-01 01:30:00

write.csv(d, "tz_test_base.csv")
system("head tz_test_base.csv")
# "","t"
# "1",2018-01-01 01:30:00

104

answered Sep 29 '22 16:09

Gregor Thomas

Related questions
                            
                                pie chart highcharter R
                            
                                Dplyr difference between select and group_by with respect to quoted variables?
                            
                                Automatically curving an arc when it is overlapping with another one
                            
                                Rselenium can't connect to running shiny app
                            
                                RMarkdown PDF "LaTeX3 error: Erroneous variable"
                            
                                Include blockquote in rmarkdown ioslides
                            
                                Use a color palette for matrix points in UpSetR
                            
                                Overlay image on R leaflet map
                            
                                same area for all violins independent of facets in ggplot2
                            
                                Executing a stored oracle procedure in R using ROracle
                            
                                Importing xlsx data to R when numbers have a comma as decimal separator
                            
                                Mount local volume accessible to R/RStudio in docker (tidyverse)
                            
                                How to use RMixpanel library to export data in csv
                            
                                Remove single dplyr group_by group
                            
                                Comparing simple features {sf} and Spatial objects {sp}: speed and memory
                            
                                Simulate data for logistic regression with fixed r2
                            
                                Using R plumber for web API over HTTPS
                            
                                Multiple inputs to reactive value R Shiny
                            
                                Build a RStudio addin to debug pipe chains
                            
                                How to reverse secondary continuous_y_axis in ggplot2

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With