Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a specific way to handle timestamp columns in R when pulling data using RPostgreSQL?

I'm trying to pull data from a PostgreSQL database and the results for a timestamp field are inconsistent. I'm not sure if I'm handling POSIXct results properly. Otherwise, I think I found a bug in the RPostgreSQL package. Here is the way to replicate the issue:

Suppose there is a table in a postgres database with one field (run this in PostgreSQL):

CREATE DATABASE mydb;
CREATE TABLE test_table
(   
  "DateTime" timestamp without time zone NOT NULL,
  CONSTRAINT "pk_test_table" PRIMARY KEY ("DateTime")
)
WITH (
  OIDS=FALSE
);
ALTER TABLE test_table
  OWNER TO postgres;

And let’s say there are a few hundred records. I will populate them in R. Here is the code:

library(RPostgreSQL)

# Let's feed the table with some sequence of date/time values
date_values <-  as.chron(seq(10000, 10500, 1/24))

format.chron <- function(z)  {
  sprintf("%04.0f-%02.0f-%02.0f %02.0f:%02.0f:00", 
            as.numeric(as.character(years(z))), 
            months(z), 
            as.numeric(as.character(days(z))), 
            as.numeric(as.character(hours(z))), 
            as.numeric(as.character(minutes(z))))
}

.generateInsertQuery <- function(date_values, field_name, table_name) {
  insert_val  <- paste(paste0("(", sQuote(format(date_values)), ")"), collapse=',')
  qry         <- paste("INSERT INTO", dQuote(table_name), paste0("(", dQuote(field_name), ")"), "VALUES", insert_val)
  qry
}

drv <- dbDriver('PostgreSQL')
con <- dbConnect(drv, user='postgres', dbname='mydb')
qry <- .generateInsertQuery(date_values, "DateTime", "test_table")
dbSendQuery(con, qry)

If I try to get the values, the time component gets stripped out of the resulting data

res <- dbGetQuery(con, "SELECT * FROM test_table")
res[1:20,1]

The class of the result, however, is POSIXct

class(res[,1])

If the result is fetched one record at a time, the values with hour:min equal to 00:00 loose the time component:

rs <- dbSendQuery(con, "SELECT \"DateTime\" FROM test_table")
res_list <- list()
for(i in 1:100) res_list[i]  <- fetch(rs,1)
res_list

As a workaround, I'm fetching the result 1 record at a time, fixing, and aggregating them into a data.frame. But this is very time-consuming, especially for large data sets. Any ideas of why this is happening and how to deal with this issue?

Thanks in advance!

like image 446
JAponte Avatar asked Jul 19 '13 20:07

JAponte


People also ask

How are timestamps stored in Postgres?

The TIMESTAMP (also known as TIMESTAMP WITHOUT TIME ZONE ) and TIMESTAMPTZ (also known as TIMESTAMP WITH TIME ZONE ) types stored as a 64-bit integer as a microsecond offset since 1970-01-01 in CRDB and as a 64-bit integer microsecond offset since 2000-01-01 in PostgreSQL (by default).

Does Postgres support timestamp?

PostgreSQL supports the full set of SQL date and time types, shown in Table 8.9.

How do I cast a timestamp in PostgreSQL?

The TO_DATE function in PostgreSQL is used to converting strings into dates. Its syntax is TO_DATE(text, text) and the return type is date. The TO_TIMESTAMP function converts string data into timestamps with timezone. Its syntax is to_timestamp(text, text) .

How do I change the format of a timestamp in PostgreSQL?

You can specify double colons (::) to cast a DATETIME value to a DATE value. You can combine TO_CHAR() and the Now() function to convert the current timestamp into the specified format.


2 Answers

RPostgreSQL's dbWriteTable with any posixct field will create database field of type timestamp with timezone always with tz +00 no matter what posixct timezone it will be. I believe more precise would be to create timestamp without timezone instead.

Best solution for both dbReadTable and dbWriteTable is to use Sys.setenv(TZ = "UTC"). In my opinion it is too deep dependency because many other processes in R session may require proper timezone setting.

Much more specific and not to deep dependent is to define own dbReadTable and dbWriteTable which wraps the DBI version with appropriate preprocess/postprocess of posixct types. But still it is not an option if you are developing DBI-compliant code/package (not only postgres related).

It would be great to have RPostgreSQL migrated to github for easier contribution.

like image 168
jangorecki Avatar answered Nov 15 '22 10:11

jangorecki


First off, the RPostgreSQL project has a mailing list; I suggest you post there.

PostgreSQL has two datetime types: with and without timezone. As I recall, R only maps the latter. I did write some early regression tests for this (see the package source) but have not been that involved with the project of late. But I do recall that POSIXct maps back and forth to the PostgreSQL datetime type just fine.

like image 24
Dirk Eddelbuettel Avatar answered Nov 15 '22 10:11

Dirk Eddelbuettel