 

Is it possible to read a data.table from PostgreSQL?

I'm doing some analysis on a large volume of data stored in a PostgreSQL database. For speed and memory reasons I'm using the data.table package. Currently I'm doing this to read the data.

library(RPostgreSQL)
library(data.table)
...
query <- "SELECT * FROM eqtl"
data <- as.data.table(dbGetQuery(con, query))

I'm wondering if there is a better way to do this that doesn't involve reading the whole thing into a data.frame and then copying it into a data.table.

asked Feb 06 '15 by rmccloskey




1 Answer

As Arun pointed out in the comments, you can just use setDT on the dbGetQuery result.
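For example, a minimal sketch, assuming the same connection con from the question:

library(RPostgreSQL)
library(data.table)

# dbGetQuery() returns a data.frame; setDT() converts it to a
# data.table by reference, avoiding the extra copy made by as.data.table()
data <- dbGetQuery(con, "SELECT * FROM eqtl")
setDT(data)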

Additionally, there is a helper function in my package dwtools which extends this approach with automatic setkey when needed. It was designed to be useful when chaining, and it also unifies the interface across database vendors, so you can chain data.table operations against different databases.
A simple select looks like:

library(dwtools)  # provides the db() helper

my_dt = db("SELECT * FROM eqtl")
# to also set a key, use
db("SELECT * FROM eqtl", key="mykeycol")

A heavily extended example from the package manual:

jj_aggr = quote(list(amount=sum(amount), value=sum(value)))
r <- db("sales",key="geog_code" # read fact table from db
        )[,eval(jj_aggr),keyby=c("geog_code","time_code") # aggr by geog_code and time_code
          ][,db(.SD) # write to db, auto.table.name
            ][,db("geography",key="geog_code" # read lookup geography dim from db
                  )[.SD # left join geography
                    ][,eval(jj_aggr), keyby=c("time_code","geog_region_name")] # aggr
              ][,db(.SD) # write to db, auto.table.name
                ][,db("time",key="time_code" # read lookup time dim from db
                      )[.SD # left join time
                        ][, eval(jj_aggr), keyby=c("geog_region_name","time_month_code","time_month_name")] # aggr
                  ][,db(.SD) # write to db, auto.table.name
                    ]

It reads data from multiple databases, joins and aggregates it, and saves intermediate results back to multiple databases.

answered Oct 28 '22 by jangorecki