I'm doing some analysis on a large volume of data stored in a PostgreSQL database. For speed and memory reasons I'm using the data.table
package. Currently I'm doing this to read the data.
library(RPostgreSQL)
library(data.table)
...
query <- "SELECT * FROM eqtl"
data <- as.data.table(dbGetQuery(con, query))
I'm wondering if there is a better way to do this that doesn't involve reading the whole thing into a data.frame
and then copying it into a data.table
.
Use the \dt or \dt+ command in psql to show tables in a specific database. Use the SELECT statement to query table information from the pg_catalog.
If you want to select data from all the columns of the table, you can use an asterisk ( * ) shorthand instead of specifying all the column names. The select list may also contain expressions or literal values. Second, specify the name of the table from which you want to query data after the FROM keyword.
Introduction to PostgreSQL Its versatility makes it possible to be used as a transactional database as well as a data warehouse for analytics. With its origins dating back to 1986 and the strong community support that comes with it, Postgres boasts of high reliability, extensibility, and data integrity.
PostgreSQL, as the most advanced open source database, is so flexible that can serve as a simple relational database, a time-series data database, and even as an efficient, low-cost, data warehousing solution. You can also integrate it with several analytics tools.
As Arun pointed out in the comment you can just use setDT
on dbGetQuery
results.
Additionally there is a helper function available in my package dwtools which extends this feature for auto setkey
when needed. This was designed to be useful when chaining. It also unifies the interface to other database vendors so you can chain data.table using different databases.
The simple select usage will looks like:
my_dt = db("SELECT * FROM eqtl")
# to setkey use
db("SELECT * FROM eqtl", key="mykeycol")
Heavily extended example from package manual:
jj_aggr = quote(list(amount=sum(amount), value=sum(value)))
r <- db("sales",key="geog_code" # read fact table from db
)[,eval(jj_aggr),keyby=c("geog_code","time_code") # aggr by geog_code and time_code
][,db(.SD) # write to db, auto.table.name
][,db("geography",key="geog_code" # read lookup geography dim from db
)[.SD # left join geography
][,eval(jj_aggr), keyby=c("time_code","geog_region_name")] # aggr
][,db(.SD) # write to db, auto.table.name
][,db("time",key="time_code" # read lookup time dim from db
)[.SD # left join time
][, eval(jj_aggr), keyby=c("geog_region_name","time_month_code","time_month_name")] # aggr
][,db(.SD) # write to db, auto.table.name
]
It would read data from multiple databases, joins, aggregates, save intermediate results to multiple databases.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With