I store my data in a PostgreSQL server. I want to load a table that has 15 million rows into a data.frame or data.table.
I use RPostgreSQL to load the data:
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, ...)
# Select data from a table
system.time(
  df <- dbGetQuery(con, "SELECT * FROM 15mil_rows_table")
)
It took 20 minutes to load the data from the database into df. I use a Google Cloud server with 60 GB of RAM and a 16-core CPU.
What should I do to reduce the load time?
I'm not sure whether this will reduce your load time, but it should, since both steps are quite efficient. Please leave a comment with your timings.
Dump the table to CSV using psql:
COPY 15mil_rows_table TO '/path/15mil_rows_table.csv' DELIMITER ',' CSV HEADER;
Then read the CSV with data.table::fread:
library(data.table)
DT <- fread("/path/15mil_rows_table.csv")
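For reference, a minimal sketch of the same two steps driven entirely from R via system(); the host, user, database name, and file path are placeholders taken from this thread and would need to be adapted to your setup:

library(data.table)

# Assumption: psql is on the PATH and authentication is handled (e.g. via .pgpass).
# Note that COPY ... TO writes the file on the server side.
system(paste(
  "psql -h localhost -U user -d database -c",
  "\"COPY 15mil_rows_table TO '/path/15mil_rows_table.csv' DELIMITER ',' CSV HEADER\""
))

# fread parses the flat CSV in parallel, which is typically much faster
# than pulling 15 million rows through the database driver
DT <- fread("/path/15mil_rows_table.csv")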
I used the same method as @Jan Gorecki, but gzip-compressed the dump to save space.
1- Dump the table to a gzipped CSV:
psql -h localhost -U user -d 'database' -c "COPY 15mil_rows_table TO stdout DELIMITER ',' CSV HEADER" | gzip > 15mil_rows_table.csv.gz &
2- Load the data in R:
DT <- fread('zcat 15mil_rows_table.csv.gz')
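As a side note (not part of the original answer): newer data.table versions prefer passing shell commands through the cmd argument, and fread can also decompress .gz files itself when the R.utils package is installed. A minimal sketch, assuming the same file name as above:

library(data.table)

# Explicit shell command via the cmd argument (preferred in recent data.table releases)
DT <- fread(cmd = "zcat 15mil_rows_table.csv.gz")

# Or let fread handle the decompression directly (requires the R.utils package)
DT <- fread("15mil_rows_table.csv.gz")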