Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write a table in PostgreSQL from R?

Tags:

r

postgresql

At present to insert data in a PostgreSQL table I have to create an empty table and then do an insert into table values ... along with a dataframe collapsed insto a single string with all the values. It doesn't work for large sized dataframes.

The dbWtriteTable() doesn't work for PostgreSQL and gives the following error...

Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "table_1" FROM STDIN

I have tried the following hack as suggested in answer to a similar question asked before. Here's the link... How do I write data from R to PostgreSQL tables with an autoincrementing primary key?

body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
  'postgresqlTableRef(name), "FROM STDIN")', 
  'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")', 
  body_lines,
  fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
while("RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")

This hack still doesn't work for me. The postgresqlWriteTable() throws exactly the same error... What exactly is the problem here?

As an alternative I have tried using dbWriteTable2() from caroline package. And it throws a different error...

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)

Is there any other method to write a large dataframe into a table in PostgreSQL directly?

like image 606
Gaurav Avatar asked Jun 06 '16 07:06

Gaurav


People also ask

How do I create a table in PostgreSQL?

CREATE TABLE table_name( column1 datatype, column2 datatype, column3 datatype, ..... columnN datatype, PRIMARY KEY( one or more columns ) ); CREATE TABLE is a keyword, telling the database system to create a new table. The unique name or identifier for the table follows the CREATE TABLE statement.

Can R connect to PostgreSQL?

You can not only connect your R with PostgreSQL but also with other databases such as SQLite, SQL Server, and Data Warehouses such as Google BigQuery, and Snowflake. You can refer to Snowflake R/RStudio Integration to learn more.

How do I create a table and column in PostgreSQL?

Creating a table using the PostgreSQL CREATE TABLE statement. The PostgreSQL CREATE TABLE statement basic syntax is as follows: CREATE TABLE [IF NOT EXISTS] table_name ( column1 datatype(length) column_contraint, column2 datatype(length) column_contraint, column3 datatype(length) column_contraint, table_constraints );

How do you create a table in pgAdmin?

Now reach "tables" in pgAdmin III window, right click on "tables" and click on "New Table". This will open a new window to create a New Table. Supply a name of your new table and then click on Columns.


2 Answers

Ok, I'm not sure why dbWriteTable() would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R, the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.

Regarding the insert into workaround failing for large data, what is often done in the IT world when large amounts of data must be moved and a one-shot transfer is infeasible/impractical/flaky is what is sometimes referred to as batching or batch processing. Basically, you divide the data into smaller chunks and send each chunk one at a time.

As a random example, a few years ago I wrote some Java code to query for employee information from an HR LDAP server which was constrained to only provide 1000 records at a time. So basically I had to write a loop to keep sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.

Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then insert the content of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste() to build the SQL strings, and dbSendQuery() to send the actual queries. I also use postgresqlDataType() for the table creation.

## connect to the DB
library('RPostgreSQL'); ## loads DBI automatically
drv <- dbDriver('PostgreSQL');
con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);

## define helper functions
createEmptyTable <- function(con,tn,df) {
    sql <- paste0("create table \"",tn,"\" (",paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),");");
    dbSendQuery(con,sql);
    invisible();
};

insertBatch <- function(con,tn,df,size=100L) {
    if (nrow(df)==0L) return(invisible());
    cnt <- (nrow(df)-1L)%/%size+1L;
    for (i in seq(0L,len=cnt)) {
        sql <- paste0("insert into \"",tn,"\" values (",do.call(paste,c(sep=',',collapse='),(',lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote))),");");
        dbSendQuery(con,sql);
    };
    invisible();
};

## generate test data
NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));

## run it
tn <- 't1';
dbRemoveTable(con,tn);
createEmptyTable(con,tn,df);
insertBatch(con,tn,df);
res <- dbReadTable(con,tn);
all.equal(df,res);
## [1] TRUE

Note that I didn't bother prepending a row.names column to the database table, unlike dbWriteTable(), which always seems to include such a column (and doesn't seem to provide any means of preventing it).

like image 190
bgoldst Avatar answered Oct 18 '22 15:10

bgoldst


I had the same error while working through this example.

For me worked:

dbWriteTable(con, "cartable", value = df, overwrite = T, append = F, row.names = FALSE)

While I have configured a table "cartable" in pgAdmin. So an empty table existed and I had to overwrite that table with values.

like image 24
heiko Avatar answered Oct 18 '22 15:10

heiko