
Can I run an SQL update statement using only dplyr syntax in R?

I need to update column values conditionally on other columns in a PostgreSQL database table. I managed to do it by writing an SQL statement in R and executing it with dbExecute() from the DBI package.

library(dplyr)
library(DBI)

# Establish connection with database
con <- dbConnect(RPostgreSQL::PostgreSQL(), dbname = "myDb",
                 host = "localhost", port = 5432, user = "me", password = myPwd)

# Write SQL update statement
request <- paste("UPDATE table_to_update",
                 "SET var_to_change = 'new value' ",
                 "WHERE filter_var = 'filter' ")

# Back-end execution
con %>% dbExecute(request)

Is it possible to do this using only dplyr syntax? Out of curiosity, I tried

con %>% tbl("table_to_update") %>%
   mutate(var_to_change = if (filter_var == 'filter') 'new value' else var_to_change)

which runs in R but, unsurprisingly, changes nothing in the database, since dplyr translates it into a SELECT statement. copy_to() only offers append and overwrite options, so I can't see how to use it short of deleting and then re-appending the filtered observations...

Asked by Romain, Jul 17 '17 16:07




1 Answer

Current dplyr 0.7.1 (with dbplyr 1.1.0) doesn't support this, because it assumes that all data sources are immutable. Issuing an UPDATE via dbExecute() seems to be the best bet.
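As a minimal sketch of the dbExecute() route, the UPDATE can be built with DBI's sqlInterpolate() so that values are quoted rather than pasted into the string. The table and column names come from the question. The in-memory SQLite database (via the RSQLite package) is an assumption made only so the example runs without a server; with the PostgreSQL connection from the question, the last two DBI calls should be the same.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the PostgreSQL
# connection from the question, so the sketch runs without a server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "table_to_update", data.frame(
  filter_var    = c("filter", "other"),
  var_to_change = c("old value", "old value"),
  stringsAsFactors = FALSE
))

# sqlInterpolate() safely quotes the values bound to the ?new and ?flt
# placeholders instead of pasting raw strings into the statement.
request <- sqlInterpolate(
  con,
  "UPDATE table_to_update SET var_to_change = ?new WHERE filter_var = ?flt",
  new = "new value", flt = "filter"
)

# dbExecute() runs the statement and returns the number of rows affected.
n_updated <- dbExecute(con, request)

result <- dbReadTable(con, "table_to_update")
dbDisconnect(con)
```

Only the row whose filter_var equals 'filter' is changed; the other row keeps its old value.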

For replacing a larger chunk in a table, you could also:

  1. Write the data frame to a temporary table in the database via copy_to().
  2. Start a transaction.
  3. Issue a DELETE FROM ... WHERE id IN (SELECT id FROM <temporary table>).
  4. Issue an INSERT INTO ... SELECT * FROM <temporary table>.
  5. Commit the transaction.
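The steps above might be sketched as follows. Several things here are assumptions for illustration: the in-memory SQLite database (RSQLite) standing in for PostgreSQL, the id key column, the staging table name, and the new_rows data frame. The sketch also uses dbWriteTable(temporary = TRUE) in place of copy_to() to avoid depending on dbplyr, and dbWithTransaction() to cover steps 2 and 5.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the PostgreSQL connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "table_to_update", data.frame(
  id = 1:3,
  var_to_change = c("a", "b", "c"),
  stringsAsFactors = FALSE
))

# Hypothetical replacement rows, keyed by id.
new_rows <- data.frame(
  id = c(2L, 3L),
  var_to_change = c("B", "C"),
  stringsAsFactors = FALSE
)

# Step 1: stage the data frame in a temporary table
# (dbWriteTable() is used here instead of copy_to()).
dbWriteTable(con, "staging", new_rows, temporary = TRUE)

# Steps 2 to 5: dbWithTransaction() begins a transaction, commits if the
# block succeeds, and rolls back if any statement raises an error.
dbWithTransaction(con, {
  dbExecute(con, "DELETE FROM table_to_update
                  WHERE id IN (SELECT id FROM staging)")
  dbExecute(con, "INSERT INTO table_to_update SELECT * FROM staging")
})

result <- dbGetQuery(con, "SELECT * FROM table_to_update ORDER BY id")
dbDisconnect(con)
```

Because both statements run inside one transaction, other sessions should never observe the table with the old rows deleted but the new ones not yet inserted.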

Depending on your schema, you might be able to do a single INSERT INTO ... ON CONFLICT DO UPDATE instead of DELETE and then INSERT.
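A possible shape for the single-statement variant is shown below. It only works when the target table has a unique or primary key constraint on the conflict column (id here, which is an assumption, as are the staging table and sample rows), and on PostgreSQL it needs version 9.5 or later. The example runs against an in-memory SQLite database for convenience; the WHERE true clause is there because SQLite needs it to parse an upsert that reads from a SELECT, and PostgreSQL accepts it as well.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the PostgreSQL connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# ON CONFLICT needs a unique or primary key constraint on the target column.
dbExecute(con, "CREATE TABLE table_to_update (
                  id INTEGER PRIMARY KEY,
                  var_to_change TEXT)")
dbExecute(con, "INSERT INTO table_to_update
                VALUES (1, 'a'), (2, 'b'), (3, 'c')")

# Hypothetical staged rows: id 2 already exists, id 4 is new.
dbWriteTable(con, "staging", data.frame(
  id = c(2L, 4L),
  var_to_change = c("B", "D"),
  stringsAsFactors = FALSE
), temporary = TRUE)

# Insert new rows; on a key collision, overwrite with the incoming value.
# "WHERE true" avoids a parsing ambiguity in SQLite and is harmless elsewhere.
dbExecute(con, "
  INSERT INTO table_to_update (id, var_to_change)
  SELECT id, var_to_change FROM staging WHERE true
  ON CONFLICT (id) DO UPDATE SET var_to_change = excluded.var_to_change
")

result <- dbGetQuery(con, "SELECT * FROM table_to_update ORDER BY id")
dbDisconnect(con)
```

Unlike the DELETE-then-INSERT approach, rows in the staging table that have no match in the target are simply inserted, and matching rows are updated in place.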

Answered by krlmlr, Oct 23 '22 23:10