Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I delete records from an ODBC database based on a dataframe in R

I have a dataframe in R that has a list of records. I would like to delete all of the matching records from my ODBC datasource.

    data_to_delete ##This is my dataframe

    delete from RODBC datasource where record_id IN (select record_id 
                                                    from data_to_delete)

Any ideas on how to achieve this? I am currently using sqlQuery to pull data down and sqlSave to create new records, but I'm not sure how to delete records.

like image 322
rascale Avatar asked Sep 30 '22 15:09

rascale


1 Answers

As the answers to this related question explain, you will need to build up the string that holds theDELETE statement programmatically, and then you can execute a DELETE statement just as any other statement within sqlQuery.

So at the end of your DELETE, don't try to "select" from the R variable, rather just dump its contents into a comma-separated string. Below, I assume that whatever data_to_delete is, we can use paste with as.character to spit it into a comma-separated string. But you can modify that as needed.

I also assume that you mean for datasource to refer to the database, and I use some_table_name as a placeholder for the table's name within that database. You should adjust the values for your specific case.

del_query <- sprintf("DELETE from some_table_name
                      where record_id IN (%s)",
                     paste(as.character(data_to_delete), 
                           collapse=", ")
             )

sqlQuery(datasource, del_query)

Using this method, you need to be careful if the content of your data is large. There is often a symbol limit for the number of distinct symbols that can appear in the IN clause -- something like 2000 symbols as default in SQL Server.

If you need more values than that, you could first issue a query that writes the values into a temp table wholly on the database, and then issue a query with sqlQuery that contains the extra IN ( SELECT * FROM ... ) but you'll be querying from that temp table, not "from" the R variable.

like image 157
ely Avatar answered Oct 02 '22 05:10

ely