R SQL: Pull data from MySQL for list of ids already in a dataframe

Tags: sql, mysql, r, r-dbi

I have a data frame in R that contains the output of previous queries. Unfortunately, I cannot do this step directly in SQL because it is too slow, so I am using the data.table package. The output from data.table is a data frame of 50,000 ids, and I need to pull all records from the database for each id.

# x is a dataframe containing 50,000 ids. 

Usually, I would do something like:

dbGetQuery(con, "Select * from data where id in x") 

but that won't work, because x is an R object and the SQL string sent to the database cannot refer to it. An alternative is to run 50,000 queries in a for loop, but I suspect there must be a more efficient method.

What is the most efficient way to do this?

asked Feb 08 '23 16:02 by quant actuary


1 Answer

For example,

> x <- 0:3
> q <- "select * from table where id in (%s)"
> sprintf(q, paste(x, collapse = ","))
[1] "select * from table where id in (0,1,2,3)"

As I mentioned in my comment, some databases limit the number of items you can put in an IN clause. For MySQL, the manual says the number of values in an IN() list is limited only by the max_allowed_packet setting, i.e. by the total size of the query string, so I'd be willing to bet you could do this in only a handful of queries.
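A minimal sketch of the "handful of queries" idea, splitting the ids into chunks and building one IN statement per chunk. It assumes integer-valued numeric ids and a table named data (as in the question); the helper name build_in_queries and the default chunk size of 10,000 are hypothetical choices, not a documented MySQL limit. The ids are formatted with format(..., scientific = FALSE) because paste() turns a double such as 100000 into "1e+05", which would silently break the IN list.

```r
# Hypothetical helper: split a vector of integer-valued ids into chunks and
# build one "select * ... where id in (...)" statement per chunk.
build_in_queries <- function(ids, table = "data", chunk_size = 10000) {
  ids <- unique(ids)
  # Chunk labels 1,1,...,2,2,...: chunk_size consecutive ids per chunk
  groups <- split(ids, ceiling(seq_along(ids) / chunk_size))
  queries <- vapply(groups, function(g) {
    # scientific = FALSE avoids "1e+05"; trim = TRUE avoids padding spaces
    id_list <- paste(format(g, scientific = FALSE, trim = TRUE), collapse = ",")
    sprintf("select * from %s where id in (%s)", table, id_list)
  }, character(1))
  unname(queries)
}
```

With a DBI connection con and the ids in a column assumed to be called id, the chunks could then be run and stacked with something like do.call(rbind, lapply(build_in_queries(x$id), function(q) dbGetQuery(con, q))).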

In many cases, though, this will be less efficient (slower) than having the IDs in a table in the database and doing a join; sometimes, however, people don't have the database access required to do that.
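If you do have permission to create temporary tables, the join approach might look roughly like the sketch below. It assumes the DBI package with a driver whose dbWriteTable() honours the temporary argument (for MySQL, e.g. RMariaDB), a remote table named data with an id column, and a free scratch-table name tmp_ids; the function name fetch_ids_via_join is hypothetical.

```r
# Sketch of the join approach: upload the ids as a temporary table, then join
# against it in a single query. Table and column names are assumptions.
fetch_ids_via_join <- function(con, ids, table = "data", tmp = "tmp_ids") {
  # Temporary table: visible only to this connection
  DBI::dbWriteTable(con, tmp, data.frame(id = unique(ids)),
                    temporary = TRUE, overwrite = TRUE)
  # Drop the scratch table when the function exits, even on error
  on.exit(DBI::dbExecute(con, paste("DROP TABLE IF EXISTS", tmp)), add = TRUE)
  sql <- sprintf("SELECT d.* FROM %s d INNER JOIN %s t ON d.id = t.id",
                 table, tmp)
  DBI::dbGetQuery(con, sql)
}
```

Usage would be something like result <- fetch_ids_via_join(con, x$id), again assuming the ids live in a column called id. This sends the ids once and lets the database do the matching, instead of building a very long query string.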

answered Feb 11 '23 07:02 by joran