I have a data frame in R that contains the output of previous queries. Doing this step directly in SQL is too slow, so I am using the data.table package instead. Its output is a data frame of 50,000 ids, and I need to pull all records from the database for each id.
# x is a dataframe containing 50,000 ids.
Usually, I would do something like,
dbGetQuery(con, "Select * from data where id in x")
but that won't work. An alternative is to do 50,000 queries in a for loop, but I am thinking that there must be a more efficient method to do this.
What is the most efficient way to do this?
For example,
> x <- 0:3
> q <- "select * from table where id in (%s)"
> sprintf(q,paste(x,collapse = ","))
[1] "select * from table where id in (0,1,2,3)"
As I mentioned in my comment, some databases limit the number of items you can put in the IN clause. I'm not familiar enough with MySQL to know what that limit is, but I'd be willing to bet it's large enough that you could do this in only a handful of queries.
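If you do need to split the ids across a handful of queries, something along these lines might work. This is only a sketch: `build_in_queries` is a helper name I made up, the batch size of 1000 is an arbitrary guess rather than a known MySQL limit, and it assumes the ids are numeric (character ids would need quoting or a parameterised query).

```r
# Hypothetical helper: split the ids into batches and build one
# "IN (...)" query string per batch. batch_size = 1000 is an assumption.
build_in_queries <- function(ids, batch_size = 1000,
                             template = "select * from table where id in (%s)") {
  ids <- unique(ids)
  # Assign each id to a batch number: 1, 1, ..., 2, 2, ...
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  unname(vapply(
    batches,
    function(b) {
      # format() avoids scientific notation for large numeric ids
      id_list <- paste(format(b, scientific = FALSE, trim = TRUE), collapse = ",")
      sprintf(template, id_list)
    },
    character(1)
  ))
}

queries <- build_in_queries(0:9, batch_size = 4)

# With a live connection `con`, as in the question, the batches could
# then be run and stacked, e.g.:
# results <- do.call(rbind, lapply(queries, function(q) dbGetQuery(con, q)))
```

With 50,000 ids and a batch size of 1000 this comes to 50 queries instead of 50,000.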
And in many cases this will be less efficient (slower) than having the IDs in a table in the database and doing a join, but sometimes people don't have the access to the database required to accomplish that.
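For the join approach, a rough sketch using DBI could look like the following. To keep it self-contained I'm assuming an in-memory SQLite database (via the RSQLite package) as a stand-in for your MySQL server, and the table name `wanted_ids` and the sample rows are made up. Whether you are allowed to create temporary tables depends on your database permissions.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up stand-in for the real `data` table from the question.
dbWriteTable(con, "data", data.frame(id = 0:9, value = letters[1:10]))

# x plays the role of the data frame of ids from the question.
x <- data.frame(id = c(2L, 5L, 7L))

# Upload the ids to a temporary table (hypothetical name), then join.
dbWriteTable(con, "wanted_ids", x, temporary = TRUE)
res <- dbGetQuery(
  con,
  "select d.* from data d join wanted_ids w on d.id = w.id"
)

dbDisconnect(con)
```

The join lets the database match the ids in a single query, so there is no IN list length to worry about.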