Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bind variables in R DBI

Tags:

database

r

In R's DBI package, I'm not finding a facility for using bound variables. I did find a document (the original vignette from 2002) that says about bound variables, "Perhaps the DBI could at some point in the future implement this feature", but it looks like so far that's left undone.

What do people in R use for a substitute? Just concatenate strings right into the SQL? That's got some obvious problems for safety & performance.

EDIT:

Here's an example of how placeholders could work:

query <- "SELECT numlegs FROM animals WHERE color=?"
result <- dbGetQuery(caseinfo, query, bind="green")

That's not a very well-thought-out interface, but the idea is that you can use a value for bind and the driver handles the details of escaping (if the underlying API doesn't handle bound variables natively) without the caller having to reimplement it [badly].

like image 273
Ken Williams Avatar asked Feb 02 '10 17:02

Ken Williams


3 Answers

For anyone coming to this question like I just did after googling for rsqlite and dbgetpreparedquery, it seems that in the latest version of rsqlite you can run a SELECT query with bind variables. I just ran the following:

query <- "SELECT probe_type,next_base,color_channel FROM probes WHERE probeid=?"
probe.types.df <- dbGetPreparedQuery(con,que,bind.data=data.frame(probeids=ids))

This was relatively fast (selecting 2,000 rows out of a 450,000 row table) and is incredibly useful.

FYI.

like image 158
user1076 Avatar answered Nov 08 '22 18:11

user1076


Below is a summary of what's currently supported in RSQLite for bound parameters. You are right that there is not currently support for SELECT, but there is no good reason for this and I would like to add support for it.

If you feel like hacking, you can get a read-only checkout of all of the DBI related packages here:

use --user=readonly --password=readonly

https://hedgehog.fhcrc.org/compbio/r-dbi/trunk
https://hedgehog.fhcrc.org/compbio/r-dbi/trunk/DBI
https://hedgehog.fhcrc.org/compbio/r-dbi/trunk/SQLite/RSQLite

I like to receive patches, especially if they include tests and documentation. Unified diff, please. I actually do all my development using git and so best case is to create a git clone of say RSQLite and then send me diffs as git format-patch -n git-svn..

Anyhow, here are some examples:

library("RSQLite")

make_data <- function(n)
{
    alpha <- c(letters, as.character(0:9))
    make_key <- function(n)
    {
        paste(sample(alpha, n, replace = TRUE), collapse = "")
    }
    keys <- sapply(sample(1:5, replace=TRUE), function(x) make_key(x))
    counts <- sample(seq_len(1e4), n, replace = TRUE)
    data.frame(key = keys, count = counts, stringsAsFactors = FALSE)
}

key_counts <- make_data(100)


db <- dbConnect(SQLite(), dbname = ":memory:")

sql <- "
create table keys (key text, count integer)
"

dbGetQuery(db, sql)

bulk_insert <- function(sql, key_counts)
{
    dbBeginTransaction(db)
    dbGetPreparedQuery(db, sql, bind.data = key_counts)
    dbCommit(db)
    dbGetQuery(db, "select count(*) from keys")[[1]]
}

##  for all styles, you can have up to 999 parameters

## anonymous
sql <- "insert into keys values (?, ?)"
bulk_insert(sql, key_counts)


## named w/ :, $, @
## names are matched against column names of bind.data

sql <- "insert into keys values (:key, :count)"
bulk_insert(sql, key_counts[ , 2:1])

sql <- "insert into keys values ($key, $count)"
bulk_insert(sql, key_counts)

sql <- "insert into keys values (@key, @count)"
bulk_insert(sql, key_counts)

## indexed (NOT CURRENTLY SUPPORTED)
## sql <- "insert into keys values (?1, ?2)"
## bulk_insert(sql)
like image 34
seth Avatar answered Nov 08 '22 20:11

seth


Hey hey - I just discovered that RSQLite, which is what I'm using in this case, does indeed have bound-variable support:

http://cran.r-project.org/web/packages/RSQLite/NEWS

See the entry about dbSendPreparedQuery() and dbGetPreparedQuery().

So in theory, that turns this nastiness:

df <- data.frame()
for (x in data$guid) {
  query <- paste("SELECT uuid, cites, score FROM mytab WHERE uuid='",
                 x, "'", sep="")
  df <- rbind(df, dbGetQuery(con, query))
}

into this:

df <- dbGetPreparedQuery(
     con, "SELECT uuid, cites, score FROM mytab WHERE uuid=:guid", data)

Unfortunately, when I actually try it, it seems that it's only for INSERT statements and the like, not for SELECT statements, because I get an error: RS-DBI driver: (cannot have bound parameters on a SELECT statement).

Providing that capability would be fantastic.

The next step would be to hoist this up into DBI itself so that all DBs can take advantage of it, and provide a default implementation that just pastes it into the string like we're all doing ourselves now.

like image 1
Ken Williams Avatar answered Nov 08 '22 20:11

Ken Williams