Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bracket-escaped table names with dplyr

Tags:

I'm programmatically fetching a bunch of datasets, many of them having silly names that begin with numbers and have special characters like minus signs in them. Because none of the datasets are particularly large, and I wanted the benefit R making its best guess about data types, I'm (ab)using dplyr to dump these tables into SQLite.

I am using square brackets to escape the horrible table names, but this doesn't seem to work. For example:

data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="[14m3-n4m3]")

This results in the error message:

Error in sqliteSendQuery(conn, statement, bind.data) : error in statement: no such table: 14m3-n4m3

This works if I choose a sensible name. However, due to a variety of reasons, I'd really like to keep the cumbersome names. I am also able to create such a badly-named table directly from sqlite:

sqlite> create table [14m3-n4m3](foo,bar,baz);
sqlite> .tables
14m3-n4m3

Without cracking into things too deeply, this looks like dplyr is handling the square brackets in some way that I cannot figure out. My suspicion is that this is a bug, but I wanted to check here first to make sure I wasn't missing something.

EDIT: I forgot to mention the case where I just pass the janky name directly to dplyr. This errors out as follows:

library(dplyr)

data(iris)
foo.db <- src_sqlite("foo.sqlite3", create = TRUE)
copy_to(foo.db, df=iris, name="14M3-N4M3")

Error in sqliteSendQuery(conn, statement, bind.data) : 
  error in statement: unrecognized token: "14M3"
like image 517
Peter Avatar asked Jan 25 '15 02:01

Peter


1 Answers

This is a bug in dplyr. It's still there in the current github master. As @hadley indicates, he has tried to escape things like table names in dplyr to prevent this issue. The current problem you're having arises from lack of escaping in two functions. Table creation works fine when providing the table name unescaped (and is done with dplyr::db_create_table). However, the insertion of data to the table is done using DBI::dbWriteTable which doesn't support odd table names. If the table name is provided to this function escaped, it fails to find it in the list of tables (the first error you report). If it is provided escaped, then the SQL to do the insertion is not synatactically valid.

The second issue comes when the table is updated. The code to get the field names, this time actually in dplyr, again fails to escape the table name because it uses paste0 rather than build_sql.

I've fixed both errors at a fork of dplyr. I've also put in a pull request to @hadley and made a note on the issue https://github.com/hadley/dplyr/issues/926. In the meantime, if you wanted to you could use devtools::install_github("NikNakk/dplyr", ref = "sqlite-escape") and then revert to the master version once it's been fixed.

Incidentally, the correct SQL-99 way to escape table names (and other identifiers) in SQL is with double quotes (see SQL standard to escape column names?). MS Access uses square brackets, while MySQL defaults to backticks. dplyr uses double quotes, per the standard.

Finally, the proposal from @RichardScriven wouldn't work universally. For example, select is a perfectly valid name in R, but is not a syntactically valid table name in SQL. The same would be true for other reserved words.

like image 118
Nick Kennedy Avatar answered Oct 02 '22 23:10

Nick Kennedy