R: Selecting a subset of a sqlite database based on a character vector

Tags:

sqlite

r

I am trying to extract part of my database using a character vector. To illustrate, here is an example with the mtcars data:

library(dplyr)
library(sqldf)
library(RSQLite)

df <- cbind(rownames(mtcars),mtcars)
colnames(df)[1] <- "CarName"
CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")
db <- dbConnect(SQLite(), dbname = 'mtcars_db.sqlite3')
dbWriteTable(conn = db, name = 'mtcars_table', value = df, row.names = TRUE, header = TRUE)

I can get the part of the data frame that I am interested in with:

mini_df <- df[df$CarName %in% CarsToFind,]

but my real data is quite large, and I would rather not read the whole thing into a data frame. I am looking for something similar to:

sqldf("SELECT * FROM mtcars_table WHERE CarName IN CarsToFind")

but this gives me the error "no such table: CarsToFind". I don't want to create a 'CarsToFind' table in the SQL database, because I have many different queries that I want to run on a one-off basis. Is it possible to query the database using such a character vector?

asked Feb 08 '17 16:02 by nm200


2 Answers

The query you actually want to execute on SQLite should look like this:

SELECT *
FROM mtcars_table
WHERE CarName IN ('Valiant', 'Merc 280', 'Lotus Europa', 'Volvo 142E')

So all you need to do is build this string in R:

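library(DBI)

CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")
# Wrap each name in single quotes and join them: 'Valiant', 'Merc 280', ...
quoted <- paste0("'", CarsToFind, "'", collapse = ", ")
whereIn <- paste0("(", quoted, ")")

query <- paste0("SELECT * FROM mtcars_table WHERE CarName IN ", whereIn)
# Run the query on the SQLite file through the open connection 'db',
# so only the matching rows are read into R
dbGetQuery(db, query)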
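Pasting quotes around each value breaks if a name itself contains a single quote. A minimal sketch of an alternative, assuming the DBI and RSQLite packages, binds the values through `?` placeholders instead. It uses an in-memory database so it runs on its own; `placeholders` and `result` are illustrative names, and `mtcars_table` mirrors the question's table.

```r
library(DBI)
library(RSQLite)

# Self-contained demo: an in-memory SQLite database stands in for the
# asker's 'mtcars_db.sqlite3' file.
db <- dbConnect(SQLite(), ":memory:")
df <- cbind(CarName = rownames(mtcars), mtcars)
dbWriteTable(db, "mtcars_table", df, row.names = FALSE)

CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")

# One "?" placeholder per value, e.g. "?, ?, ?, ?"
placeholders <- paste(rep("?", length(CarsToFind)), collapse = ", ")
query <- paste0("SELECT * FROM mtcars_table WHERE CarName IN (",
                placeholders, ")")

# The driver binds the values positionally instead of pasting them into
# the SQL text, so quotes inside a name cannot break the statement.
result <- dbGetQuery(db, query, params = as.list(CarsToFind))
print(result[, c("CarName", "mpg", "cyl")])

dbDisconnect(db)
```

Against the real file you would skip the setup lines and call `dbGetQuery()` on the existing connection.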
answered Oct 04 '22 15:10 by Tim Biegeleisen


Instead of a character vector, how about creating a data frame with only one column? This works:

CarsToFind <- data.frame(lookup=c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E"))

sqldf("SELECT * FROM df WHERE CarName IN CarsToFind")

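This works because sqldf copies both data frames into an in-memory SQLite database, and SQLite accepts the name of a one-column table on the right-hand side of IN. This way you don't have to change or add anything on the SQL side; everything stays on the R side.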
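The sqldf query above runs on the data frame `df` held in R rather than on the database file. A minimal sketch of applying the same one-column-table idea directly to the database, assuming DBI and RSQLite, is a TEMPORARY table: it exists only for the current connection and is never saved in the file. The table name `cars_to_find` is hypothetical, and an in-memory database stands in for `mtcars_db.sqlite3` so the example runs on its own.

```r
library(DBI)
library(RSQLite)

# Self-contained demo: an in-memory database stands in for the asker's
# 'mtcars_db.sqlite3' file; 'mtcars_table' mirrors the question's table.
db <- dbConnect(SQLite(), ":memory:")
df <- cbind(CarName = rownames(mtcars), mtcars)
dbWriteTable(db, "mtcars_table", df, row.names = FALSE)

CarsToFind <- c("Valiant", "Merc 280", "Lotus Europa", "Volvo 142E")

# Hypothetical lookup table created with temporary = TRUE: it lives only
# for this connection and disappears on dbDisconnect().
dbWriteTable(db, "cars_to_find", data.frame(CarName = CarsToFind),
             temporary = TRUE)

# SQLite reads "IN cars_to_find" as "IN (SELECT * FROM cars_to_find)",
# which is valid because the table has exactly one column.
result <- dbGetQuery(db,
  "SELECT * FROM mtcars_table WHERE CarName IN cars_to_find")
print(result[, c("CarName", "mpg", "cyl")])

dbDisconnect(db)
```

The bare table name after IN is SQLite syntax; `IN (SELECT CarName FROM cars_to_find)` is the more portable spelling.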

answered Oct 04 '22 15:10 by Mike H.